Last Ran: 2021-05-18
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(summarytools)
library(magrittr)
library(corrplot)
library(FactoMineR)
library(factoextra)
e.g., How does smoking history relate to the other measures of health history and demographics? What about alcohol intake?
This data is reported via the COG1 Session. We could also use the NCI Dietary questionnaire, since this includes questions about alcohol usage. Although, we'll start at Health History Questionnaire (HHQ). Tips for finding other useful variables are integraded below.
I have already used the RedCap 'Data Export Reports & Stats' tab to download this data and place it on the server under the following: "/Volumes/IGNITE_Imaging/QC_Output/R_IGNITE/RedCap/PRE/Data/"
setwd("/Volumes/IGNITE_Imaging/QC_Output/R_IGNITE/RedCap/PRE/Data/")
FILE<-list.files("/Volumes/IGNITE_Imaging/QC_Output/R_IGNITE/RedCap/PRE/Data/", pattern="*.r")
PATH<-"/Volumes/IGNITE_Imaging/QC_Output/R_IGNITE/RedCap/PRE/Data/"
data<-paste(PATH, FILE, sep="")
source(data)
Assuming that the data has already been appropriately entered & compared within RedCap, we will remove the form entries ending in --1 and --2.
data$Compared<-str_detect(data$record_id, pattern = "--1")
data<-data[!(data$Compared==TRUE),]
data$Merged_RedCap<-str_detect(data$record_id, pattern = "--2")
data<-data[!(data$Merged_RedCap==TRUE),]
#data$record_id<-droplevels(data$record_id) #droplevels() to remove --* levels from 'record_id'
Next, we'll use plyr/dplyr/tidyverse's 'filter' command to filter-in only those rows (particpants) which have been randomized (i.e., randomization_complete.factor=="Complete").
data <- data %>%
filter(randomization_complete.factor=="Complete")
Get rid of columns where for ALL rows the value is NA. This should narrow our database to 500 particpants (rows).
data <- data[,colSums(is.na(data))<nrow(data)]
data$vo2_data_height_m<-data$vo2_data_height/100
data$BMI<-data$vo2_data_weight/(data$vo2_data_height_m*data$vo2_data_height_m)
data$BMI<-as.numeric(data$BMI)
data$BMI_group<-if_else(data$BMI<18.5, 1, as.numeric(data$BMI)) #under
data$BMI_group<-if_else( (data$BMI_group>1 & data$BMI_group<24.9) ,2 , as.numeric(data$BMI_group)) #healthy
data$BMI_group<-if_else(data$BMI_group<29.9 & data$BMI_group>2 ,3, as.numeric(data$BMI_group)) #over
data$BMI_group<-if_else(data$BMI_group<34.9 & data$BMI_group>3 ,4, as.numeric(data$BMI_group)) #obese I
data$BMI_group<-if_else(data$BMI_group<39.9 & data$BMI_group>4 ,5, as.numeric(data$BMI_group)) #obese II
data$BMI_group<-if_else(data$BMI_group<49.9 & data$BMI_group>5 ,6, as.numeric(data$BMI_group)) #obese III
data$BMI_group<-if_else(data$BMI_group>50 ,7, as.numeric(data$BMI_group)) #OUTLIER?
data$BMI_group<-as.character(data$BMI_group)
data$BMI_GROUP<-if_else(data$BMI_group=="1", "Underweight", data$BMI_group)
data$BMI_GROUP<-if_else(data$BMI_group=="2", "Healthy", data$BMI_GROUP)
data$BMI_GROUP<-if_else(data$BMI_group=="3", "Overweight", data$BMI_GROUP)
data$BMI_GROUP<-if_else(data$BMI_group=="4", "Obese I", data$BMI_GROUP)
data$BMI_GROUP<-if_else(data$BMI_group=="5", "Obese II", data$BMI_GROUP)
data$BMI_GROUP<-if_else(data$BMI_group=="6", "Obese III", data$BMI_GROUP)
data$BMI_GROUP<-if_else(data$BMI_group=="7", "ERROR", data$BMI_GROUP)
data$BMI_GROUP<-factor(data$BMI_GROUP, levels=c("Underweight", "Healthy", "Overweight","Obese I", "Obese II", "Obese III"))
TMP<-data[!(data$BMI_GROUP=="ERROR"),]
TMP<-TMP[complete.cases(TMP$BMI_GROUP),]
TMP$BMI_GROUP<-as.factor(TMP$BMI_GROUP)
| BMI_GROUP | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| history_cig_history.factor | Underweight | Healthy | Overweight | Obese I | Obese II | Obese III | Total | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Yes | 0 | ( | 0.0% | ) | 3 | ( | 21.4% | ) | 6 | ( | 42.9% | ) | 1 | ( | 7.1% | ) | 1 | ( | 7.1% | ) | 3 | ( | 21.4% | ) | 14 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| No | 1 | ( | 0.2% | ) | 95 | ( | 19.1% | ) | 189 | ( | 38.0% | ) | 124 | ( | 24.9% | ) | 57 | ( | 11.5% | ) | 31 | ( | 6.2% | ) | 497 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Total | 1 | ( | 0.2% | ) | 98 | ( | 19.2% | ) | 195 | ( | 38.2% | ) | 125 | ( | 24.5% | ) | 58 | ( | 11.4% | ) | 34 | ( | 6.7% | ) | 511 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Χ2 = 6.8616 df = 5 p = .2311 |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18
| BMI_GROUP | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| screen_race.factor | Underweight | Healthy | Overweight | Obese I | Obese II | Obese III | Total | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| African American / Black | 0 | ( | 0.0% | ) | 8 | ( | 11.8% | ) | 19 | ( | 27.9% | ) | 19 | ( | 27.9% | ) | 12 | ( | 17.6% | ) | 10 | ( | 14.7% | ) | 68 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Asian | 0 | ( | 0.0% | ) | 3 | ( | 33.3% | ) | 4 | ( | 44.4% | ) | 1 | ( | 11.1% | ) | 1 | ( | 11.1% | ) | 0 | ( | 0.0% | ) | 9 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Caucasian / White | 1 | ( | 0.2% | ) | 82 | ( | 19.5% | ) | 170 | ( | 40.5% | ) | 101 | ( | 24.0% | ) | 45 | ( | 10.7% | ) | 21 | ( | 5.0% | ) | 420 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| American Indian or Alaska Native | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Native Hawaiian or other Pacific Islander | 0 | ( | 0.0% | ) | 1 | ( | 100.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 1 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Subject Refused to Answer | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Bi-racial | 0 | ( | 0.0% | ) | 4 | ( | 66.7% | ) | 1 | ( | 16.7% | ) | 1 | ( | 16.7% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 6 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Other | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 1 | ( | 14.3% | ) | 3 | ( | 42.9% | ) | 0 | ( | 0.0% | ) | 3 | ( | 42.9% | ) | 7 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Total | 1 | ( | 0.2% | ) | 98 | ( | 19.2% | ) | 195 | ( | 38.2% | ) | 125 | ( | 24.5% | ) | 58 | ( | 11.4% | ) | 34 | ( | 6.7% | ) | 511 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Χ2 = NaN df = 35 p = NaN |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18
WOW I forgot that nearly half of america used to smoke!! Crazy!
| Variable | Stats / Values | Freqs (% of Valid) | Graph | ||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| screen_site.factor [factor] | 1. Pitt 2. Kansas 3. Northeastern |
|
|||||||||||||||||||||||||||||||||
| rand_age_group.factor [factor] | 1. < =72 2. >72 |
|
|||||||||||||||||||||||||||||||||
| rand_sex.factor [factor] | 1. Male 2. Female |
|
|||||||||||||||||||||||||||||||||
| screen_race.factor [factor] | 1. African American / Black 2. Asian 3. Caucasian / White 4. American Indian or Alaska 5. Native Hawaiian or other 6. Subject Refused to Answer 7. Bi-racial 8. Other |
|
|||||||||||||||||||||||||||||||||
| history_cig_history.factor [factor] | 1. Yes 2. No |
|
|||||||||||||||||||||||||||||||||
| smoke_other.factor [factor] | 1. Yes 2. No |
|
|||||||||||||||||||||||||||||||||
| prior_smoke.factor [factor] | 1. Yes 2. No |
|
|||||||||||||||||||||||||||||||||
| prior_smoke_yrs [numeric] | Mean (sd) : 17 (12.4) min < med < max: 0.2 < 15 < 59 IQR (CV) : 17.8 (0.7) | 40 distinct values |
Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18
| Mean | Std.Dev | Min | Median | Max | N.Valid | Pct.Valid | |
|---|---|---|---|---|---|---|---|
| Pitt | 20.19 | 12.61 | 2.00 | 20.00 | 50.00 | 92 | 49.46 |
| Kansas | 16.12 | 12.69 | 0.17 | 11.00 | 59.00 | 61 | 33.70 |
| Northeastern | 13.48 | 10.73 | 1.00 | 10.00 | 50.00 | 65 | 44.52 |
Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18
| Variable | Label | Stats / Values | Freqs (% of Valid) | Graph | Missing | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rand_treat.factor [factor] | 1. Group 1: 150 mins of aero 2. Group 2: 225 mins of aero 3. Group 3: Stretch and Tone |
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| rand_age_group.factor [factor] | 1. < =72 2. >72 |
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| rand_sex.factor [factor] | 1. Male 2. Female |
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| smoke_day [labelled, numeric] | 31A) How many packs do you smoke per day? | Mean (sd) : 0.5 (0.4) min < med < max: 0 < 0.4 < 1.5 IQR (CV) : 0.2 (0.9) | 9 distinct values | 0 (0.0%) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| yrs_smoke [labelled, numeric] | 31B) For how many years have you smoked? | Mean (sd) : 40.4 (13.2) min < med < max: 15 < 45 < 55 IQR (CV) : 20 (0.3) |
|
0 (0.0%) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| vo2_data_beta.factor [factor] | 1. Yes 2. No |
|
0 (0.0%) |
Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18
data %$%
ctable(history_cig_history.factor, history_breath_sudden.factor,
chisq = TRUE, OR = TRUE, RR=TRUE,
headings = FALSE) %>%
print(method = "render")
| history_breath_sudden.factor | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| history_cig_history.factor | Yes | No | Total | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Yes | 2 | ( | 14.3% | ) | 12 | ( | 85.7% | ) | 14 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| No | 12 | ( | 2.4% | ) | 487 | ( | 97.6% | ) | 499 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Total | 14 | ( | 2.7% | ) | 499 | ( | 97.3% | ) | 513 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Χ2 = 3.4572 df = 1 p = .0630 O.R. (95% C.I.) = 6.76 (1.36 - 33.59) R.R. (95% C.I.) = 5.94 (1.47 - 24.08) |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18
data$vo2_data_test_speed<-as.factor(data$vo2_data_test_speed)
data %$%
ctable(history_cig_history.factor, vo2_data_test_speed,
chisq = TRUE, OR = TRUE, RR=TRUE,
headings = FALSE) %>%
print(method = "render")
| vo2_data_test_speed | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| history_cig_history.factor | 1.5 | 1.7 | 1.8 | 1.9 | 2 | 2.1 | 2.2 | 2.3 | 2.4 | 2.5 | 2.6 | 2.7 | 2.8 | 2.9 | 3 | 3.1 | 3.2 | 3.3 | 3.4 | 3.5 | 3.6 | 3.7 | 3.8 | 4 | Total | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Yes | 1 | ( | 7.1% | ) | 0 | ( | 0.0% | ) | 1 | ( | 7.1% | ) | 0 | ( | 0.0% | ) | 3 | ( | 21.4% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 1 | ( | 7.1% | ) | 0 | ( | 0.0% | ) | 6 | ( | 42.9% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 2 | ( | 14.3% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 0 | ( | 0.0% | ) | 14 | ( | 100.0% | ) |
| No | 9 | ( | 1.8% | ) | 12 | ( | 2.4% | ) | 5 | ( | 1.0% | ) | 5 | ( | 1.0% | ) | 22 | ( | 4.4% | ) | 7 | ( | 1.4% | ) | 20 | ( | 4.0% | ) | 15 | ( | 3.0% | ) | 16 | ( | 3.2% | ) | 37 | ( | 7.4% | ) | 13 | ( | 2.6% | ) | 46 | ( | 9.2% | ) | 43 | ( | 8.6% | ) | 13 | ( | 2.6% | ) | 66 | ( | 13.2% | ) | 15 | ( | 3.0% | ) | 32 | ( | 6.4% | ) | 27 | ( | 5.4% | ) | 10 | ( | 2.0% | ) | 77 | ( | 15.4% | ) | 3 | ( | 0.6% | ) | 2 | ( | 0.4% | ) | 3 | ( | 0.6% | ) | 1 | ( | 0.2% | ) | 499 | ( | 100.0% | ) |
| Total | 10 | ( | 1.9% | ) | 12 | ( | 2.3% | ) | 6 | ( | 1.2% | ) | 5 | ( | 1.0% | ) | 25 | ( | 4.9% | ) | 7 | ( | 1.4% | ) | 20 | ( | 3.9% | ) | 15 | ( | 2.9% | ) | 16 | ( | 3.1% | ) | 37 | ( | 7.2% | ) | 13 | ( | 2.5% | ) | 46 | ( | 9.0% | ) | 44 | ( | 8.6% | ) | 13 | ( | 2.5% | ) | 72 | ( | 14.0% | ) | 15 | ( | 2.9% | ) | 32 | ( | 6.2% | ) | 27 | ( | 5.3% | ) | 12 | ( | 2.3% | ) | 77 | ( | 15.0% | ) | 3 | ( | 0.6% | ) | 2 | ( | 0.4% | ) | 3 | ( | 0.6% | ) | 1 | ( | 0.2% | ) | 513 | ( | 100.0% | ) |
| Χ2 = 41.4629 df = 23 p = .0105 |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18
data %$%
ctable(history_cig_history.factor, vo2sum_criteria1.factor,
chisq = TRUE, OR = TRUE, RR=TRUE,
headings = FALSE) %>%
print(method = "render")
| vo2sum_criteria1.factor | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| history_cig_history.factor | Yes | No | <NA> | Total | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Yes | 9 | ( | 64.3% | ) | 5 | ( | 35.7% | ) | 0 | ( | 0.0% | ) | 14 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| No | 404 | ( | 81.0% | ) | 94 | ( | 18.8% | ) | 1 | ( | 0.2% | ) | 499 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Total | 413 | ( | 80.5% | ) | 99 | ( | 19.3% | ) | 1 | ( | 0.2% | ) | 513 | ( | 100.0% | ) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Χ2 = 1.5136 df = 1 p = .2186 O.R. (95% C.I.) = 0.42 (0.14 - 1.28) R.R. (95% C.I.) = 0.79 (0.54 - 1.17) |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18
...
...
...
There are a number of ways in which we can extract variables from a huge database. Since we know what we're looking for (Health History Questionnaire), we'll start with those columns. Using the RedCap codebook, we can see that all health history items start with a similar string of text "history_". We'll use tidyverse commands starts_with(), embedded within the select() call of the data.
For more information on Tidyverse: https://hbctraining.github.io/Intro-to-R/lessons/08_intro_tidyverse.html
We can additionally use some commands to make sure that any columns which include information about smoking/alcohol are selected. The contains() command will extract any columns which contain a matching string of text. Thus, we'll use "contains("smoke")" & "contains("alcohol")" within our select call was well.
The first 4 variables included in the select command below are needed in later steps. We've also included 'starts_with("vo2")' to get some other demographics (weight, height, etc.)
Let's specfically look at the HHQ. Since there is no definitive guide for scoring this questionnaire, we need some data reduction techneques. My first inclination tells me to use exploratory factor analysis. Irregardless, it is good to have a background in the following R commands/packages:
PCA (FactoMineR & factoextra packages) corrplot (corrplot package) factanal (stats package)
Best online expamples: http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/
HHQ<-data %>% select( record_id,
starts_with("history_"))
HHQ<-HHQ[,c(2:19)]
HHQ<-HHQ[(complete.cases(HHQ)),]
HHQ.pca<-PCA(HHQ, graph = FALSE)
fviz_eig(HHQ.pca, addlabels = TRUE) ## MAN thats a shit scree plot
var <- get_pca_var(HHQ.pca)
par(mfrow=c(1,2))
fviz_pca_var(HHQ.pca, col.var = "cos2",
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE )
corrplot(var$cos2, is.corr=FALSE)
par(mfrow=c(1,2))
fviz_contrib(HHQ.pca, choice = "var", axes = 1)
fviz_contrib(HHQ.pca, choice = "var", axes = 2)