IGNITE Demos + Health History

Task: Look for basic correlations pertaining to smoking, alcohol usage, and general demographics using Health History Questionnaire (HHQ).

e.g., How does smoking history relate to the other measures of health history and demographics? What about alcohol intake?

Pending ISSUES:

30574 Check V02 height

Step 1. Import Baseline Demographics & Health History.

This data is reported via the COG1 Session. We could also use the NCI Dietary questionnaire, since this includes questions about alcohol usage. Although, we'll start at Health History Questionnaire (HHQ). Tips for finding other useful variables are integraded below.

I have already used the RedCap 'Data Export Reports & Stats' tab to download this data and place it on the server under the following: "/Volumes/IGNITE_Imaging/QC_Output/R_IGNITE/RedCap/PRE/Data/"

setwd("/Volumes/IGNITE_Imaging/QC_Output/R_IGNITE/RedCap/PRE/Data/")
FILE<-list.files("/Volumes/IGNITE_Imaging/QC_Output/R_IGNITE/RedCap/PRE/Data/", pattern="*.r")
PATH<-"/Volumes/IGNITE_Imaging/QC_Output/R_IGNITE/RedCap/PRE/Data/"
data<-paste(PATH, FILE, sep="")

source(data)

Step 2.

Assuming that the data has already been appropriately entered & compared within RedCap, we will remove the form entries ending in --1 and --2.

data$Compared<-str_detect(data$record_id, pattern = "--1")
data<-data[!(data$Compared==TRUE),]

data$Merged_RedCap<-str_detect(data$record_id, pattern = "--2")
data<-data[!(data$Merged_RedCap==TRUE),]

#data$record_id<-droplevels(data$record_id) #droplevels() to remove --* levels from 'record_id'

Step 3.

Next, we'll use plyr/dplyr/tidyverse's 'filter' command to filter-in only those rows (particpants) which have been randomized (i.e., randomization_complete.factor=="Complete").

data <- data %>% 
  filter(randomization_complete.factor=="Complete")

Step 4.

Get rid of columns where for ALL rows the value is NA. This should narrow our database to 500 particpants (rows).

data <- data[,colSums(is.na(data))<nrow(data)]

Step 6. BMI Group Calculation & Remove outlier/error'd entry (30574)

data$vo2_data_height_m<-data$vo2_data_height/100
data$BMI<-data$vo2_data_weight/(data$vo2_data_height_m*data$vo2_data_height_m)
data$BMI<-as.numeric(data$BMI)
data$BMI_group<-if_else(data$BMI<18.5, 1, as.numeric(data$BMI)) #under
data$BMI_group<-if_else( (data$BMI_group>1 & data$BMI_group<24.9)  ,2 , as.numeric(data$BMI_group)) #healthy
data$BMI_group<-if_else(data$BMI_group<29.9  & data$BMI_group>2 ,3, as.numeric(data$BMI_group)) #over
data$BMI_group<-if_else(data$BMI_group<34.9 & data$BMI_group>3 ,4, as.numeric(data$BMI_group)) #obese I
data$BMI_group<-if_else(data$BMI_group<39.9 & data$BMI_group>4 ,5, as.numeric(data$BMI_group)) #obese II
data$BMI_group<-if_else(data$BMI_group<49.9 & data$BMI_group>5 ,6, as.numeric(data$BMI_group)) #obese III
data$BMI_group<-if_else(data$BMI_group>50 ,7, as.numeric(data$BMI_group)) #OUTLIER?
data$BMI_group<-as.character(data$BMI_group)

data$BMI_GROUP<-if_else(data$BMI_group=="1", "Underweight", data$BMI_group)
data$BMI_GROUP<-if_else(data$BMI_group=="2", "Healthy", data$BMI_GROUP)
data$BMI_GROUP<-if_else(data$BMI_group=="3", "Overweight", data$BMI_GROUP)
data$BMI_GROUP<-if_else(data$BMI_group=="4", "Obese I", data$BMI_GROUP)
data$BMI_GROUP<-if_else(data$BMI_group=="5", "Obese II", data$BMI_GROUP)
data$BMI_GROUP<-if_else(data$BMI_group=="6", "Obese III", data$BMI_GROUP)
data$BMI_GROUP<-if_else(data$BMI_group=="7", "ERROR", data$BMI_GROUP)
data$BMI_GROUP<-factor(data$BMI_GROUP, levels=c("Underweight", "Healthy", "Overweight","Obese I", "Obese II", "Obese III"))

TMP<-data[!(data$BMI_GROUP=="ERROR"),] 
TMP<-TMP[complete.cases(TMP$BMI_GROUP),] 
TMP$BMI_GROUP<-as.factor(TMP$BMI_GROUP)

	BMI_GROUP
history_cig_history.factor	Underweight				Healthy				Overweight				Obese I				Obese II				Obese III				Total
Yes	0	(	0.0%	)	3	(	21.4%	)	6	(	42.9%	)	1	(	7.1%	)	1	(	7.1%	)	3	(	21.4%	)	14	(	100.0%	)
No	1	(	0.2%	)	95	(	19.1%	)	189	(	38.0%	)	124	(	24.9%	)	57	(	11.5%	)	31	(	6.2%	)	497	(	100.0%	)
Total	1	(	0.2%	)	98	(	19.2%	)	195	(	38.2%	)	125	(	24.5%	)	58	(	11.4%	)	34	(	6.7%	)	511	(	100.0%	)
Χ² = 6.8616 df = 5 p = .2311

Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18

	BMI_GROUP
screen_race.factor	Underweight				Healthy				Overweight				Obese I				Obese II				Obese III				Total
African American / Black	0	(	0.0%	)	8	(	11.8%	)	19	(	27.9%	)	19	(	27.9%	)	12	(	17.6%	)	10	(	14.7%	)	68	(	100.0%	)
Asian	0	(	0.0%	)	3	(	33.3%	)	4	(	44.4%	)	1	(	11.1%	)	1	(	11.1%	)	0	(	0.0%	)	9	(	100.0%	)
Caucasian / White	1	(	0.2%	)	82	(	19.5%	)	170	(	40.5%	)	101	(	24.0%	)	45	(	10.7%	)	21	(	5.0%	)	420	(	100.0%	)
American Indian or Alaska Native	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)
Native Hawaiian or other Pacific Islander	0	(	0.0%	)	1	(	100.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	1	(	100.0%	)
Subject Refused to Answer	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)
Bi-racial	0	(	0.0%	)	4	(	66.7%	)	1	(	16.7%	)	1	(	16.7%	)	0	(	0.0%	)	0	(	0.0%	)	6	(	100.0%	)
Other	0	(	0.0%	)	0	(	0.0%	)	1	(	14.3%	)	3	(	42.9%	)	0	(	0.0%	)	3	(	42.9%	)	7	(	100.0%	)
Total	1	(	0.2%	)	98	(	19.2%	)	195	(	38.2%	)	125	(	24.5%	)	58	(	11.4%	)	34	(	6.7%	)	511	(	100.0%	)
Χ² = NaN df = 35 p = NaN

Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18

Smoker Demographics health history -

WOW I forgot that nearly half of america used to smoke!! Crazy!

Data Frame Summary

prior_smoke_yrs.tbl

Dimensions: 513 x 8
Duplicates: 316

Variable

Stats / Values

Freqs (% of Valid)

Graph

screen_site.factor [factor]

1. Pitt 2. Kansas 3. Northeastern

186	(	36.3%	)
181	(	35.3%	)
146	(	28.5%	)

rand_age_group.factor [factor]

1. < =72 2. >72

402	(	78.4%	)
111	(	21.6%	)

rand_sex.factor [factor]

1. Male 2. Female

147	(	28.7%	)
366	(	71.3%	)

screen_race.factor [factor]

1. African American / Black 2. Asian 3. Caucasian / White 4. American Indian or Alaska 5. Native Hawaiian or other 6. Subject Refused to Answer 7. Bi-racial 8. Other

68	(	13.3%	)
9	(	1.8%	)
422	(	82.3%	)
0	(	0.0%	)
1	(	0.2%	)
0	(	0.0%	)
6	(	1.2%	)
7	(	1.4%	)

history_cig_history.factor [factor]

1. Yes 2. No

14	(	2.7%	)
499	(	97.3%	)

smoke_other.factor [factor]

1. Yes 2. No

15	(	2.9%	)
496	(	97.1%	)

prior_smoke.factor [factor]

1. Yes 2. No

219	(	43.9%	)
280	(	56.1%	)

prior_smoke_yrs [numeric]

Mean (sd) : 17 (12.4) min < med < max: 0.2 < 15 < 59 IQR (CV) : 17.8 (0.7)

40 distinct values

Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18

Descriptive Statistics

prior_smoke_yrs by screen_site.factor

Data Frame: prior_smoke_yrs.tbl
N: 186

	Mean	Std.Dev	Min	Median	Max	N.Valid	Pct.Valid
Pitt	20.19	12.61	2.00	20.00	50.00	92	49.46
Kansas	16.12	12.69	0.17	11.00	59.00	61	33.70
Northeastern	13.48	10.73	1.00	10.00	50.00	65	44.52

Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18

Data Frame Summary

Smokers.table

Dimensions: 14 x 6
Duplicates: 0

Variable

Label

Stats / Values

Freqs (% of Valid)

Graph

Missing

rand_treat.factor [factor]

1. Group 1: 150 mins of aero 2. Group 2: 225 mins of aero 3. Group 3: Stretch and Tone

6	(	42.9%	)
4	(	28.6%	)
4	(	28.6%	)

0 (0.0%)

rand_age_group.factor [factor]

1. < =72 2. >72

12	(	85.7%	)
2	(	14.3%	)

0 (0.0%)

rand_sex.factor [factor]

1. Male 2. Female

2	(	14.3%	)
12	(	85.7%	)

0 (0.0%)

smoke_day [labelled, numeric]

31A) How many packs do you smoke per day?

Mean (sd) : 0.5 (0.4) min < med < max: 0 < 0.4 < 1.5 IQR (CV) : 0.2 (0.9)

9 distinct values

0 (0.0%)

yrs_smoke [labelled, numeric]

31B) For how many years have you smoked?

Mean (sd) : 40.4 (13.2) min < med < max: 15 < 45 < 55 IQR (CV) : 20 (0.3)

15	:	1	(	7.1%	)
20	:	1	(	7.1%	)
30	:	3	(	21.4%	)
40	:	1	(	7.1%	)
41	:	1	(	7.1%	)
49	:	1	(	7.1%	)
50	:	3	(	21.4%	)
52	:	1	(	7.1%	)
54	:	1	(	7.1%	)
55	:	1	(	7.1%	)

0 (0.0%)

vo2_data_beta.factor [factor]

1. Yes 2. No

3	(	21.4%	)
11	(	78.6%	)

0 (0.0%)

Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18

Step X. Chi Squared - Smoking health history

data %$% 
  ctable(history_cig_history.factor, history_breath_sudden.factor,
         chisq = TRUE, OR = TRUE, RR=TRUE,
         headings = FALSE) %>%
  print(method = "render")

	history_breath_sudden.factor
history_cig_history.factor	Yes				No				Total
Yes	2	(	14.3%	)	12	(	85.7%	)	14	(	100.0%	)
No	12	(	2.4%	)	487	(	97.6%	)	499	(	100.0%	)
Total	14	(	2.7%	)	499	(	97.3%	)	513	(	100.0%	)
Χ² = 3.4572 df = 1 p = .0630 O.R. (95% C.I.) = 6.76 (1.36 - 33.59) R.R. (95% C.I.) = 5.94 (1.47 - 24.08)

Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18

data$vo2_data_test_speed<-as.factor(data$vo2_data_test_speed)
data %$% 
  ctable(history_cig_history.factor, vo2_data_test_speed,
   chisq = TRUE, OR = TRUE, RR=TRUE,
         headings = FALSE) %>%
  print(method = "render")

	vo2_data_test_speed
history_cig_history.factor	1.5				1.7				1.8				1.9				2				2.1				2.2				2.3				2.4				2.5				2.6				2.7				2.8				2.9				3				3.1				3.2				3.3				3.4				3.5				3.6				3.7				3.8				4				Total
Yes	1	(	7.1%	)	0	(	0.0%	)	1	(	7.1%	)	0	(	0.0%	)	3	(	21.4%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	1	(	7.1%	)	0	(	0.0%	)	6	(	42.9%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	2	(	14.3%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	0	(	0.0%	)	14	(	100.0%	)
No	9	(	1.8%	)	12	(	2.4%	)	5	(	1.0%	)	5	(	1.0%	)	22	(	4.4%	)	7	(	1.4%	)	20	(	4.0%	)	15	(	3.0%	)	16	(	3.2%	)	37	(	7.4%	)	13	(	2.6%	)	46	(	9.2%	)	43	(	8.6%	)	13	(	2.6%	)	66	(	13.2%	)	15	(	3.0%	)	32	(	6.4%	)	27	(	5.4%	)	10	(	2.0%	)	77	(	15.4%	)	3	(	0.6%	)	2	(	0.4%	)	3	(	0.6%	)	1	(	0.2%	)	499	(	100.0%	)
Total	10	(	1.9%	)	12	(	2.3%	)	6	(	1.2%	)	5	(	1.0%	)	25	(	4.9%	)	7	(	1.4%	)	20	(	3.9%	)	15	(	2.9%	)	16	(	3.1%	)	37	(	7.2%	)	13	(	2.5%	)	46	(	9.0%	)	44	(	8.6%	)	13	(	2.5%	)	72	(	14.0%	)	15	(	2.9%	)	32	(	6.2%	)	27	(	5.3%	)	12	(	2.3%	)	77	(	15.0%	)	3	(	0.6%	)	2	(	0.4%	)	3	(	0.6%	)	1	(	0.2%	)	513	(	100.0%	)
Χ² = 41.4629 df = 23 p = .0105

Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18

data %$% 
  ctable(history_cig_history.factor, vo2sum_criteria1.factor,
         chisq = TRUE, OR = TRUE, RR=TRUE,
         headings = FALSE) %>%
  print(method = "render")

	vo2sum_criteria1.factor
history_cig_history.factor	Yes				No				<NA>				Total
Yes	9	(	64.3%	)	5	(	35.7%	)	0	(	0.0%	)	14	(	100.0%	)
No	404	(	81.0%	)	94	(	18.8%	)	1	(	0.2%	)	499	(	100.0%	)
Total	413	(	80.5%	)	99	(	19.3%	)	1	(	0.2%	)	513	(	100.0%	)
Χ² = 1.5136 df = 1 p = .2186 O.R. (95% C.I.) = 0.42 (0.14 - 1.28) R.R. (95% C.I.) = 0.79 (0.54 - 1.17)

Generated by summarytools 0.9.9 (R version 3.6.1)
2021-05-18

...
...
...

There are a number of ways in which we can extract variables from a huge database. Since we know what we're looking for (Health History Questionnaire), we'll start with those columns. Using the RedCap codebook, we can see that all health history items start with a similar string of text "history_". We'll use tidyverse commands starts_with(), embedded within the select() call of the data.

For more information on Tidyverse: https://hbctraining.github.io/Intro-to-R/lessons/08_intro_tidyverse.html

We can additionally use some commands to make sure that any columns which include information about smoking/alcohol are selected. The contains() command will extract any columns which contain a matching string of text. Thus, we'll use "contains("smoke")" & "contains("alcohol")" within our select call was well.

The first 4 variables included in the select command below are needed in later steps. We've also included 'starts_with("vo2")' to get some other demographics (weight, height, etc.)

Let's specfically look at the HHQ. Since there is no definitive guide for scoring this questionnaire, we need some data reduction techneques. My first inclination tells me to use exploratory factor analysis. Irregardless, it is good to have a background in the following R commands/packages:

PCA (FactoMineR & factoextra packages) corrplot (corrplot package) factanal (stats package)

Best online expamples: http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/

HHQ<-data %>% select(  record_id, 
                       starts_with("history_"))
HHQ<-HHQ[,c(2:19)]
HHQ<-HHQ[(complete.cases(HHQ)),]


HHQ.pca<-PCA(HHQ, graph = FALSE)
fviz_eig(HHQ.pca, addlabels = TRUE) ## MAN thats a shit scree plot

var <- get_pca_var(HHQ.pca)

par(mfrow=c(1,2))
fviz_pca_var(HHQ.pca, col.var = "cos2",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"), 
             repel = TRUE )

corrplot(var$cos2, is.corr=FALSE)

par(mfrow=c(1,2))

fviz_contrib(HHQ.pca, choice = "var", axes = 1)

fviz_contrib(HHQ.pca, choice = "var", axes = 2)

IGNITE Demos + Health History

AG

Task: Look for basic correlations pertaining to smoking, alcohol usage, and general demographics using Health History Questionnaire (HHQ).

Pending ISSUES:

Step 1. Import Baseline Demographics & Health History.

Step 2.

Step 3.

Step 4.

Step 6. BMI Group Calculation & Remove outlier/error'd entry (30574)

Smoker Demographics health history -

Data Frame Summary

prior_smoke_yrs.tbl

Descriptive Statistics

prior_smoke_yrs by screen_site.factor

Data Frame Summary

Smokers.table

Step X. Chi Squared - Smoking health history