library(vtree)
library(Hmisc)
## Warning: package 'survival' was built under R version 3.5.3
packageVersion("vtree")
## [1] '1.1.1'
Data obtained from http://biostat.mc.vanderbilt.edu/DataSets
getHdata(diabetes)
These data are courtesy of Dr John Schorling, Department of Medicine, University of Virginia School of Medicine. The data consist of 19 variables on 403 of the 1046 subjects who were interviewed in a study to understand the prevalence of obesity, diabetes, and other cardiovascular risk factors in central Virginia for African Americans. According to Dr John Hong, Diabetes Mellitus Type II (adult onset diabetes) is associated most strongly with obesity. The waist/hip ratio may be a predictor in diabetes and heart disease. DM II is also associated with hypertension; they may both be part of “Syndrome X”. The 403 subjects were the ones who were actually screened for diabetes. Glycosylated hemoglobin > 7.0 is usually taken as a positive diagnosis of diabetes. For more information about this study see
Willems JP, Saunders JT, Hunt DE, Schorling JB: Prevalence of coronary heart disease risk factors among rural blacks: A community-based study. Southern Medical Journal 90:814-820; 1997
and
Schorling JB, Roach J, Siegel M, Baturka N, Hunt DE, Guterbock TM, Stewart HL: A trial of church-based smoking cessation interventions for rural African Americans. Preventive Medicine 26:92-101; 1997.
head(diabetes)
## id chol stab.glu hdl ratio glyhb location age gender height weight
## 1 1000 203 82 56 3.6 4.31 Buckingham 46 female 62 121
## 2 1001 165 97 24 6.9 4.44 Buckingham 29 female 64 218
## 3 1002 228 92 37 6.2 4.64 Buckingham 58 female 61 256
## 4 1003 78 93 12 6.5 4.63 Buckingham 67 male 67 119
## 5 1005 249 90 28 8.9 7.72 Buckingham 64 male 68 183
## 6 1008 248 94 69 3.6 4.81 Buckingham 34 male 71 190
## frame bp.1s bp.1d bp.2s bp.2d waist hip time.ppn
## 1 medium 118 59 NA NA 29 38 720
## 2 large 112 68 NA NA 46 48 360
## 3 large 190 92 185 92 49 57 180
## 4 large 110 50 NA NA 33 38 480
## 5 medium 138 80 NA NA 44 41 300
## 6 large 132 86 NA NA 36 42 195
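The data description above mentions two derived quantities: the waist/hip ratio and the glycosylated hemoglobin threshold of 7.0. As a minimal sketch, here is how they could be computed in base R. To keep the example self-contained, it re-enters a few columns of the six rows printed above; the column names `whr` and `glyhb_high` are hypothetical and are not part of the original data set.

```r
# A few columns of the six rows shown by head(diabetes), re-entered by hand
d <- data.frame(
  id    = c(1000, 1001, 1002, 1003, 1005, 1008),
  glyhb = c(4.31, 4.44, 4.64, 4.63, 7.72, 4.81),
  waist = c(29, 46, 49, 33, 44, 36),
  hip   = c(38, 48, 57, 38, 41, 42)
)

# Hypothetical derived columns (illustration only, not in the data set)
d$whr        <- d$waist / d$hip   # waist/hip ratio
d$glyhb_high <- d$glyhb > 7.0     # threshold from the data description

print(d)
```

With the full data set loaded, the same two assignments could be applied to `diabetes` directly.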
The variables location, gender, and frame are factors.
Running vtree on a single variable is equivalent to a 1-way contingency table:
vtree(diabetes,"frame",horiz=FALSE,height=250,width=850)
Note that frame has 12 missing values. “Valid” percentages are calculated after removing these missing values. Specifying vp=FALSE lets you calculate percentages without removing the missing values.
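To make the distinction concrete, here is a minimal base R sketch of the two ways of computing percentages. It only mirrors the idea and is not how vtree computes them internally. The vector `frame_toy` is made up for illustration and is not the diabetes data.

```r
# Hypothetical toy vector with two missing values (not the real data)
frame_toy <- factor(c("medium", "large", "large", "large", "medium",
                      "large", NA, "small", NA, "small"))

# "Valid" percentages: missing values are dropped before dividing
valid_pct <- prop.table(table(frame_toy))

# Percentages over all rows: missing values are kept as their own category
overall_pct <- prop.table(table(frame_toy, useNA = "ifany"))

print(round(100 * valid_pct))
print(round(100 * overall_pct))
```

The first table corresponds to the default behaviour described above, and the second to the idea behind vp=FALSE.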
Running vtree on two variables is equivalent to a 2-way contingency table. Note that the variables can be listed in a single string, separated by spaces.
vtree(diabetes,"frame location",horiz=FALSE,height=250,width=850)
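For comparison, here is a minimal base R sketch of a 2-way contingency table. It re-enters two columns of the six rows shown by head(diabetes). Because all six of those rows are from Buckingham, the sketch crosses frame with gender rather than location, purely so that the table has more than one column; the idea is the same.

```r
# Two columns of the six rows shown by head(diabetes), re-entered by hand
d <- data.frame(
  frame  = factor(c("medium", "large", "large", "large", "medium", "large")),
  gender = factor(c("female", "female", "female", "male", "male", "male"))
)

# 2-way contingency table: counts for each combination of levels
tab <- table(d$frame, d$gender, dnn = c("frame", "gender"))
print(tab)

# Row percentages: the share of each gender within each frame level,
# analogous to percentages shown within a parent node of a tree
print(round(100 * prop.table(tab, margin = 1)))
```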
If we don’t need to see the variable names on the left-hand side, we can specify showlevels=FALSE:
vtree(diabetes,"frame location",horiz=FALSE,height=250,width=850,showlevels=FALSE)
Now let’s use the summary parameter to show some information about a continuous variable, glyhb. Let’s specify summary="glyhb \nglyhb\nmean=%mean%\nSD=%SD%\nmv=%mv% %leafonly%". Here’s what it means:
glyhb at the beginning means we want a summary of that variable.
\n is a line break.
%mean% is a code for the mean.
%SD% is a code for the standard deviation.
%mv% is a code for the number of missing values.
%leafonly% requests that the summary information be shown only in leaf nodes.
vtree(diabetes,"frame location",horiz=FALSE,height=250,width=850,
  showlevels=FALSE,summary="glyhb \n\nglyhb\nmean=%mean%\nSD=%SD%\nmv=%mv% %leafonly%")
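To check what the leaf nodes report, the same three quantities (mean, SD, and number of missing values) can be computed by hand. The following is a minimal base R sketch, not vtree's own implementation. It re-enters three columns of the six rows shown by head(diabetes), and the helper `summarise_leaf` is a hypothetical name introduced here.

```r
# Three columns of the six rows shown by head(diabetes), re-entered by hand
d <- data.frame(
  glyhb    = c(4.31, 4.44, 4.64, 4.63, 7.72, 4.81),
  location = factor(rep("Buckingham", 6)),
  frame    = factor(c("medium", "large", "large", "large", "medium", "large"))
)

# Hypothetical helper: the quantities behind %mean%, %SD% and %mv%
summarise_leaf <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    SD   = sd(x, na.rm = TRUE),
    mv   = sum(is.na(x)))
}

# One group per frame/location combination, i.e. per leaf of the tree
leaves <- split(d$glyhb, list(d$frame, d$location), drop = TRUE)
leaf_summary <- t(sapply(leaves, summarise_leaf))

print(round(leaf_summary, 2))
```

Missing values are counted separately and excluded from the mean and SD here; that choice is an assumption of this sketch.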