This document specifies a set of data from the Gubbio Population Study which we think would be desirable for initial research into the utility of the Nernst potential for potassium (\(E_k\) or E_k) as a biomarker for clinical use.
\(E_k\) is calculated using the Nernst equation: \(E_k = -V_t \cdot \ln \left( \frac{[K^+]_o}{[K^+]_i} \right)\)
The Gubbio Study is a prospective epidemiological study on the population residing in the city of Gubbio, Italy. Original objectives of the study were the control of hypertension and the role of cellular electrolyte handling in hypertension. Other objectives were added during the 30-year activity of the study. The study is ongoing.
Table 2 of (Cirillo et al. 2013) lists variables available for a total of 5376 individuals. In brief:
“Data categories included demographics, personal and family medical history, lifestyle habits, education, type of work, anthropometry, blood pressure, pulse rate, blood biochemistry, urine biochemistry and special investigations on cellular electrolyte handling.”
The Gubbio Study has generated a number of papers. These are the most relevant for our work. A more complete list appears in the references for (Cirillo et al. 2013).
Cohort profile: The Gubbio Population Study (Cirillo et al. 2013) gives a current (2013) overview of the study.
This 1995 paper discusses RBC K in the context of hypertension and was the motivation for investigating the study.
Red blood cell sodium and potassium concentration and blood pressure. The Gubbio Population Study (Trevisan et al. 1995)
We believe the following minimal dataset would be sufficient for the research described below.
For consistency with the 1995 paper using the same variable for hypertensive would be desirable. I’m not sure whether the best way to achieve that is to get that variable from the study or to get the variable for hypertension drug treatment and calculate from that.
Note that Table 3 also used BMI for age adjustment. Therefore BMI would be required for a complete replication of the Table 3 results.
Note that the Table 4 models used the following variables: age, body mass index, plasma total and high density-lipoprotein cholesterol, plasma uric acid, plasma glucose, alcohol intake, red cell mean corpuscular volume, hematocrit, urinary sodium/creatinine, cigarettes/d, pulse
We are doing data analysis with R and can read a variety of data formats. CSV may be the most convenient format to make data available for us.
(Note that there is some discussion about whether hematocrit is necessary for our initial work. If there is any problem with making the hematocrit data available it can be omitted.)
Following is the proposed research using the minimal dataset described above. The Gubbio Study contains many more variables which could be useful for other research ideas. Some possible examples are using \(E_k\) in survival analysis or studying the relationship of \(E_k\) to electrocardiogram variables like the QT interval.
The examples below are intended to illustrate some of the analyses we are hoping to do with the Gubbio data.
For data which is also available in NHANES (everything except RBC K) compare the population distributions of variables between studies.
Although reference ranges are available for both serum and RBC K, we are unaware of reference ranges for \(E_k\). Creation of these reference ranges and gaining an understanding of the population distribution of these values is an important initial step towards using \(E_k\) as a biomarker.
The basic idea is to extend Red blood cell sodium and potassium concentration and blood pressure. The Gubbio Population Study (Trevisan et al. 1995) to consider membrane potential as another potential explanatory variable.
Some prior work uses whole blood potassium (here called \(K_{wb}\)) as a biomarker. It would be helpful to determine reference ranges for \(K_{wb}\) and examine the correspondence with RBC and serum K.
This example code uses the NHANES III data combined with randomly generated values for K_RBC. It has the following goals:
The NHANES III data was chosen rather than the more recent continuous NHANES data because:
The NHANES III data required significant preprocessing to prepare it for this analysis, but here we just load the data file created earlier. The data has been limited to adults aged 20 or older.
Note that NHANES III has a complex survey design. We are not doing any survey weighting here though. We have other analysis code which does survey weighting, but concluded that it would add unneccesary complexity to these examples.
Summary information for the variables used in this analysis. Most variable names are self explanatory, but some are not:
variables <- c("Potassium", "Hematocrit", "SBP", "DBP", "sex", "Age", "HBPMeds", "HBPDrugClasses", "permth_exm", "mortstat")
summary(x[,variables])
## Potassium Hematocrit SBP DBP
## Min. :2.670 Min. :16.60 Min. : 80.0 Min. : 22.00
## 1st Qu.:3.880 1st Qu.:38.80 1st Qu.:111.0 1st Qu.: 67.00
## Median :4.080 Median :41.70 Median :122.0 Median : 74.00
## Mean :4.092 Mean :41.70 Mean :125.2 Mean : 74.12
## 3rd Qu.:4.290 3rd Qu.:44.70 3rd Qu.:136.0 3rd Qu.: 81.00
## Max. :6.200 Max. :57.35 Max. :237.0 Max. :142.00
## NA's :506 NA's :371 NA's :137 NA's :137
## sex Age HBPMeds HBPDrugClasses
## Male :3739 Min. :20.00 Mode :logical Mode :logical
## Female:4220 1st Qu.:32.00 FALSE:6655 FALSE:6317
## Median :45.00 TRUE :1297 TRUE :1642
## Mean :48.45 NA's :7
## 3rd Qu.:65.00
## Max. :90.00
##
## permth_exm mortstat
## Min. : 0.0 Min. :0.0000
## 1st Qu.:149.0 1st Qu.:0.0000
## Median :170.0 Median :0.0000
## Mean :157.7 Mean :0.2618
## 3rd Qu.:193.0 3rd Qu.:1.0000
## Max. :217.0 Max. :1.0000
## NA's :6 NA's :6
We use the criteria of taking high blood pressure medications OR SBP >= 140 OR DBP >= 90.
x <- transform(x,
Hypertensive = HBPMeds | (SBP >= 140) | (DBP >= 90))
x$HypertensiveF <- as.factor(x$Hypertensive)
# x$Hypertensive <- as.factor(x$HBPMeds | (x$SBP >= 140) | (x$DBP >= 90),
# levels=c("Not Hypertensive", "Hypertensive"))
summary(x$Hypertensive)
## Mode FALSE TRUE NA's
## logical 5453 2410 96
The only data we need which is not in NHANES III is RBC potassium. Generate placeholder data here to allow creating analysis code.
Two options were considered:
Option 2. was chosen because it permits readable scatterplots to be created. One should not draw any conclusions from the RBC K data presented here. We are using a distribution with a mean of 90 and a standard deviation of 8.
Also calculate the membrane potential \(E_k\) based on the generated data.
x$RBCK <- rnorm(n=nrow(x), mean=90, sd=8)
x$E_k <- -27 * log(x$RBCK / x$Potassium) # mV
summary(x[,c("RBCK", "E_k")])
## RBCK E_k
## Min. : 58.20 Min. :-97.22
## 1st Qu.: 84.51 1st Qu.:-85.62
## Median : 90.00 Median :-83.49
## Mean : 89.94 Mean :-83.42
## 3rd Qu.: 95.32 3rd Qu.:-81.22
## Max. :121.94 Max. :-70.97
## NA's :506
Here we load the analysis code used in these examples. It is not shown here but can be viewed at To Be Implemented
Examine the raw data. This is comparable to Compare Data with NHANES above.
This is comparable to Membrane Potential Reference Ranges above, but we use serum K because that is real data.
First look at a conditional density plot by sex.
cdp(x$sex, x$Potassium,
main = "Serum Potassium by Sex", xlab = "Serum K (mEq/L)",
lx = "topleft")
We see a difference. Try to assess the magnitude.
boxplot(Potassium ~ sex, data=x,
ylab="Serum K (mEq/L)", main = "Serum Potassium by Sex")
useQuantiles <- c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99)
qall <- quantile(x$Potassium, useQuantiles, na.rm=TRUE)
qmen <- quantile(x$Potassium[x$sex == "Male"], useQuantiles, na.rm=TRUE)
qwomen <- quantile(x$Potassium[x$sex == "Female"], useQuantiles, na.rm=TRUE)
quantiles <- round(data.frame(All = qall, Men = qmen, Women = qwomen), 2)
quantiles
## All Men Women
## 1% 3.26 3.29 3.22
## 5% 3.57 3.60 3.55
## 25% 3.88 3.92 3.85
## 50% 4.08 4.13 4.05
## 75% 4.29 4.35 4.24
## 95% 4.66 4.72 4.58
## 99% 4.99 5.03 4.91
Check for statistical significance of the difference in male and female mean values.
result <- t.test(Potassium ~ sex, x)
result
##
## Welch Two Sample t-test
##
## data: Potassium by sex
## t = 11.74, df = 7209.6, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.07641942 0.10705476
## sample estimates:
## mean in group Male mean in group Female
## 4.140480 4.048743
We see that the difference in means is statistically significant but is not that large at slightly less than 0.1.
Look at variation by age.
scatterplot(Potassium ~ Age, data=x,
xlab="Age", ylab="Serum K (mEq/L)",
main="Variation of Serum Potassium with Age")
Look at variation by age and sex.
scatterplot(Potassium ~ Age | sex, data=x,
legend.coords = "topright",
xlab="Age", ylab="Serum K (mEq/L)", ylim=c(3.0, 5.3),
main="Variation of Serum Potassium with Age and Sex")
We see men have higher serum potassium than women (on average) and both tend to increase slightly with age. Check for statistical significance using a linear regression model.
K.mod1 <- lm(Potassium ~ Age + sex, data=x)
summary(K.mod1)
##
## Call:
## lm(formula = Potassium ~ Age + sex, data = x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.40223 -0.20149 -0.00515 0.20041 2.08064
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0538188 0.0112999 358.749 <2e-16 ***
## Age 0.0017715 0.0002001 8.851 <2e-16 ***
## sexFemale -0.0896454 0.0077484 -11.569 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3337 on 7450 degrees of freedom
## (506 observations deleted due to missingness)
## Multiple R-squared: 0.02851, Adjusted R-squared: 0.02825
## F-statistic: 109.3 on 2 and 7450 DF, p-value: < 2.2e-16
K.mod2 <- lm(Potassium ~ Age * sex, data=x)
summary(K.mod2)
##
## Call:
## lm(formula = Potassium ~ Age * sex, data = x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.39493 -0.20028 -0.00519 0.20018 2.08805
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0233898 0.0153505 262.101 < 2e-16 ***
## Age 0.0023935 0.0002919 8.201 2.78e-16 ***
## sexFemale -0.0329048 0.0208754 -1.576 0.11501
## Age:sexFemale -0.0011732 0.0004008 -2.927 0.00343 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3335 on 7449 degrees of freedom
## (506 observations deleted due to missingness)
## Multiple R-squared: 0.02963, Adjusted R-squared: 0.02924
## F-statistic: 75.81 on 3 and 7449 DF, p-value: < 2.2e-16
anova(K.mod1, K.mod2)
## Analysis of Variance Table
##
## Model 1: Potassium ~ Age + sex
## Model 2: Potassium ~ Age * sex
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 7450 829.53
## 2 7449 828.57 1 0.95293 8.5669 0.003434 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The first model shows us the serum potassium has a statistically significant association with both age and sex. But the second model shows us that the interaction of the variables is also significant so look at the effects of the variables in more detail.
effect.Age.sex <- effect("Age:sex", K.mod2)
#summary(effect.Age.sex)
plot(effect.Age.sex, ylab="Serum K (mEq/L)")
This plot shows how age and sex affect serum potassium. The vertical offset corresponds to the sex term, the slope (Male) corresponds to the age term, and the difference in slopes corresponds to the interaction term.
By offsetting age by its approximate mean we more clearly see the significance of all variables in the model and obtain a more meaningful estimate of the sex coefficient (compare to the T test for the difference in means above). In addition the intercept becomes more meaningful as the value for a 50 year old man.
K.mod3 <- lm(Potassium ~ I(Age-50) * sex, data=x)
summary(K.mod3)
##
## Call:
## lm(formula = Potassium ~ I(Age - 50) * sex, data = x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.39493 -0.20028 -0.00519 0.20018 2.08805
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.1430645 0.0056471 733.668 < 2e-16 ***
## I(Age - 50) 0.0023935 0.0002919 8.201 2.78e-16 ***
## sexFemale -0.0915629 0.0077722 -11.781 < 2e-16 ***
## I(Age - 50):sexFemale -0.0011732 0.0004008 -2.927 0.00343 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3335 on 7449 degrees of freedom
## (506 observations deleted due to missingness)
## Multiple R-squared: 0.02963, Adjusted R-squared: 0.02924
## F-statistic: 75.81 on 3 and 7449 DF, p-value: < 2.2e-16
Examine association between serum K and hypertension. This is comparable to Membrane Potential and Hypertension above, but we use serum K because that is real data.
boxplot(Potassium ~ Hypertensive, data=x,
ylab="Serum K (mEq/L)", main = "Serum Potassium by Hypertension Status")
Notice that the medians are fairly similar, but the hypertensive distribution is more spread out.
scatterplot(Potassium ~ Age | Hypertensive, data=x,
legend.coords = "topright",
xlab="Age", ylab="Serum K (mEq/L)", ylim=c(3.0, 5.3),
main="Variation of Serum Potassium with Age and Hypertension Status")
Non-hypertensives appear to have slightly higher serum K on average. Hopefully membrane voltage shows a better correspondence.
The most interesting thing about this plot is the colored symbols show how much hypertension increases with age.
We can look at the association of serum K with hypertension in both directions. Given the significant association between serum K and both sex and age shown above we correct for both variables and their interaction in these models.
Add hypertension to our model from above.
K.mod2a <- lm(Potassium ~ I(Age-50)*sex,
data=x[!is.na(x$HypertensiveF),])
K.mod3 <- lm(Potassium ~ I(Age-50)*sex + HypertensiveF, data=x)
summary(K.mod3)
##
## Call:
## lm(formula = Potassium ~ I(Age - 50) * sex + HypertensiveF, data = x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4193 -0.2022 -0.0086 0.1962 2.0659
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.1817023 0.0064872 644.61 <2e-16 ***
## I(Age - 50) 0.0036584 0.0003125 11.71 <2e-16 ***
## sexFemale -0.0911553 0.0077383 -11.78 <2e-16 ***
## HypertensiveFTRUE -0.1206430 0.0098092 -12.30 <2e-16 ***
## I(Age - 50):sexFemale -0.0008333 0.0004026 -2.07 0.0385 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.33 on 7376 degrees of freedom
## (578 observations deleted due to missingness)
## Multiple R-squared: 0.04793, Adjusted R-squared: 0.04742
## F-statistic: 92.84 on 4 and 7376 DF, p-value: < 2.2e-16
anova(K.mod2a, K.mod3)
## Analysis of Variance Table
##
## Model 1: Potassium ~ I(Age - 50) * sex
## Model 2: Potassium ~ I(Age - 50) * sex + HypertensiveF
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 7377 819.94
## 2 7376 803.46 1 16.477 151.26 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that hypertensives have a serum K about 0.12 lower than normotensives and the effect is clearly significant.
Another way of looking at this is with a conditional density plot.
cdp(x$HypertensiveF, x$Potassium,
main = "Serum Potassium by Hypertension Status", xlab = "Serum K (mEq/L)",
lx = "topleft")
Here we see the same spreading out effect as observed in the Box plot.
The difference does not look that large until you think about the densities for K < 3.5 as ratios.
Since there are more hypertensives at both low and high serum K I think looking at the association in the other direction (below) may be more useful.
Create similar models (but now using logistic regression) for prediction of hypertension from serum K.
hyp.mod1 <- glm(HypertensiveF ~ I(Age-50)*sex,
data=x[!is.na(x$Potassium),], family=binomial)
hyp.mod2 <- glm(HypertensiveF ~ I(Age-50)*sex + Potassium,
data=x, family=binomial)
summary(hyp.mod2)
##
## Call:
## glm(formula = HypertensiveF ~ I(Age - 50) * sex + Potassium,
## family = binomial, data = x)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2542 -0.6570 -0.3939 0.7447 2.8023
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.46110 0.37190 9.306 < 2e-16 ***
## I(Age - 50) 0.06266 0.00245 25.577 < 2e-16 ***
## sexFemale -0.24197 0.06418 -3.770 0.000163 ***
## Potassium -1.05927 0.09003 -11.766 < 2e-16 ***
## I(Age - 50):sexFemale 0.02268 0.00362 6.264 3.75e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 9082.2 on 7380 degrees of freedom
## Residual deviance: 6734.2 on 7376 degrees of freedom
## (578 observations deleted due to missingness)
## AIC: 6744.2
##
## Number of Fisher Scoring iterations: 5
#anova(hyp.mod2)
anova(hyp.mod1, hyp.mod2)
## Analysis of Deviance Table
##
## Model 1: HypertensiveF ~ I(Age - 50) * sex
## Model 2: HypertensiveF ~ I(Age - 50) * sex + Potassium
## Resid. Df Resid. Dev Df Deviance
## 1 7377 6879.6
## 2 7376 6734.2 1 145.36
Serum K is significant as a predictor of hypertension. Look at the effect in more detail. Plot the probability of hypertension at various levels of potassium.
effect.Potassium <- effect("Potassium", hyp.mod2)
plot(effect.Potassium, ylab="Prob(Hypertensive)")
This is a dramatic effect. I’m surprised we don’t see this more clearly in the scatterplot above (perhaps because of the sex adjustment?).
Check for nonlinearities in the serum K effect.
hyp.mod3 <- glm(HypertensiveF ~ I(Age-50)*sex + pspline(Potassium, df=3),
data=x, family=binomial)
#summary(hyp.mod3)
anova(hyp.mod3, test='Chisq')
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: HypertensiveF
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 7380 9082.2
## I(Age - 50) 1 2157.47 7379 6924.7 < 2.2e-16 ***
## sex 1 0.14 7378 6924.6 0.709
## pspline(Potassium, df = 3) 10 207.05 7368 6717.5 < 2.2e-16 ***
## I(Age - 50):sex 1 43.55 7367 6674.0 4.127e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(hyp.mod2, hyp.mod3, test='Chisq')
## Analysis of Deviance Table
##
## Model 1: HypertensiveF ~ I(Age - 50) * sex + Potassium
## Model 2: HypertensiveF ~ I(Age - 50) * sex + pspline(Potassium, df = 3)
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 7376 6734.2
## 2 7367 6674.0 9 60.233 1.209e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
effect.Potassium1 <- effect("Potassium", hyp.mod3)
## NOTE: Potassium does not appear in the model
plot(effect.Potassium1, ylab="Prob(Hypertensive)", ylim=c(-3,3))
The nonlinearity is clearly significant. Notice that it accounts for over 40% as much additional deviance as the original K linear term. Based on this it should clearly be included in the model, but perhaps not with so many degrees of freedom.
I’m not sure how to best interpret this. Perhaps as a fairly linear decline in probability up to serum K of about 4.4 and then fairly indeterminate behavior (notice how large the confidence intervals are).
I think this plot is extremely interesting and am looking forward to trying something similar with RBC K and \(E_k\)!
Given these results look at the relationship of SBP and DBP with serum K. This subsection and the next two are similar to Table 4 in (Trevisan et al. 1995) except with a more limited set of variables and using serum K instead of RBC K.
Create similar models (but now using linear regression) for prediction of SBP from serum K.
sbp.mod1 <- lm(SBP ~ I(Age-50)*sex, data=x[!is.na(x$Potassium),])
sbp.mod2 <- lm(SBP ~ I(Age-50)*sex + Potassium, data=x)
summary(sbp.mod2)
##
## Call:
## lm(formula = SBP ~ I(Age - 50) * sex + Potassium, data = x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -58.681 -9.880 -1.489 7.839 104.584
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 144.94410 2.28163 63.527 < 2e-16 ***
## I(Age - 50) 0.47686 0.01391 34.291 < 2e-16 ***
## sexFemale -4.40482 0.36916 -11.932 < 2e-16 ***
## Potassium -3.93164 0.54715 -7.186 7.35e-13 ***
## I(Age - 50):sexFemale 0.24596 0.01905 12.908 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.56 on 7349 degrees of freedom
## (605 observations deleted due to missingness)
## Multiple R-squared: 0.3759, Adjusted R-squared: 0.3756
## F-statistic: 1107 on 4 and 7349 DF, p-value: < 2.2e-16
anova(sbp.mod2)
## Analysis of Variance Table
##
## Response: SBP
## Df Sum Sq Mean Sq F value Pr(>F)
## I(Age - 50) 1 979047 979047 4044.866 < 2.2e-16 ***
## sex 1 37864 37864 156.431 < 2.2e-16 ***
## Potassium 1 14112 14112 58.303 2.53e-14 ***
## I(Age - 50):sex 1 40331 40331 166.625 < 2.2e-16 ***
## Residuals 7349 1778802 242
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(sbp.mod1, sbp.mod2)
## Analysis of Variance Table
##
## Model 1: SBP ~ I(Age - 50) * sex
## Model 2: SBP ~ I(Age - 50) * sex + Potassium
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 7350 1791300
## 2 7349 1778802 1 12498 51.633 7.348e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Serum K is significant as a predictor of SBP with a coefficient of -3.9 per point of potassium. Look at the effect in more detail. Plot the predicted SBP at various levels of potassium.
effect.Potassium.sbp <- effect("Potassium", sbp.mod2)
plot(effect.Potassium.sbp, ylab="SBP")
This is a fairly dramatic effect.
Check for nonlinearities in the serum K effect.
sbp.mod3 <- glm(SBP ~ I(Age-50)*sex + pspline(Potassium, df=3), data=x)
#summary(sbp.mod3)
anova(sbp.mod3, test='Chisq')
## Analysis of Deviance Table
##
## Model: gaussian, link: identity
##
## Response: SBP
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 7353 2850156
## I(Age - 50) 1 979047 7352 1871109 < 2.2e-16 ***
## sex 1 37864 7351 1833245 < 2.2e-16 ***
## pspline(Potassium, df = 3) 10 21716 7341 1811529 5.318e-15 ***
## I(Age - 50):sex 1 40779 7340 1770750 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(sbp.mod2, sbp.mod3, test='Chisq')
## Analysis of Variance Table
##
## Model 1: SBP ~ I(Age - 50) * sex + Potassium
## Model 2: SBP ~ I(Age - 50) * sex + pspline(Potassium, df = 3)
## Res.Df RSS Df Sum of Sq Pr(>Chi)
## 1 7349 1778802
## 2 7340 1770750 9 8051.9 0.0001148 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
effect.Potassium1.sbp <- effect("Potassium", sbp.mod3)
## NOTE: Potassium does not appear in the model
plot(effect.Potassium1.sbp, ylab="SBP", ylim=c(122,132))
We again see significant nonlinearity.
Create similar models (but now using linear regression) for prediction of DBP from serum K.
dbp.mod1 <- lm(DBP ~ I(Age-50)*sex, data=x[!is.na(x$Potassium),])
dbp.mod2 <- lm(DBP ~ I(Age-50)*sex + Potassium, data=x)
summary(dbp.mod2)
##
## Call:
## lm(formula = DBP ~ I(Age - 50) * sex + Potassium, data = x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -53.234 -6.477 -0.361 6.011 68.431
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 91.323450 1.495072 61.083 < 2e-16 ***
## I(Age - 50) 0.032159 0.009112 3.529 0.000419 ***
## sexFemale -4.737393 0.241900 -19.584 < 2e-16 ***
## Potassium -3.557167 0.358531 -9.922 < 2e-16 ***
## I(Age - 50):sexFemale 0.053070 0.012486 4.251 2.16e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.19 on 7349 degrees of freedom
## (605 observations deleted due to missingness)
## Multiple R-squared: 0.07167, Adjusted R-squared: 0.07116
## F-statistic: 141.8 on 4 and 7349 DF, p-value: < 2.2e-16
anova(dbp.mod2)
## Analysis of Variance Table
##
## Response: DBP
## Df Sum Sq Mean Sq F value Pr(>F)
## I(Age - 50) 1 9055 9055 87.126 < 2.2e-16 ***
## sex 1 37482 37482 360.650 < 2.2e-16 ***
## Potassium 1 10548 10548 101.490 < 2.2e-16 ***
## I(Age - 50):sex 1 1878 1878 18.067 2.159e-05 ***
## Residuals 7349 763767 104
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(dbp.mod1, dbp.mod2)
## Analysis of Variance Table
##
## Model 1: DBP ~ I(Age - 50) * sex
## Model 2: DBP ~ I(Age - 50) * sex + Potassium
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 7350 773997
## 2 7349 763767 1 10230 98.436 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Serum K is significant as a predictor of DBP with a coefficient of -3.6 per point of potassium. Look at the effect in more detail. Plot the predicted DBP at various levels of potassium.
effect.Potassium.dbp <- effect("Potassium", dbp.mod2)
plot(effect.Potassium.dbp, ylab="DBP")
This is a fairly dramatic effect.
Check for nonlinearities in the serum K effect.
dbp.mod3 <- glm(DBP ~ I(Age-50)*sex + pspline(Potassium, df=3), data=x)
#summary(dbp.mod3)
anova(dbp.mod3, test='Chisq')
## Analysis of Deviance Table
##
## Model: gaussian, link: identity
##
## Response: DBP
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 7353 822729
## I(Age - 50) 1 9055 7352 813674 < 2.2e-16 ***
## sex 1 37482 7351 776192 < 2.2e-16 ***
## pspline(Potassium, df = 3) 10 12115 7341 764078 < 2.2e-16 ***
## I(Age - 50):sex 1 1885 7340 762193 2.041e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(dbp.mod2, dbp.mod3, test='Chisq')
## Analysis of Variance Table
##
## Model 1: DBP ~ I(Age - 50) * sex + Potassium
## Model 2: DBP ~ I(Age - 50) * sex + pspline(Potassium, df = 3)
## Res.Df RSS Df Sum of Sq Pr(>Chi)
## 1 7349 763767
## 2 7340 762193 9 1574 0.08668 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
effect.Potassium1.dbp <- effect("Potassium", dbp.mod3)
## NOTE: Potassium does not appear in the model
plot(effect.Potassium1.dbp, ylab="DBP", ylim=c(68,82))
This time the nonlinearity is not statistically significant (barely, but compare to the low p values earlier). Looking at the plot the primary departure from nonlinearity is at serum K values over about 5.2, but from the rug plot we see there are relatively few subjects like that and the confidence intervals are large.
No conclusions should be drawn from these plots given that the RBC K data was generated randomly. They are meant to be illustrative of possible plots for presenting the data.
invisible( # suppress id output
scatterplot(Potassium ~ RBCK, data=x,
id.method="mahal", id.n=10, labels=x$UniqueID,
xlab="RBC K (mEq/L)", ylab="Serum K (mEq/L)",
main="Correspondence between Serum and RBC Potassium")
)
invisible( # suppress id output
scatterplot(Potassium ~ RBCK | Hypertensive, data=x,
id.method="mahal", id.n=10, labels=x$UniqueID,
legend.coords = "topright",
xlab="RBC K (mEq/L)", ylab="Serum K (mEq/L)",
main="Correspondence between Serum and RBC Potassium by Hypertensive Status")
)
Potassium <- seq(2.3, 5.7, 0.1)
RBCK <- seq(60, 120, 5)
grd <- expand.grid(RBCK, Potassium)
colnames(grd) <- c("RBCK", "Potassium")
grd$E_k <- -27 * log(grd$RBCK / grd$Potassium) # mV
contourplot(E_k ~ RBCK * Potassium, grd, cuts=10,
main="Resting Membrane Potential Due to Potassium (mV)",
xlab="RBC K (mEq/L)", ylab="Serum K (mEq/L)",
xlim=c(70,110), ylim=c(2.5,5.5),
panel = function(...) {
panel.contourplot(...)
panel.abline(h=4.2,lty=2)
panel.abline(v=90,lty=2)
},
colorkey = FALSE, region = TRUE)
trellis.focus("panel", 1, 1, highlight=FALSE)
invisible(lpoints(x$RBCK, x$Potassium, col=as.numeric(x$Hypertensive)))
#invisible(lpoints(x$RBCK, x$Potassium, pch=19, col=as.numeric(x$Hypertensive)))
#ltext(x$RBCK, x$Potassium, labels=x$UniqueID, adj=c(0,0))
trellis.unfocus()
boxplot(E_k ~ Hypertensive, data=x, ylab="Membrane Voltage (E_k, mV)",
xlab="Hypertensive Status",
main="Membrane Voltage by Hypertensive Status")
Calculate whole blood potassium (\(K_{wb}\)) and explore the data. There are two versions of the calculations. Our goals are:
See A Modified Indirect Method for Determining Erythrocyte Sodium and Potassium Concentrations for discussion of the more accurate equation. Note some approximations here (plasma vs serum, total packed cell volume vs hematocrit).
x <- transform(x,
WBK1 = RBCK * Hematocrit / 100,
WBK2 = RBCK * Hematocrit / 100 + (1 - Hematocrit / 100) * Potassium)
summary(x[,c("WBK1", "WBK2")])
## WBK1 WBK2
## Min. :14.53 Min. :18.55
## 1st Qu.:33.97 1st Qu.:36.39
## Median :37.38 Median :39.73
## Mean :37.50 Mean :39.87
## 3rd Qu.:40.97 3rd Qu.:43.26
## Max. :56.45 Max. :58.32
## NA's :371 NA's :593
Look at the difference of the two variables in plots to determine the magnitude of the difference between these two calculations.
invisible( # suppress id output
scatterplot(WBK2 - WBK1 ~ WBK2, data=x,
id.method="y", id.n=10, labels=x$UniqueID,
xlab="WBK2", ylab="WBK2 - WBK1",
main="Comparison of Whole Blood K Calculations")
)
No conclusions can be drawn from the generated RBC K data…
Here we just show a very simple survival model using the Hypertensive variable along with age and sex as predictors.
We use a Cox proportional hazards model. For more on this technique see this link.
Survival analysis requires information on mortality status and survival time (here mortstat and permth_exm) so will not be done in our initial work. This is an example of a possible future direction for the membrane potential research.
#formulaText <- c("Surv(permth_exm,mortstat) ~ pspline(Age, df=8) + sex + HypertensiveF")
formulaText <- c("Surv(permth_exm,mortstat) ~ Age + sex + HypertensiveF")
# Include UniqueID to allow identifying records (e.g. outliers)
x$HypertensiveF <- as.factor(x$Hypertensive)
xsub <- x[,c("UniqueID", "Age", "sex", "racef", "HypertensiveF",
"mortstat", "permth_exm")]
xsub <- xsub[complete.cases(xsub),]
survivalMod <- coxph(as.formula(formulaText), data=xsub)
summary(survivalMod)
## Call:
## coxph(formula = as.formula(formulaText), data = xsub)
##
## n= 7857, number of events= 2006
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Age 0.082198 1.085671 0.001732 47.447 < 2e-16 ***
## sexFemale -0.409454 0.664013 0.044970 -9.105 < 2e-16 ***
## HypertensiveFTRUE 0.233909 1.263529 0.048575 4.815 1.47e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Age 1.086 0.9211 1.082 1.0894
## sexFemale 0.664 1.5060 0.608 0.7252
## HypertensiveFTRUE 1.264 0.7914 1.149 1.3897
##
## Concordance= 0.853 (se = 0.007 )
## Rsquare= 0.388 (max possible= 0.988 )
## Likelihood ratio test= 3854 on 3 df, p=0
## Wald test = 2736 on 3 df, p=0
## Score (logrank) test = 3935 on 3 df, p=0
anova(survivalMod)
## Analysis of Deviance Table
## Cox model: response is Surv(permth_exm, mortstat)
## Terms added sequentially (first to last)
##
## loglik Chisq Df Pr(>|Chi|)
## NULL -17521
## Age -15645 3751.594 1 < 2.2e-16 ***
## sex -15606 78.979 1 < 2.2e-16 ***
## HypertensiveF -15594 23.527 1 1.231e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
old.par <- par(no.readonly = TRUE)
par(mfrow = c(1, 3))
termplot(survivalMod, terms="Age")
termplot(survivalMod, terms="sex")
termplot(survivalMod, terms="HypertensiveF")
par(old.par)
We see (unsurprisingly) that hypertension does have a significant effect in the survival model. The coefficient is 0.23 and the relative risk is 1.26
Here is a plot of the Kaplan-Meier curves for NHANES subjects 65-70 (actual data).
survivalFit <- npsurv(Surv(permth_exm,mortstat) ~ sex + HypertensiveF,
data=xsub, subset = Age >= 65 & Age <= 70)
survivalFit
## Call: npsurv(formula = Surv(permth_exm, mortstat) ~ sex + HypertensiveF,
## data = xsub, subset = Age >= 65 & Age <= 70)
##
## n events median 0.95LCL 0.95UCL
## sex=Male, HypertensiveF=FALSE 137 68 185 158 NA
## sex=Male, HypertensiveF=TRUE 195 109 165 149 199
## sex=Female, HypertensiveF=FALSE 124 56 188 176 NA
## sex=Female, HypertensiveF=TRUE 181 80 200 181 NA
survplot(survivalFit, conf="none", n.risk=TRUE, xlab="Months")
# label.curves = list(keys = "lines") # legend overwrites at risk
Create a Cox PH model like above for the same group.
sex <- xsub$sex
#Age <- xsub$Age
HypertensiveF <- xsub$HypertensiveF
dd <- datadist(sex, HypertensiveF)
options(datadist='dd')
survivalMod1 <- cph(Surv(permth_exm,mortstat) ~ sex + HypertensiveF,
data=xsub, subset = Age >= 65 & Age <= 70, surv=TRUE)
summary(survivalMod1)
## Effects Response : Surv(permth_exm, mortstat)
##
## Factor Low High Diff. Effect S.E. Lower 0.95
## sex - Male:Female 2 1 NA 0.26989 0.11410 0.046255
## Hazard Ratio 2 1 NA 1.30980 NA 1.047300
## HypertensiveF - TRUE:FALSE 1 2 NA 0.10593 0.11568 -0.120790
## Hazard Ratio 1 2 NA 1.11170 NA 0.886220
## Upper 0.95
## 0.49353
## 1.63810
## 0.33266
## 1.39470
See the hazard ratios to get an idea of the effect. Note that the standard errors are much wider since we are looking at a small subset of the data.
Now compare the model plot to the actual data above (Hypertensive in red).
survplot(survivalMod1, sex, HypertensiveF=FALSE, xlab="Months")
survplot(survivalMod1, sex, HypertensiveF=TRUE, add=TRUE, col='red')
Antonio Delgado-Almeida has written many papers discussing various aspects of cellular potassium and hypertension. A good example is:
Assessing Cell K Physiology in Hypertensive Patients (DELGADOALMEIDA 2006)
See additional papers at http://www.researchgate.net/profile/Antonio_Delgado-Almeida
The US National Health and Nutrition Examination Survey provides similar data except for RBC K.
File originally created: Friday, June 19, 2015
File knitted: Wed Feb 14 08:31:06 2018
Cirillo, Massimo, Oscar Terradura-Vagnarelli, Mario Mancini, Alessandro Menotti, Alberto Zanchetti, and Martino Laurenzi. 2013. “Cohort Profile: The Gubbio Population Study.” International Journal of Epidemiology 43 (3). Oxford University Press (OUP): 713–20. doi:10.1093/ije/dyt025.
DELGADOALMEIDA, A. 2006. “Assessing Cell K Physiology in Hypertensive PatientsA New Clinical and Methodologic Approach.” American Journal of Hypertension 19 (4). Oxford University Press (OUP): 432–36. doi:10.1016/j.amjhyper.2005.09.013.
Trevisan, Maurizio, Vittorio Krogh, Massimo Cirillo, Martino Laurenzi, Alan Dyer, and Jeremiah Stamler. 1995. “Red Blood Cell Sodium and Potassium Concentration and Blood Pressure.” Annals of Epidemiology 5 (1). Elsevier BV: 44–51. doi:10.1016/1047-2797(94)00040-z.