1 Overview

This document specifies a set of data from the Gubbio Population Study which we think would be desirable for initial research into the utility of the Nernst potential for potassium (\(E_k\) or E_k) as a biomarker for clinical use.

\(E_k\) is calculated using the Nernst equation: \(E_k = -V_t \cdot \ln \left( \frac{[K^+]_o}{[K^+]_i} \right)\)

1.1 Gubbio Population Study

The Gubbio Study is a prospective epidemiological study on the population residing in the city of Gubbio, Italy. Original objectives of the study were the control of hypertension and the role of cellular electrolyte handling in hypertension. Other objectives were added during the 30-year activity of the study. The study is ongoing.

Table 2 of (Cirillo et al. 2013) lists variables available for a total of 5376 individuals. In brief:
“Data categories included demographics, personal and family medical history, lifestyle habits, education, type of work, anthropometry, blood pressure, pulse rate, blood biochemistry, urine biochemistry and special investigations on cellular electrolyte handling.”

The Gubbio Study has generated a number of papers. These are the most relevant for our work. A more complete list appears in the references for (Cirillo et al. 2013).

Cohort profile: The Gubbio Population Study (Cirillo et al. 2013) gives a current (2013) overview of the study.

This 1995 paper discusses RBC K in the context of hypertension and was the motivation for investigating the study.
Red blood cell sodium and potassium concentration and blood pressure. The Gubbio Population Study (Trevisan et al. 1995)

1.2 Minimal Dataset

We believe the following minimal dataset would be sufficient for the research described below.

RBC K
Serum (or plasma) K - measured at the same time as RBC K
Hematocrit - used with RBC and serum K to calculate whole blood K
Systolic blood pressure
Diastolic blood pressure
Subject sex
Subject age - if privacy is a concern, age in five year buckets would be acceptable
Hypertension drug treatment - to calculate Hypertensive variable as decribed in Table 3 of (Trevisan et al. 1995)

For consistency with the 1995 paper using the same variable for hypertensive would be desirable. I’m not sure whether the best way to achieve that is to get that variable from the study or to get the variable for hypertension drug treatment and calculate from that.

Note that Table 3 also used BMI for age adjustment. Therefore BMI would be required for a complete replication of the Table 3 results.

Note that the Table 4 models used the following variables: age, body mass index, plasma total and high density-lipoprotein cholesterol, plasma uric acid, plasma glucose, alcohol intake, red cell mean corpuscular volume, hematocrit, urinary sodium/creatinine, cigarettes/d, pulse

We are doing data analysis with R and can read a variety of data formats. CSV may be the most convenient format to make data available for us.

(Note that there is some discussion about whether hematocrit is necessary for our initial work. If there is any problem with making the hematocrit data available it can be omitted.)

2 Proposed Research

Following is the proposed research using the minimal dataset described above. The Gubbio Study contains many more variables which could be useful for other research ideas. Some possible examples are using \(E_k\) in survival analysis or studying the relationship of \(E_k\) to electrocardiogram variables like the QT interval.

The examples below are intended to illustrate some of the analyses we are hoping to do with the Gubbio data.

2.1 Compare Data with NHANES

For data which is also available in NHANES (everything except RBC K) compare the population distributions of variables between studies.

2.2 Membrane Potential Reference Ranges

Although reference ranges are available for both serum and RBC K, we are unaware of reference ranges for \(E_k\). Creation of these reference ranges and gaining an understanding of the population distribution of these values is an important initial step towards using \(E_k\) as a biomarker.

2.3 Membrane Potential and Hypertension

The basic idea is to extend Red blood cell sodium and potassium concentration and blood pressure. The Gubbio Population Study (Trevisan et al. 1995) to consider membrane potential as another potential explanatory variable.

2.4 Whole Blood Potassium

Some prior work uses whole blood potassium (here called \(K_{wb}\)) as a biomarker. It would be helpful to determine reference ranges for \(K_{wb}\) and examine the correspondence with RBC and serum K.

3 Examples

This example code uses the NHANES III data combined with randomly generated values for K_RBC. It has the following goals:

Illustrate analysis proposed above
Clarify which variables are necessary for which analyses
Allow preliminary analysis code development

3.1 NHANES III Data

The NHANES III data was chosen rather than the more recent continuous NHANES data because:

Larger number of subjects in a single uniform survey
Longer history of mortality data for survival analysis

The NHANES III data required significant preprocessing to prepare it for this analysis, but here we just load the data file created earlier. The data has been limited to adults aged 20 or older.

Note that NHANES III has a complex survey design. We are not doing any survey weighting here though. We have other analysis code which does survey weighting, but concluded that it would add unneccesary complexity to these examples.

Summary information for the variables used in this analysis. Most variable names are self explanatory, but some are not:

permth_exm - months until mortality or since exam
mortstat - 1 if the subject is deceased

variables <- c("Potassium", "Hematocrit", "SBP", "DBP", "sex", "Age", "HBPMeds", "HBPDrugClasses", "permth_exm", "mortstat")
summary(x[,variables])

##    Potassium       Hematocrit         SBP             DBP        
##  Min.   :2.670   Min.   :16.60   Min.   : 80.0   Min.   : 22.00  
##  1st Qu.:3.880   1st Qu.:38.80   1st Qu.:111.0   1st Qu.: 67.00  
##  Median :4.080   Median :41.70   Median :122.0   Median : 74.00  
##  Mean   :4.092   Mean   :41.70   Mean   :125.2   Mean   : 74.12  
##  3rd Qu.:4.290   3rd Qu.:44.70   3rd Qu.:136.0   3rd Qu.: 81.00  
##  Max.   :6.200   Max.   :57.35   Max.   :237.0   Max.   :142.00  
##  NA's   :506     NA's   :371     NA's   :137     NA's   :137     
##      sex            Age         HBPMeds        HBPDrugClasses 
##  Male  :3739   Min.   :20.00   Mode :logical   Mode :logical  
##  Female:4220   1st Qu.:32.00   FALSE:6655      FALSE:6317     
##                Median :45.00   TRUE :1297      TRUE :1642     
##                Mean   :48.45   NA's :7                        
##                3rd Qu.:65.00                                  
##                Max.   :90.00                                  
##                                                               
##    permth_exm       mortstat     
##  Min.   :  0.0   Min.   :0.0000  
##  1st Qu.:149.0   1st Qu.:0.0000  
##  Median :170.0   Median :0.0000  
##  Mean   :157.7   Mean   :0.2618  
##  3rd Qu.:193.0   3rd Qu.:1.0000  
##  Max.   :217.0   Max.   :1.0000  
##  NA's   :6       NA's   :6

3.2 Create Hypertensive Variable

We use the criteria of taking high blood pressure medications OR SBP >= 140 OR DBP >= 90.

x <- transform(x,
               Hypertensive = HBPMeds | (SBP >= 140) | (DBP >= 90))
x$HypertensiveF <- as.factor(x$Hypertensive)
# x$Hypertensive <- as.factor(x$HBPMeds | (x$SBP >= 140) | (x$DBP >= 90),
#                          levels=c("Not Hypertensive", "Hypertensive"))
summary(x$Hypertensive)

##    Mode   FALSE    TRUE    NA's 
## logical    5453    2410      96

3.3 Generate RBC K Data

The only data we need which is not in NHANES III is RBC potassium. Generate placeholder data here to allow creating analysis code.

Two options were considered:

Use a single standard value (say 90)
Generate random values for a specified distribution

Option 2. was chosen because it permits readable scatterplots to be created. One should not draw any conclusions from the RBC K data presented here. We are using a distribution with a mean of 90 and a standard deviation of 8.

Also calculate the membrane potential \(E_k\) based on the generated data.

x$RBCK <- rnorm(n=nrow(x), mean=90, sd=8)
x$E_k <- -27 * log(x$RBCK / x$Potassium) # mV
summary(x[,c("RBCK", "E_k")])

##       RBCK             E_k        
##  Min.   : 58.20   Min.   :-97.22  
##  1st Qu.: 84.51   1st Qu.:-85.62  
##  Median : 90.00   Median :-83.49  
##  Mean   : 89.94   Mean   :-83.42  
##  3rd Qu.: 95.32   3rd Qu.:-81.22  
##  Max.   :121.94   Max.   :-70.97  
##                   NA's   :506

3.4 Analysis Code

Here we load the analysis code used in these examples. It is not shown here but can be viewed at To Be Implemented

3.5 Exploratory Data Analysis

Examine the raw data. This is comparable to Compare Data with NHANES above.

3.6 Serum Potassium Reference Ranges

This is comparable to Membrane Potential Reference Ranges above, but we use serum K because that is real data.

First look at a conditional density plot by sex.

cdp(x$sex, x$Potassium,
    main = "Serum Potassium by Sex", xlab = "Serum K (mEq/L)",
    lx = "topleft")

We see a difference. Try to assess the magnitude.

boxplot(Potassium ~ sex, data=x,
        ylab="Serum K (mEq/L)", main = "Serum Potassium by Sex")

useQuantiles <- c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99)
qall <- quantile(x$Potassium, useQuantiles, na.rm=TRUE)
qmen <- quantile(x$Potassium[x$sex == "Male"], useQuantiles, na.rm=TRUE)
qwomen <- quantile(x$Potassium[x$sex == "Female"], useQuantiles, na.rm=TRUE)
quantiles <- round(data.frame(All = qall, Men = qmen, Women = qwomen), 2)
quantiles

##      All  Men Women
## 1%  3.26 3.29  3.22
## 5%  3.57 3.60  3.55
## 25% 3.88 3.92  3.85
## 50% 4.08 4.13  4.05
## 75% 4.29 4.35  4.24
## 95% 4.66 4.72  4.58
## 99% 4.99 5.03  4.91

Check for statistical significance of the difference in male and female mean values.

result <- t.test(Potassium ~ sex, x)
result

## 
##  Welch Two Sample t-test
## 
## data:  Potassium by sex
## t = 11.74, df = 7209.6, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.07641942 0.10705476
## sample estimates:
##   mean in group Male mean in group Female 
##             4.140480             4.048743

We see that the difference in means is statistically significant but is not that large at slightly less than 0.1.

Look at variation by age.

scatterplot(Potassium ~ Age, data=x,
            xlab="Age", ylab="Serum K (mEq/L)",
            main="Variation of Serum Potassium with Age")

Look at variation by age and sex.

scatterplot(Potassium ~ Age | sex, data=x,
            legend.coords = "topright",
            xlab="Age", ylab="Serum K (mEq/L)", ylim=c(3.0, 5.3),
            main="Variation of Serum Potassium with Age and Sex")

We see men have higher serum potassium than women (on average) and both tend to increase slightly with age. Check for statistical significance using a linear regression model.

K.mod1 <- lm(Potassium ~ Age + sex, data=x)
summary(K.mod1)

## 
## Call:
## lm(formula = Potassium ~ Age + sex, data = x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.40223 -0.20149 -0.00515  0.20041  2.08064 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.0538188  0.0112999 358.749   <2e-16 ***
## Age          0.0017715  0.0002001   8.851   <2e-16 ***
## sexFemale   -0.0896454  0.0077484 -11.569   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3337 on 7450 degrees of freedom
##   (506 observations deleted due to missingness)
## Multiple R-squared:  0.02851,    Adjusted R-squared:  0.02825 
## F-statistic: 109.3 on 2 and 7450 DF,  p-value: < 2.2e-16

K.mod2 <- lm(Potassium ~ Age * sex, data=x)
summary(K.mod2)

## 
## Call:
## lm(formula = Potassium ~ Age * sex, data = x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.39493 -0.20028 -0.00519  0.20018  2.08805 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    4.0233898  0.0153505 262.101  < 2e-16 ***
## Age            0.0023935  0.0002919   8.201 2.78e-16 ***
## sexFemale     -0.0329048  0.0208754  -1.576  0.11501    
## Age:sexFemale -0.0011732  0.0004008  -2.927  0.00343 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3335 on 7449 degrees of freedom
##   (506 observations deleted due to missingness)
## Multiple R-squared:  0.02963,    Adjusted R-squared:  0.02924 
## F-statistic: 75.81 on 3 and 7449 DF,  p-value: < 2.2e-16

anova(K.mod1, K.mod2)

## Analysis of Variance Table
## 
## Model 1: Potassium ~ Age + sex
## Model 2: Potassium ~ Age * sex
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
## 1   7450 829.53                                
## 2   7449 828.57  1   0.95293 8.5669 0.003434 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The first model shows us the serum potassium has a statistically significant association with both age and sex. But the second model shows us that the interaction of the variables is also significant so look at the effects of the variables in more detail.

effect.Age.sex <- effect("Age:sex", K.mod2)
#summary(effect.Age.sex)
plot(effect.Age.sex, ylab="Serum K (mEq/L)")

This plot shows how age and sex affect serum potassium. The vertical offset corresponds to the sex term, the slope (Male) corresponds to the age term, and the difference in slopes corresponds to the interaction term.

By offsetting age by its approximate mean we more clearly see the significance of all variables in the model and obtain a more meaningful estimate of the sex coefficient (compare to the T test for the difference in means above). In addition the intercept becomes more meaningful as the value for a 50 year old man.

K.mod3 <- lm(Potassium ~ I(Age-50) * sex, data=x)
summary(K.mod3)

## 
## Call:
## lm(formula = Potassium ~ I(Age - 50) * sex, data = x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.39493 -0.20028 -0.00519  0.20018  2.08805 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.1430645  0.0056471 733.668  < 2e-16 ***
## I(Age - 50)            0.0023935  0.0002919   8.201 2.78e-16 ***
## sexFemale             -0.0915629  0.0077722 -11.781  < 2e-16 ***
## I(Age - 50):sexFemale -0.0011732  0.0004008  -2.927  0.00343 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3335 on 7449 degrees of freedom
##   (506 observations deleted due to missingness)
## Multiple R-squared:  0.02963,    Adjusted R-squared:  0.02924 
## F-statistic: 75.81 on 3 and 7449 DF,  p-value: < 2.2e-16

3.7 Serum Potassium and Hypertension

Examine association between serum K and hypertension. This is comparable to Membrane Potential and Hypertension above, but we use serum K because that is real data.

boxplot(Potassium ~ Hypertensive, data=x,
        ylab="Serum K (mEq/L)", main = "Serum Potassium by Hypertension Status")

Notice that the medians are fairly similar, but the hypertensive distribution is more spread out.

scatterplot(Potassium ~ Age | Hypertensive, data=x,
            legend.coords = "topright",
            xlab="Age", ylab="Serum K (mEq/L)", ylim=c(3.0, 5.3),
            main="Variation of Serum Potassium with Age and Hypertension Status")

Non-hypertensives appear to have slightly higher serum K on average. Hopefully membrane voltage shows a better correspondence.

The most interesting thing about this plot is the colored symbols show how much hypertension increases with age.

We can look at the association of serum K with hypertension in both directions. Given the significant association between serum K and both sex and age shown above we correct for both variables and their interaction in these models.

3.7.1 Association of Serum K with Hypertension

Add hypertension to our model from above.

K.mod2a <- lm(Potassium ~ I(Age-50)*sex,
              data=x[!is.na(x$HypertensiveF),])
K.mod3 <- lm(Potassium ~ I(Age-50)*sex + HypertensiveF, data=x)
summary(K.mod3)

## 
## Call:
## lm(formula = Potassium ~ I(Age - 50) * sex + HypertensiveF, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4193 -0.2022 -0.0086  0.1962  2.0659 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.1817023  0.0064872  644.61   <2e-16 ***
## I(Age - 50)            0.0036584  0.0003125   11.71   <2e-16 ***
## sexFemale             -0.0911553  0.0077383  -11.78   <2e-16 ***
## HypertensiveFTRUE     -0.1206430  0.0098092  -12.30   <2e-16 ***
## I(Age - 50):sexFemale -0.0008333  0.0004026   -2.07   0.0385 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.33 on 7376 degrees of freedom
##   (578 observations deleted due to missingness)
## Multiple R-squared:  0.04793,    Adjusted R-squared:  0.04742 
## F-statistic: 92.84 on 4 and 7376 DF,  p-value: < 2.2e-16

anova(K.mod2a, K.mod3)

## Analysis of Variance Table
## 
## Model 1: Potassium ~ I(Age - 50) * sex
## Model 2: Potassium ~ I(Age - 50) * sex + HypertensiveF
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   7377 819.94                                  
## 2   7376 803.46  1    16.477 151.26 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We see that hypertensives have a serum K about 0.12 lower than normotensives and the effect is clearly significant.

Another way of looking at this is with a conditional density plot.

cdp(x$HypertensiveF, x$Potassium,
    main = "Serum Potassium by Hypertension Status", xlab = "Serum K (mEq/L)",
    lx = "topleft")

Here we see the same spreading out effect as observed in the Box plot.

The difference does not look that large until you think about the densities for K < 3.5 as ratios.

Since there are more hypertensives at both low and high serum K I think looking at the association in the other direction (below) may be more useful.

3.7.2 Association of Hypertension with Serum K

Create similar models (but now using logistic regression) for prediction of hypertension from serum K.

hyp.mod1 <- glm(HypertensiveF ~ I(Age-50)*sex,
                data=x[!is.na(x$Potassium),], family=binomial)
hyp.mod2 <- glm(HypertensiveF ~ I(Age-50)*sex + Potassium,
                data=x, family=binomial)
summary(hyp.mod2)

## 
## Call:
## glm(formula = HypertensiveF ~ I(Age - 50) * sex + Potassium, 
##     family = binomial, data = x)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2542  -0.6570  -0.3939   0.7447   2.8023  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            3.46110    0.37190   9.306  < 2e-16 ***
## I(Age - 50)            0.06266    0.00245  25.577  < 2e-16 ***
## sexFemale             -0.24197    0.06418  -3.770 0.000163 ***
## Potassium             -1.05927    0.09003 -11.766  < 2e-16 ***
## I(Age - 50):sexFemale  0.02268    0.00362   6.264 3.75e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 9082.2  on 7380  degrees of freedom
## Residual deviance: 6734.2  on 7376  degrees of freedom
##   (578 observations deleted due to missingness)
## AIC: 6744.2
## 
## Number of Fisher Scoring iterations: 5

#anova(hyp.mod2)
anova(hyp.mod1, hyp.mod2)

## Analysis of Deviance Table
## 
## Model 1: HypertensiveF ~ I(Age - 50) * sex
## Model 2: HypertensiveF ~ I(Age - 50) * sex + Potassium
##   Resid. Df Resid. Dev Df Deviance
## 1      7377     6879.6            
## 2      7376     6734.2  1   145.36

Serum K is significant as a predictor of hypertension. Look at the effect in more detail. Plot the probability of hypertension at various levels of potassium.

effect.Potassium <- effect("Potassium", hyp.mod2)
plot(effect.Potassium, ylab="Prob(Hypertensive)")

This is a dramatic effect. I’m surprised we don’t see this more clearly in the scatterplot above (perhaps because of the sex adjustment?).

Check for nonlinearities in the serum K effect.

hyp.mod3 <- glm(HypertensiveF ~ I(Age-50)*sex + pspline(Potassium, df=3),
                data=x, family=binomial)
#summary(hyp.mod3)
anova(hyp.mod3, test='Chisq')

## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: HypertensiveF
## 
## Terms added sequentially (first to last)
## 
## 
##                            Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                        7380     9082.2              
## I(Age - 50)                 1  2157.47      7379     6924.7 < 2.2e-16 ***
## sex                         1     0.14      7378     6924.6     0.709    
## pspline(Potassium, df = 3) 10   207.05      7368     6717.5 < 2.2e-16 ***
## I(Age - 50):sex             1    43.55      7367     6674.0 4.127e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(hyp.mod2, hyp.mod3, test='Chisq')

## Analysis of Deviance Table
## 
## Model 1: HypertensiveF ~ I(Age - 50) * sex + Potassium
## Model 2: HypertensiveF ~ I(Age - 50) * sex + pspline(Potassium, df = 3)
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1      7376     6734.2                          
## 2      7367     6674.0  9   60.233 1.209e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

effect.Potassium1 <- effect("Potassium", hyp.mod3)

## NOTE: Potassium does not appear in the model

plot(effect.Potassium1, ylab="Prob(Hypertensive)", ylim=c(-3,3))

The nonlinearity is clearly significant. Notice that it accounts for over 40% as much additional deviance as the original K linear term. Based on this it should clearly be included in the model, but perhaps not with so many degrees of freedom.

I’m not sure how to best interpret this. Perhaps as a fairly linear decline in probability up to serum K of about 4.4 and then fairly indeterminate behavior (notice how large the confidence intervals are).

I think this plot is extremely interesting and am looking forward to trying something similar with RBC K and \(E_k\)!

Given these results look at the relationship of SBP and DBP with serum K. This subsection and the next two are similar to Table 4 in (Trevisan et al. 1995) except with a more limited set of variables and using serum K instead of RBC K.

3.7.3 Association of SBP with Serum K

Create similar models (but now using linear regression) for prediction of SBP from serum K.

sbp.mod1 <- lm(SBP ~ I(Age-50)*sex, data=x[!is.na(x$Potassium),])
sbp.mod2 <- lm(SBP ~ I(Age-50)*sex + Potassium, data=x)
summary(sbp.mod2)

## 
## Call:
## lm(formula = SBP ~ I(Age - 50) * sex + Potassium, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -58.681  -9.880  -1.489   7.839 104.584 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           144.94410    2.28163  63.527  < 2e-16 ***
## I(Age - 50)             0.47686    0.01391  34.291  < 2e-16 ***
## sexFemale              -4.40482    0.36916 -11.932  < 2e-16 ***
## Potassium              -3.93164    0.54715  -7.186 7.35e-13 ***
## I(Age - 50):sexFemale   0.24596    0.01905  12.908  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.56 on 7349 degrees of freedom
##   (605 observations deleted due to missingness)
## Multiple R-squared:  0.3759, Adjusted R-squared:  0.3756 
## F-statistic:  1107 on 4 and 7349 DF,  p-value: < 2.2e-16

anova(sbp.mod2)

## Analysis of Variance Table
## 
## Response: SBP
##                   Df  Sum Sq Mean Sq  F value    Pr(>F)    
## I(Age - 50)        1  979047  979047 4044.866 < 2.2e-16 ***
## sex                1   37864   37864  156.431 < 2.2e-16 ***
## Potassium          1   14112   14112   58.303  2.53e-14 ***
## I(Age - 50):sex    1   40331   40331  166.625 < 2.2e-16 ***
## Residuals       7349 1778802     242                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(sbp.mod1, sbp.mod2)

## Analysis of Variance Table
## 
## Model 1: SBP ~ I(Age - 50) * sex
## Model 2: SBP ~ I(Age - 50) * sex + Potassium
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1   7350 1791300                                  
## 2   7349 1778802  1     12498 51.633 7.348e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Serum K is significant as a predictor of SBP with a coefficient of -3.9 per point of potassium. Look at the effect in more detail. Plot the predicted SBP at various levels of potassium.

effect.Potassium.sbp <- effect("Potassium", sbp.mod2)
plot(effect.Potassium.sbp, ylab="SBP")

This is a fairly dramatic effect.

Check for nonlinearities in the serum K effect.

sbp.mod3 <- glm(SBP ~ I(Age-50)*sex + pspline(Potassium, df=3), data=x)
#summary(sbp.mod3)
anova(sbp.mod3, test='Chisq')

## Analysis of Deviance Table
## 
## Model: gaussian, link: identity
## 
## Response: SBP
## 
## Terms added sequentially (first to last)
## 
## 
##                            Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                        7353    2850156              
## I(Age - 50)                 1   979047      7352    1871109 < 2.2e-16 ***
## sex                         1    37864      7351    1833245 < 2.2e-16 ***
## pspline(Potassium, df = 3) 10    21716      7341    1811529 5.318e-15 ***
## I(Age - 50):sex             1    40779      7340    1770750 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(sbp.mod2, sbp.mod3, test='Chisq')

## Analysis of Variance Table
## 
## Model 1: SBP ~ I(Age - 50) * sex + Potassium
## Model 2: SBP ~ I(Age - 50) * sex + pspline(Potassium, df = 3)
##   Res.Df     RSS Df Sum of Sq  Pr(>Chi)    
## 1   7349 1778802                           
## 2   7340 1770750  9    8051.9 0.0001148 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

effect.Potassium1.sbp <- effect("Potassium", sbp.mod3)

## NOTE: Potassium does not appear in the model

plot(effect.Potassium1.sbp, ylab="SBP", ylim=c(122,132))

We again see significant nonlinearity.

3.7.4 Association of DBP with Serum K

Create similar models (but now using linear regression) for prediction of DBP from serum K.

dbp.mod1 <- lm(DBP ~ I(Age-50)*sex, data=x[!is.na(x$Potassium),])
dbp.mod2 <- lm(DBP ~ I(Age-50)*sex + Potassium, data=x)
summary(dbp.mod2)

## 
## Call:
## lm(formula = DBP ~ I(Age - 50) * sex + Potassium, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -53.234  -6.477  -0.361   6.011  68.431 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           91.323450   1.495072  61.083  < 2e-16 ***
## I(Age - 50)            0.032159   0.009112   3.529 0.000419 ***
## sexFemale             -4.737393   0.241900 -19.584  < 2e-16 ***
## Potassium             -3.557167   0.358531  -9.922  < 2e-16 ***
## I(Age - 50):sexFemale  0.053070   0.012486   4.251 2.16e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.19 on 7349 degrees of freedom
##   (605 observations deleted due to missingness)
## Multiple R-squared:  0.07167,    Adjusted R-squared:  0.07116 
## F-statistic: 141.8 on 4 and 7349 DF,  p-value: < 2.2e-16

anova(dbp.mod2)

## Analysis of Variance Table
## 
## Response: DBP
##                   Df Sum Sq Mean Sq F value    Pr(>F)    
## I(Age - 50)        1   9055    9055  87.126 < 2.2e-16 ***
## sex                1  37482   37482 360.650 < 2.2e-16 ***
## Potassium          1  10548   10548 101.490 < 2.2e-16 ***
## I(Age - 50):sex    1   1878    1878  18.067 2.159e-05 ***
## Residuals       7349 763767     104                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(dbp.mod1, dbp.mod2)

## Analysis of Variance Table
## 
## Model 1: DBP ~ I(Age - 50) * sex
## Model 2: DBP ~ I(Age - 50) * sex + Potassium
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   7350 773997                                  
## 2   7349 763767  1     10230 98.436 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Serum K is significant as a predictor of DBP with a coefficient of -3.6 per point of potassium. Look at the effect in more detail. Plot the predicted DBP at various levels of potassium.

effect.Potassium.dbp <- effect("Potassium", dbp.mod2)
plot(effect.Potassium.dbp, ylab="DBP")

This is a fairly dramatic effect.

Check for nonlinearities in the serum K effect.

dbp.mod3 <- glm(DBP ~ I(Age-50)*sex + pspline(Potassium, df=3), data=x)
#summary(dbp.mod3)
anova(dbp.mod3, test='Chisq')

## Analysis of Deviance Table
## 
## Model: gaussian, link: identity
## 
## Response: DBP
## 
## Terms added sequentially (first to last)
## 
## 
##                            Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                        7353     822729              
## I(Age - 50)                 1     9055      7352     813674 < 2.2e-16 ***
## sex                         1    37482      7351     776192 < 2.2e-16 ***
## pspline(Potassium, df = 3) 10    12115      7341     764078 < 2.2e-16 ***
## I(Age - 50):sex             1     1885      7340     762193 2.041e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(dbp.mod2, dbp.mod3, test='Chisq')

## Analysis of Variance Table
## 
## Model 1: DBP ~ I(Age - 50) * sex + Potassium
## Model 2: DBP ~ I(Age - 50) * sex + pspline(Potassium, df = 3)
##   Res.Df    RSS Df Sum of Sq Pr(>Chi)  
## 1   7349 763767                        
## 2   7340 762193  9      1574  0.08668 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

effect.Potassium1.dbp <- effect("Potassium", dbp.mod3)

## NOTE: Potassium does not appear in the model

plot(effect.Potassium1.dbp, ylab="DBP", ylim=c(68,82))

This time the nonlinearity is not statistically significant (barely, but compare to the low p values earlier). Looking at the plot the primary departure from nonlinearity is at serum K values over about 5.2, but from the rug plot we see there are relatively few subjects like that and the confidence intervals are large.

3.8 Membrane Potential Plots

No conclusions should be drawn from these plots given that the RBC K data was generated randomly. They are meant to be illustrative of possible plots for presenting the data.

invisible( # suppress id output
scatterplot(Potassium ~ RBCK, data=x,
            id.method="mahal", id.n=10, labels=x$UniqueID,
            xlab="RBC K (mEq/L)", ylab="Serum K (mEq/L)",
            main="Correspondence between Serum and RBC Potassium")
)

invisible( # suppress id output
scatterplot(Potassium ~ RBCK | Hypertensive, data=x,
            id.method="mahal", id.n=10, labels=x$UniqueID,
            legend.coords = "topright",
            xlab="RBC K (mEq/L)", ylab="Serum K (mEq/L)",
            main="Correspondence between Serum and RBC Potassium by Hypertensive Status")
)

Potassium <- seq(2.3, 5.7, 0.1)
RBCK <- seq(60, 120, 5)
grd <- expand.grid(RBCK, Potassium)
colnames(grd) <- c("RBCK", "Potassium")
grd$E_k <- -27 * log(grd$RBCK / grd$Potassium) # mV

contourplot(E_k ~ RBCK * Potassium, grd, cuts=10,
            main="Resting Membrane Potential Due to Potassium (mV)",
            xlab="RBC K (mEq/L)", ylab="Serum K (mEq/L)",
            xlim=c(70,110), ylim=c(2.5,5.5),
            panel = function(...) {
              panel.contourplot(...)
              panel.abline(h=4.2,lty=2)
              panel.abline(v=90,lty=2)
            }, 
            colorkey = FALSE, region = TRUE)
trellis.focus("panel", 1, 1, highlight=FALSE)
invisible(lpoints(x$RBCK, x$Potassium, col=as.numeric(x$Hypertensive)))
#invisible(lpoints(x$RBCK, x$Potassium, pch=19, col=as.numeric(x$Hypertensive)))
#ltext(x$RBCK, x$Potassium, labels=x$UniqueID, adj=c(0,0))
trellis.unfocus()

boxplot(E_k ~ Hypertensive, data=x, ylab="Membrane Voltage (E_k, mV)",
        xlab="Hypertensive Status",
        main="Membrane Voltage by Hypertensive Status")

3.9 Whole Blood Potassium

Calculate whole blood potassium (\(K_{wb}\)) and explore the data. There are two versions of the calculations. Our goals are:

Determine if the additional accuracy of the more complicated equation is required.
Establish reference ranges.
See if there are systematic sex and/or age variations in the values.

See A Modified Indirect Method for Determining Erythrocyte Sodium and Potassium Concentrations for discussion of the more accurate equation. Note some approximations here (plasma vs serum, total packed cell volume vs hematocrit).

x <- transform(x,
               WBK1 = RBCK * Hematocrit / 100,
               WBK2 = RBCK * Hematocrit / 100 + (1 - Hematocrit / 100) * Potassium)
summary(x[,c("WBK1", "WBK2")])

##       WBK1            WBK2      
##  Min.   :14.53   Min.   :18.55  
##  1st Qu.:33.97   1st Qu.:36.39  
##  Median :37.38   Median :39.73  
##  Mean   :37.50   Mean   :39.87  
##  3rd Qu.:40.97   3rd Qu.:43.26  
##  Max.   :56.45   Max.   :58.32  
##  NA's   :371     NA's   :593

Look at the difference of the two variables in plots to determine the magnitude of the difference between these two calculations.

invisible( # suppress id output
scatterplot(WBK2 - WBK1 ~ WBK2, data=x,
            id.method="y", id.n=10, labels=x$UniqueID,
            xlab="WBK2", ylab="WBK2 - WBK1",
            main="Comparison of Whole Blood K Calculations")
)

No conclusions can be drawn from the generated RBC K data…

3.10 Survival Analysis

Here we just show a very simple survival model using the Hypertensive variable along with age and sex as predictors.

We use a Cox proportional hazards model. For more on this technique see this link.

Survival analysis requires information on mortality status and survival time (here mortstat and permth_exm) so will not be done in our initial work. This is an example of a possible future direction for the membrane potential research.

#formulaText <- c("Surv(permth_exm,mortstat) ~ pspline(Age, df=8) + sex + HypertensiveF")
formulaText <- c("Surv(permth_exm,mortstat) ~ Age + sex + HypertensiveF")

# Include UniqueID to allow identifying records (e.g. outliers)
x$HypertensiveF <- as.factor(x$Hypertensive)
xsub <- x[,c("UniqueID", "Age", "sex", "racef", "HypertensiveF",
             "mortstat", "permth_exm")]
xsub <- xsub[complete.cases(xsub),]

survivalMod <- coxph(as.formula(formulaText), data=xsub)
summary(survivalMod)

## Call:
## coxph(formula = as.formula(formulaText), data = xsub)
## 
##   n= 7857, number of events= 2006 
## 
##                        coef exp(coef)  se(coef)      z Pr(>|z|)    
## Age                0.082198  1.085671  0.001732 47.447  < 2e-16 ***
## sexFemale         -0.409454  0.664013  0.044970 -9.105  < 2e-16 ***
## HypertensiveFTRUE  0.233909  1.263529  0.048575  4.815 1.47e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                   exp(coef) exp(-coef) lower .95 upper .95
## Age                   1.086     0.9211     1.082    1.0894
## sexFemale             0.664     1.5060     0.608    0.7252
## HypertensiveFTRUE     1.264     0.7914     1.149    1.3897
## 
## Concordance= 0.853  (se = 0.007 )
## Rsquare= 0.388   (max possible= 0.988 )
## Likelihood ratio test= 3854  on 3 df,   p=0
## Wald test            = 2736  on 3 df,   p=0
## Score (logrank) test = 3935  on 3 df,   p=0

anova(survivalMod)

## Analysis of Deviance Table
##  Cox model: response is Surv(permth_exm, mortstat)
## Terms added sequentially (first to last)
## 
##               loglik    Chisq Df Pr(>|Chi|)    
## NULL          -17521                           
## Age           -15645 3751.594  1  < 2.2e-16 ***
## sex           -15606   78.979  1  < 2.2e-16 ***
## HypertensiveF -15594   23.527  1  1.231e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

old.par <- par(no.readonly = TRUE)
par(mfrow = c(1, 3))
termplot(survivalMod, terms="Age")
termplot(survivalMod, terms="sex")
termplot(survivalMod, terms="HypertensiveF")

par(old.par)

We see (unsurprisingly) that hypertension does have a significant effect in the survival model. The coefficient is 0.23 and the relative risk is 1.26

Here is a plot of the Kaplan-Meier curves for NHANES subjects 65-70 (actual data).

survivalFit <- npsurv(Surv(permth_exm,mortstat) ~ sex + HypertensiveF,
                       data=xsub, subset = Age >= 65 & Age <= 70)
survivalFit

## Call: npsurv(formula = Surv(permth_exm, mortstat) ~ sex + HypertensiveF, 
##     data = xsub, subset = Age >= 65 & Age <= 70)
## 
##                                   n events median 0.95LCL 0.95UCL
## sex=Male, HypertensiveF=FALSE   137     68    185     158      NA
## sex=Male, HypertensiveF=TRUE    195    109    165     149     199
## sex=Female, HypertensiveF=FALSE 124     56    188     176      NA
## sex=Female, HypertensiveF=TRUE  181     80    200     181      NA

survplot(survivalFit, conf="none", n.risk=TRUE, xlab="Months")

#         label.curves = list(keys = "lines") # legend overwrites at risk

Create a Cox PH model like above for the same group.

sex <- xsub$sex
#Age <- xsub$Age
HypertensiveF <- xsub$HypertensiveF
dd <- datadist(sex, HypertensiveF)
options(datadist='dd')
survivalMod1 <- cph(Surv(permth_exm,mortstat) ~ sex + HypertensiveF,
                    data=xsub, subset = Age >= 65 & Age <= 70, surv=TRUE)
summary(survivalMod1)

##              Effects              Response : Surv(permth_exm, mortstat) 
## 
##  Factor                     Low High Diff. Effect  S.E.    Lower 0.95
##  sex - Male:Female          2   1    NA    0.26989 0.11410  0.046255 
##   Hazard Ratio              2   1    NA    1.30980      NA  1.047300 
##  HypertensiveF - TRUE:FALSE 1   2    NA    0.10593 0.11568 -0.120790 
##   Hazard Ratio              1   2    NA    1.11170      NA  0.886220 
##  Upper 0.95
##  0.49353   
##  1.63810   
##  0.33266   
##  1.39470

See the hazard ratios to get an idea of the effect. Note that the standard errors are much wider since we are looking at a small subset of the data.

Now compare the model plot to the actual data above (Hypertensive in red).

survplot(survivalMod1, sex, HypertensiveF=FALSE, xlab="Months")
survplot(survivalMod1, sex, HypertensiveF=TRUE, add=TRUE, col='red')

4 Additional Research Literature and Data

Antonio Delgado-Almeida has written many papers discussing various aspects of cellular potassium and hypertension. A good example is:
Assessing Cell K Physiology in Hypertensive Patients (DELGADOALMEIDA 2006)
See additional papers at http://www.researchgate.net/profile/Antonio_Delgado-Almeida

The US National Health and Nutrition Examination Survey provides similar data except for RBC K.

File originally created: Friday, June 19, 2015
File knitted: Wed Feb 14 08:31:06 2018

Bibliography

Cirillo, Massimo, Oscar Terradura-Vagnarelli, Mario Mancini, Alessandro Menotti, Alberto Zanchetti, and Martino Laurenzi. 2013. “Cohort Profile: The Gubbio Population Study.” International Journal of Epidemiology 43 (3). Oxford University Press (OUP): 713–20. doi:10.1093/ije/dyt025.

DELGADOALMEIDA, A. 2006. “Assessing Cell K Physiology in Hypertensive PatientsA New Clinical and Methodologic Approach.” American Journal of Hypertension 19 (4). Oxford University Press (OUP): 432–36. doi:10.1016/j.amjhyper.2005.09.013.

Trevisan, Maurizio, Vittorio Krogh, Massimo Cirillo, Martino Laurenzi, Alan Dyer, and Jeremiah Stamler. 1995. “Red Blood Cell Sodium and Potassium Concentration and Blood Pressure.” Annals of Epidemiology 5 (1). Elsevier BV: 44–51. doi:10.1016/1047-2797(94)00040-z.

Gubbio Population Study Analysis Proposal

Richard Seiter, Michael McEvoy, and Lynne August MD

Wednesday, February 14, 2018

1 Overview

1.1 Gubbio Population Study

1.2 Minimal Dataset

2 Proposed Research

2.1 Compare Data with NHANES

2.2 Membrane Potential Reference Ranges

2.3 Membrane Potential and Hypertension

2.4 Whole Blood Potassium

3 Examples

3.1 NHANES III Data

3.2 Create Hypertensive Variable

3.3 Generate RBC K Data

3.4 Analysis Code

3.5 Exploratory Data Analysis

3.6 Serum Potassium Reference Ranges

3.7 Serum Potassium and Hypertension

3.7.1 Association of Serum K with Hypertension

3.7.2 Association of Hypertension with Serum K

3.7.3 Association of SBP with Serum K

3.7.4 Association of DBP with Serum K

3.8 Membrane Potential Plots

3.9 Whole Blood Potassium

3.10 Survival Analysis

4 Additional Research Literature and Data

Bibliography