Load Packages

Import Data

##  [1] "patientname"             "lastfour"               
##  [3] "patientsid"              "gender"                 
##  [5] "CysCLabDate"             "correctedage...6"       
##  [7] "LabChemResultValue...7"  "CysCeGFR"               
##  [9] "ScrLabDate"              "correctedage...10"      
## [11] "LabChemResultValue...11" "ScreGFR"                
## [13] "eGFRDifference"          "Smoking"                
## [15] "BMI"

Create eGFR Difference Groups

eGFRDifference = ScreGFR - CysCeGFR
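The grouping code is not shown in the rendered output. The following is a minimal sketch of how the three groups could be derived from eGFRDifference. The toy vector `diff_values` is hypothetical, and the handling of values exactly equal to 10 or 30 (both placed in Group 2 here) is an assumption based only on the group labels.

```r
# Hypothetical example values of eGFRDifference (ScreGFR - CysCeGFR)
diff_values <- c(-36.12, 5, 10, 17.28, 30, 45.5, 101.82)

group_levels <- c("Group 1: <10", "Group 2: 10-30", "Group 3: >30")

# Assumption: Group 1 is < 10, Group 2 is 10 to 30 inclusive, Group 3 is > 30
eGFRGroup <- factor(
  ifelse(diff_values < 10, group_levels[1],
         ifelse(diff_values <= 30, group_levels[2], group_levels[3])),
  levels = group_levels
)

# Count of observations per group
table(eGFRGroup)
```

Nested `ifelse()` is used instead of `cut()` because `cut()` closes every interval on the same side, which cannot express "< 10" and "> 30" together with an inclusive middle group.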

Descriptive Statistics

##  correctedage...6      BMI           CysCeGFR          ScreGFR       
##  Min.   : 20.00   Min.   :14.14   Min.   :  7.796   Min.   :  4.627  
##  1st Qu.: 62.00   1st Qu.:27.54   1st Qu.: 23.818   1st Qu.: 38.629  
##  Median : 73.00   Median :33.51   Median : 37.938   Median : 57.287  
##  Mean   : 69.13   Mean   :34.34   Mean   : 42.954   Mean   : 62.468  
##  3rd Qu.: 78.00   3rd Qu.:40.83   3rd Qu.: 57.010   3rd Qu.: 85.361  
##  Max.   :101.00   Max.   :97.60   Max.   :121.296   Max.   :154.002  
##  eGFRDifference  
##  Min.   :-36.12  
##  1st Qu.: 10.45  
##  Median : 17.28  
##  Mean   : 19.51  
##  3rd Qu.: 28.26  
##  Max.   :101.82

Frequency of eGFR Difference Groups

## 
##   Group 1: <10 Group 2: 10-30   Group 3: >30 
##            349            950            371
## 
##   Group 1: <10 Group 2: 10-30   Group 3: >30 
##      0.2089820      0.5688623      0.2221557
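The second table is the first one divided by the total sample size (349 + 950 + 371 = 1670). A short sketch using the counts reported above; the object names are illustrative, not taken from the original analysis code.

```r
# Group counts as reported in the frequency table
group_counts <- c("Group 1: <10" = 349, "Group 2: 10-30" = 950, "Group 3: >30" = 371)

# Proportions: each count divided by the total
group_props <- prop.table(group_counts)

round(group_props, 7)
```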

Summary Table by Group

Body Mass Index (BMI) is categorized as: Underweight (< 18.5), Normal weight (18.5–24.9), Overweight (25.0–29.9), and Obese (30.0 or higher)
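These cut points can be applied with `cut()`. The sketch below is illustrative: `bmi_values` is a hypothetical vector, and treating the categories as left-closed intervals (so a value such as 24.95 counts as Normal) is an assumption about how the boundaries were handled.

```r
# Hypothetical BMI values, including a missing one
bmi_values <- c(17.0, 18.5, 24.9, 25.0, 29.9, 30.0, 97.6, NA)

bmi_group <- cut(
  bmi_values,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Normal", "Overweight", "Obese"),
  right  = FALSE  # left-closed intervals: [18.5, 25), [25, 30), [30, Inf)
)

# Missing BMI values stay NA and are shown as their own category
table(bmi_group, useNA = "ifany")
```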

Characteristic     Group 1: <10     Group 2: 10-30    Group 3: >30     p-value (2)
                   N = 349 (1)      N = 950 (1)       N = 371 (1)
gender                                                                 <0.001
    F              21 (6.0%)        107 (11%)         65 (18%)
    M              328 (94%)        843 (89%)         306 (82%)
Smoking                                                                0.3
    CURRENT        35 (10%)         134 (14%)         54 (15%)
    FORMER         166 (48%)        446 (47%)         178 (48%)
    NEVER          148 (42%)        370 (39%)         139 (37%)
bmi_group
    Normal         22 (14%)         54 (13%)          14 (8.3%)
    Obese          87 (57%)         261 (61%)         134 (79%)
    Overweight     42 (27%)         108 (25%)         20 (12%)
    Underweight    2 (1.3%)         2 (0.5%)          1 (0.6%)
(1) n (%)
(2) Pearson’s Chi-squared test; NA

Paired t-test: ScreGFR vs CysCeGFR

Results: The mean difference between ScreGFR and CysCeGFR is statistically significant. ScreGFR exceeds CysCeGFR by 19.62 on average (95% CI 18.90 to 20.34, p < 2.2e-16).

## 
##  Paired t-test
## 
## data:  paired_data$ScreGFR and paired_data$CysCeGFR
## t = 53.609, df = 1669, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  18.90153 20.33714
## sample estimates:
## mean difference 
##        19.61933
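A minimal sketch of the paired t-test call, run on simulated data rather than the study data. The column names follow the output above; the sample size, means, and standard deviations are arbitrary assumptions chosen for illustration. It also shows that a paired t-test is the same as a one-sample t-test on the within-patient differences.

```r
set.seed(1)
n <- 50

# Simulated (not real) eGFR values; parameters are illustrative assumptions
cysc <- rnorm(n, mean = 43, sd = 20)
scr  <- cysc + rnorm(n, mean = 19.6, sd = 15)
paired_data <- data.frame(ScreGFR = scr, CysCeGFR = cysc)

# Paired t-test: is the mean within-patient difference zero?
res <- t.test(paired_data$ScreGFR, paired_data$CysCeGFR, paired = TRUE)

# Equivalent one-sample t-test on the differences
res_diff <- t.test(paired_data$ScreGFR - paired_data$CysCeGFR, mu = 0)

res
```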

ANOVA: eGFR Difference by Group

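Results: At least one group mean differs (F = 1107, p < 2e-16). In the Tukey multiple comparisons of means, all three pairwise differences are significant. Because the groups are defined by cutting eGFRDifference itself, this separation is expected by construction.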

##               Df Sum Sq Mean Sq F value Pr(>F)    
## eGFRGroup      2 212917  106459    1107 <2e-16 ***
## Residuals   1667 160384      96                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = eGFRDifference ~ eGFRGroup, data = anova_data)
## 
## $eGFRGroup
##                                 diff      lwr      upr p adj
## Group 2: 10-30-Group 1: <10 12.06474 10.62450 13.50497     0
## Group 3: >30-Group 1: <10   33.50267 31.78686 35.21849     0
## Group 3: >30-Group 2: 10-30 21.43794 20.02928 22.84660     0
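A sketch of the one-way ANOVA followed by Tukey's honest significant difference test, using simulated data. The formula matches the `Fit:` line above; the group sizes and distribution parameters are assumptions made up for the example.

```r
set.seed(2)
groups <- c("Group 1: <10", "Group 2: 10-30", "Group 3: >30")

# Simulated (not real) data: 30 observations per group
anova_data <- data.frame(
  eGFRGroup      = factor(rep(groups, each = 30), levels = groups),
  eGFRDifference = c(rnorm(30, 3, 8), rnorm(30, 19, 6), rnorm(30, 45, 12))
)

# One-way ANOVA: does mean eGFRDifference differ across groups?
fit <- aov(eGFRDifference ~ eGFRGroup, data = anova_data)
summary(fit)

# Pairwise comparisons with family-wise 95% confidence intervals
tukey <- TukeyHSD(fit, conf.level = 0.95)
tukey$eGFRGroup
```

Each row of `tukey$eGFRGroup` is one pair of groups, with the estimated difference in means, its confidence bounds, and an adjusted p-value.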

Compare BMI by eGFR Group

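This tests whether mean BMI differs across the three eGFR groups. Results: At least one group mean BMI differs (F = 19.76, p = 4.36e-09). Only 747 patients have BMI recorded; the other 923 were dropped for missingness.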

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## eGFRGroup     2   2766    1383   19.76 4.36e-09 ***
## Residuals   744  52074      70                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 923 observations deleted due to missingness

Compare Age by eGFR Group

This tests whether mean age differs across the three eGFR groups. Results: At least one group mean age differs (F = 20.08, p = 2.42e-09).

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## eGFRGroup      2   8689    4345   20.08 2.42e-09 ***
## Residuals   1667 360704     216                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Chi-square Test: Gender by eGFR Group

This tests whether gender and eGFR group are associated. Results: Gender and eGFR group are significantly associated (X-squared = 23.466, df = 2, p = 8.024e-06).

##    
##     Group 1: <10 Group 2: 10-30 Group 3: >30
##   F           21            107           65
##   M          328            843          306
## 
##  Pearson's Chi-squared test
## 
## data:  gender_table
## X-squared = 23.466, df = 2, p-value = 8.024e-06
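The test can be reproduced directly from the contingency table printed above. In this sketch the counts are those reported; the object names other than `gender_table` are illustrative.

```r
# Gender by eGFR group counts, entered column by column (F, M within each group)
gender_table <- matrix(
  c(21, 328,    # Group 1: <10
    107, 843,   # Group 2: 10-30
    65, 306),   # Group 3: >30
  nrow = 2,
  dimnames = list(
    gender    = c("F", "M"),
    eGFRGroup = c("Group 1: <10", "Group 2: 10-30", "Group 3: >30")
  )
)

# Pearson's chi-squared test of independence
res <- chisq.test(gender_table)

# Expected counts under independence, for comparison with the observed table
round(res$expected, 1)
res
```

Comparing observed with expected counts shows where the association comes from: women are under-represented in Group 1 and over-represented in Group 3.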

Chi-square Test: Smoking by eGFR Group

This tests whether smoking status and eGFR group are associated. Results: At the 5% significance level, smoking status and eGFR group are not significantly associated (X-squared = 4.9614, df = 4, p = 0.2913).

##          
##           Group 1: <10 Group 2: 10-30 Group 3: >30
##   CURRENT           35            134           54
##   FORMER           166            446          178
##   NEVER            148            370          139
## 
##  Pearson's Chi-squared test
## 
## data:  smoking_table
## X-squared = 4.9614, df = 4, p-value = 0.2913

Linear Regression Model

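We fit a linear model with eGFRDifference as the outcome and age, gender, BMI, and smoking status as predictors. Results: Every predictor is significant at the 5% level, with BMI the strongest (p = 6e-13). However, the model explains less than 9% of the variance in eGFRDifference (R-squared = 0.086), so the fit is poor and we cannot rely on this model.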

## 
## Call:
## lm(formula = eGFRDifference ~ correctedage...6 + gender + BMI + 
##     Smoking, data = reg_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -53.783  -8.359  -1.788   7.710  87.023 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       4.35051    4.42195   0.984 0.325513    
## correctedage...6  0.11357    0.04272   2.658 0.008025 ** 
## genderM          -6.24007    1.86541  -3.345 0.000864 ***
## BMI               0.49240    0.06717   7.331    6e-13 ***
## SmokingFORMER    -4.05322    1.64302  -2.467 0.013853 *  
## SmokingNEVER     -5.19722    1.68242  -3.089 0.002082 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.13 on 741 degrees of freedom
##   (923 observations deleted due to missingness)
## Multiple R-squared:  0.08589,    Adjusted R-squared:  0.07972 
## F-statistic: 13.92 on 5 and 741 DF,  p-value: 5.024e-13
term              estimate  std.error  statistic  p.value  conf.low  conf.high
(Intercept)          4.351      4.422      0.984    0.326    -4.331     13.032
correctedage...6     0.114      0.043      2.658    0.008     0.030      0.197
genderM             -6.240      1.865     -3.345    0.001    -9.902     -2.578
BMI                  0.492      0.067      7.331    0.000     0.361      0.624
SmokingFORMER       -4.053      1.643     -2.467    0.014    -7.279     -0.828
SmokingNEVER        -5.197      1.682     -3.089    0.002    -8.500     -1.894
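The fit statistics in the summary are linked by standard formulas, which gives a quick check on how little variance the model explains. This sketch uses only the reported R-squared and degrees of freedom; the variable names are illustrative.

```r
# Values taken from the lm() summary above
r2 <- 0.08589   # multiple R-squared
p  <- 5         # number of slope coefficients
n  <- 741 + 6   # residual df plus the 6 estimated coefficients = 747 observations

# Adjusted R-squared penalizes for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F statistic expressed through R-squared
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

round(c(adj_r2 = adj_r2, f_stat = f_stat), 4)
```

Both values agree with the summary (adjusted R-squared 0.07972, F = 13.92): the overall F-test is significant because the sample is large, even though the predictors account for under 9% of the variance.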

Multinomial Logistic Regression

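In this model, the outcome variable is eGFRGroup: Group 1: <10, Group 2: 10-30, Group 3: >30.
This is a different model from the linear regression. The reference (baseline) category is Group 1. Results: Not all variables are significant at the 5% level. BMI is significant in both comparisons; gender and never-smoking status (relative to current smoking) are significant only for Group 3 versus Group 1; age is not significant in either comparison.
We can examine the odds ratios. For instance, for every 1-unit increase in BMI, the odds of being in Group 2 rather than Group 1 are estimated to increase by a factor of exp(0.0345) = 1.035 (about 3.5%), holding all other variables fixed.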

## # weights:  21 (12 variable)
## initial  value 820.663380 
## iter  10 value 708.910142
## final  value 707.280693 
## converged
## Call:
## multinom(formula = eGFRGroup ~ correctedage...6 + gender + BMI + 
##     Smoking, data = multi_data)
## 
## Coefficients:
##                (Intercept) correctedage...6    genderM        BMI SmokingFORMER
## Group 2: 10-30 -0.05562948      0.013359042 -0.4626742 0.03450049    -0.6398211
## Group 3: >30   -1.59626203      0.005026133 -1.0057590 0.08139553    -0.5344605
##                SmokingNEVER
## Group 2: 10-30   -0.6461488
## Group 3: >30     -0.8028115
## 
## Std. Errors:
##                (Intercept) correctedage...6   genderM        BMI SmokingFORMER
## Group 2: 10-30   0.8723055      0.007873574 0.4035706 0.01370456     0.3323751
## Group 3: >30     1.0051243      0.009272391 0.4322174 0.01591294     0.3904649
##                SmokingNEVER
## Group 2: 10-30    0.3392302
## Group 3: >30      0.3999041
## 
## Residual Deviance: 1414.561 
## AIC: 1438.561
y.level         term              estimate  std.error  statistic  p.value  conf.low  conf.high
Group 2: 10-30  (Intercept)          0.946      0.872     -0.064    0.949     0.171      5.228
Group 2: 10-30  correctedage...6     1.013      0.008      1.697    0.090     0.998      1.029
Group 2: 10-30  genderM              0.630      0.404     -1.146    0.252     0.285      1.389
Group 2: 10-30  BMI                  1.035      0.014      2.517    0.012     1.008      1.063
Group 2: 10-30  SmokingFORMER        0.527      0.332     -1.925    0.054     0.275      1.012
Group 2: 10-30  SmokingNEVER         0.524      0.339     -1.905    0.057     0.270      1.019
Group 3: >30    (Intercept)          0.203      1.005     -1.588    0.112     0.028      1.453
Group 3: >30    correctedage...6     1.005      0.009      0.542    0.588     0.987      1.023
Group 3: >30    genderM              0.366      0.432     -2.327    0.020     0.157      0.853
Group 3: >30    BMI                  1.085      0.016      5.115    0.000     1.051      1.119
Group 3: >30    SmokingFORMER        0.586      0.390     -1.369    0.171     0.273      1.260
Group 3: >30    SmokingNEVER         0.448      0.400     -2.008    0.045     0.205      0.981
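In this table the estimate and confidence-limit columns are on the odds-ratio scale, while std.error is on the log-odds scale. The sketch below recomputes the BMI rows from the coefficients and standard errors printed by `multinom()`. The Wald z-test and normal-approximation interval are assumed to be what produced the table, and the vector names are illustrative.

```r
# BMI log-odds coefficients and standard errors from the multinom() output
coefs <- c(group2 = 0.03450049, group3 = 0.08139553)
ses   <- c(group2 = 0.01370456, group3 = 0.01591294)

# Odds ratio: exponentiate the log-odds coefficient
odds_ratio <- exp(coefs)

# Wald z statistic and two-sided p-value
z       <- coefs / ses
p_value <- 2 * pnorm(-abs(z))

# 95% confidence interval, computed on the log scale and then exponentiated
ci_low  <- exp(coefs - qnorm(0.975) * ses)
ci_high <- exp(coefs + qnorm(0.975) * ses)

round(data.frame(odds_ratio, z, p_value, ci_low, ci_high), 3)
```

For Group 2 versus Group 1 this gives an odds ratio of about 1.035 per BMI unit, and for Group 3 versus Group 1 about 1.085, matching the BMI rows of the table.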

Scatterplot: ScreGFR vs CysCeGFR

Boxplot: BMI by eGFR Group

Boxplot: Age by eGFR Group

Histogram of eGFR Difference