Load Packages
Import Data
Create eGFR Difference Groups
Descriptive Statistics
Frequency of eGFR Difference Groups
Summary Table by Group
Paired t-test: ScreGFR vs CysCeGFR
ANOVA: eGFR Difference by Group
Compare BMI by eGFR Group
Compare Age by eGFR Group
Chi-square Test: Gender by eGFR Group
Chi-square Test: Smoking by eGFR Group
Linear Regression Model
Multinomial Logistic Regression
Scatterplot: ScreGFR vs CysCeGFR
Boxplot: BMI by eGFR Group
Boxplot: Age by eGFR Group
Histogram of eGFR Difference
Summary

Load Packages

Import Data

##  [1] "patientname"             "lastfour"               
##  [3] "patientsid"              "gender"                 
##  [5] "CysCLabDate"             "correctedage...6"       
##  [7] "LabChemResultValue...7"  "CysCeGFR"               
##  [9] "ScrLabDate"              "correctedage...10"      
## [11] "LabChemResultValue...11" "ScreGFR"                
## [13] "eGFRDifference"          "Smoking"                
## [15] "BMI"
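
Names such as correctedage...6 and LabChemResultValue...11 look like the position suffixes that tibble-style name repair appends when a sheet has duplicate column headers. The base R sketch below shows one way to rename them before modelling. The new names (age_at_cysc, cysc_value, age_at_scr, scr_value) are hypothetical, and reading columns 7 and 11 as the cystatin C and creatinine lab values is an assumption based only on their position after each lab date.

```r
# A subset of the imported column names, copied from the output above
raw_names <- c("CysCLabDate", "correctedage...6", "LabChemResultValue...7",
               "CysCeGFR", "ScrLabDate", "correctedage...10",
               "LabChemResultValue...11", "ScreGFR")

# Hypothetical replacements for the auto-suffixed names (assumed meanings)
rename_map <- c("correctedage...6"        = "age_at_cysc",
                "LabChemResultValue...7"  = "cysc_value",
                "correctedage...10"       = "age_at_scr",
                "LabChemResultValue...11" = "scr_value")

# Replace only the names found in the map; leave all others unchanged
clean_names <- ifelse(raw_names %in% names(rename_map),
                      rename_map[raw_names],
                      raw_names)
print(clean_names)
```

The report itself keeps the original names, which is why correctedage...6 appears as the age variable in the model output.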

Create eGFR Difference Groups

eGFRDifference = ScreGFR - CysCeGFR
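
As a rough illustration, the grouping can be sketched with base R's cut(). The cut points here are an assumption: the group labels leave values between 10 and 11 unassigned, so this sketch places a difference of exactly 10 in Group 1 and exactly 30 in Group 2. The report's actual boundaries may differ. The vector egfr_diff holds made-up values.

```r
# Made-up eGFR differences (ScreGFR - CysCeGFR) for illustration only
egfr_diff <- c(-36.12, 5, 10.45, 17.28, 28.26, 45, 101.82)

# Assumed boundaries: (-Inf, 10], (10, 30], (30, Inf)
eGFRGroup <- cut(
  egfr_diff,
  breaks = c(-Inf, 10, 30, Inf),
  labels = c("Group 1: <10", "Group 2: 11-30", "Group 3: >30")
)

# Count how many values fall in each group
table(eGFRGroup)
```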

Descriptive Statistics

##  correctedage...6      BMI           CysCeGFR          ScreGFR       
##  Min.   : 20.00   Min.   :14.14   Min.   :  7.796   Min.   :  4.627  
##  1st Qu.: 62.00   1st Qu.:27.54   1st Qu.: 23.818   1st Qu.: 38.629  
##  Median : 73.00   Median :33.51   Median : 37.938   Median : 57.287  
##  Mean   : 69.13   Mean   :34.34   Mean   : 42.954   Mean   : 62.468  
##  3rd Qu.: 78.00   3rd Qu.:40.83   3rd Qu.: 57.010   3rd Qu.: 85.361  
##  Max.   :101.00   Max.   :97.60   Max.   :121.296   Max.   :154.002  
##  eGFRDifference  
##  Min.   :-36.12  
##  1st Qu.: 10.45  
##  Median : 17.28  
##  Mean   : 19.51  
##  3rd Qu.: 28.26  
##  Max.   :101.82

Frequency of eGFR Difference Groups

## 
##   Group 1: <10 Group 2: 11-30   Group 3: >30 
##            391            866            364

## 
##   Group 1: <10 Group 2: 11-30   Group 3: >30 
##       24.12091       53.42381       22.45527
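
The second table is the first one expressed as percentages of the 1621 patients who have a group assignment. A short check, using only the counts printed above:

```r
# Group counts copied from the frequency table above
counts <- c("Group 1: <10" = 391, "Group 2: 11-30" = 866, "Group 3: >30" = 364)

# Percentage of patients in each group
pct <- 100 * prop.table(counts)
round(pct, 5)
```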

Summary Table by Group

Body Mass Index (BMI) is categorized as Underweight (< 18.5), Normal weight (18.5–24.9), Overweight (25.0–29.9), or Obese (30.0 or higher).

Characteristic	Group 1: <10 N = 172¹	Group 2: 11-30 N = 381¹	Group 3: >30 N = 165¹	p-value²
gender				0.003
F	10 (5.8%)	33 (8.7%)	27 (16%)
M	162 (94%)	348 (91%)	138 (84%)
Smoking				0.3
CURRENT	14 (8.1%)	56 (15%)	23 (14%)
FORMER	89 (52%)	185 (49%)	79 (48%)
NEVER	69 (40%)	140 (37%)	63 (38%)
bmi_group
Normal	27 (16%)	42 (11%)	14 (8.5%)
Obese	97 (56%)	239 (63%)	130 (79%)
Overweight	46 (27%)	99 (26%)	20 (12%)
Underweight	2 (1.2%)	1 (0.3%)	1 (0.6%)
¹ n (%)
² Pearson’s Chi-squared test; NA
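
The BMI categories used in this table can be sketched with cut(). This is an illustration on made-up values, not necessarily the code behind the table. It assumes left-closed intervals, so a BMI of exactly 25.0 counts as Overweight and exactly 30.0 as Obese.

```r
# Made-up BMI values for illustration only
bmi <- c(17.9, 18.5, 24.9, 25.0, 29.9, 30.0, 41.3)

# right = FALSE gives intervals [-Inf, 18.5), [18.5, 25), [25, 30), [30, Inf)
bmi_group <- cut(
  bmi,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Normal", "Overweight", "Obese"),
  right = FALSE
)
data.frame(bmi, bmi_group)
```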

Paired t-test: ScreGFR vs CysCeGFR

Results: The mean difference between ScreGFR and CysCeGFR is statistically significant. ScreGFR is higher by 19.62 on average (95% CI 18.90 to 20.34, p < 2.2e-16).

## 
##  Paired t-test
## 
## data:  paired_data$ScreGFR and paired_data$CysCeGFR
## t = 53.609, df = 1669, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  18.90153 20.33714
## sample estimates:
## mean difference 
##        19.61933
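
Output of this form is what base R's t.test(x, y, paired = TRUE) prints. As a consistency check, the confidence interval can be rebuilt from the reported mean difference, t statistic, and degrees of freedom:

```r
# Values copied from the paired t-test output above
mean_diff <- 19.61933
t_stat    <- 53.609
dof       <- 1669

se     <- mean_diff / t_stat   # standard error implied by t = mean / se
t_crit <- qt(0.975, dof)       # two-sided 95% critical value
ci     <- mean_diff + c(-1, 1) * t_crit * se
round(ci, 2)
```

Up to rounding of the printed t statistic, this reproduces the interval shown in the output.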

ANOVA: eGFR Difference by Group

Results: At least one group mean differs. In the Tukey multiple comparisons of means, all pairs of group means are significantly different.

##               Df Sum Sq Mean Sq F value Pr(>F)    
## eGFRGroup      2 269938  134969    2203 <2e-16 ***
## Residuals   1618  99150      61                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = eGFRDifference ~ eGFRGroup, data = anova_data)
## 
## $eGFRGroup
##                                 diff      lwr      upr p adj
## Group 2: 11-30-Group 1: <10 16.68523 15.56636 17.80410     0
## Group 3: >30-Group 1: <10   37.77532 36.43782 39.11282     0
## Group 3: >30-Group 2: 11-30 21.09009 19.94299 22.23719     0
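
For readers who want to reproduce this kind of output, here is a minimal sketch of a one-way ANOVA followed by Tukey's comparisons. The data are simulated, and the group means, standard deviation, and sample sizes are invented for illustration. Only the model formula and the names anova_data and eGFRGroup are taken from the output above.

```r
set.seed(1)

# Simulated data: 30 patients per group with invented group means
anova_data <- data.frame(
  eGFRGroup = factor(rep(c("Group 1: <10", "Group 2: 11-30", "Group 3: >30"),
                         each = 30)),
  eGFRDifference = c(rnorm(30, mean = 3,  sd = 8),
                     rnorm(30, mean = 20, sd = 8),
                     rnorm(30, mean = 41, sd = 8))
)

# One-way ANOVA, then Tukey's honest significant differences
fit <- aov(eGFRDifference ~ eGFRGroup, data = anova_data)
summary(fit)

tukey <- TukeyHSD(fit)
tukey$eGFRGroup   # one row per pair: diff, lwr, upr, p adj
```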

Compare BMI by eGFR Group

This tests whether mean BMI differs across the three eGFR groups. Results: At least one group's mean BMI differs (F = 20.81, p = 1.65e-09).

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## eGFRGroup     2   2910  1454.8   20.81 1.65e-09 ***
## Residuals   715  49988    69.9                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Compare Age by eGFR Group

This tests whether mean age differs across the three eGFR groups. Results: At least one group's mean age differs (F = 15.49, p = 2.17e-07).

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## eGFRGroup      2   6781    3390   15.49 2.17e-07 ***
## Residuals   1618 354104     219                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
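
The F value in an ANOVA table is the ratio of the two mean squares. It can be recomputed from the sums of squares and degrees of freedom printed above, which is a quick way to check a table:

```r
# Values copied from the age ANOVA table above
ss_group <- 6781
df_group <- 2
ss_resid <- 354104
df_resid <- 1618

ms_group <- ss_group / df_group   # between-group mean square
ms_resid <- ss_resid / df_resid   # residual mean square
f_value  <- ms_group / ms_resid

# Upper-tail probability of the F distribution
p_value <- pf(f_value, df_group, df_resid, lower.tail = FALSE)
c(F = f_value, p = p_value)
```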

Chi-square Test: Gender by eGFR Group

This tests whether gender and eGFR group are associated. Results: Gender and eGFR group are significantly associated (X-squared = 25.158, df = 2, p = 3.444e-06).

##    
##     Group 1: <10 Group 2: 11-30 Group 3: >30
##   F           24            100           65
##   M          367            766          299

## 
##  Pearson's Chi-squared test
## 
## data:  gender_table
## X-squared = 25.158, df = 2, p-value = 3.444e-06
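
The test can be rerun directly from the printed contingency table. The sketch below rebuilds gender_table as a matrix and passes it to chisq.test(). The expected counts show where the association comes from: women are under-represented in Group 1 and over-represented in Group 3 relative to independence.

```r
# Counts copied from the gender by eGFR group table above
gender_table <- matrix(
  c(24, 100, 65,
    367, 766, 299),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("F", "M"),
                  eGFRGroup = c("Group 1: <10", "Group 2: 11-30", "Group 3: >30"))
)

res <- chisq.test(gender_table)
res$statistic            # Pearson chi-squared statistic
res$parameter            # degrees of freedom
round(res$expected, 1)   # counts expected under independence
```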

Chi-square Test: Smoking by eGFR Group

This tests whether smoking status and eGFR group are associated. Results: Smoking status and eGFR group are not significantly associated at the 5% significance level (X-squared = 8.4538, df = 4, p = 0.0763).

##          
##           Group 1: <10 Group 2: 11-30 Group 3: >30
##   CURRENT           38            127           53
##   FORMER           181            411          175
##   NEVER            172            328          136

## 
##  Pearson's Chi-squared test
## 
## data:  smoking_table
## X-squared = 8.4538, df = 4, p-value = 0.0763
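
The p-value follows from the statistic and the degrees of freedom, which for a table with r rows and c columns are (r - 1)(c - 1). A short check using the printed statistics for both chi-square tests:

```r
# Smoking: 3 smoking levels by 3 eGFR groups
dof_smoking <- (3 - 1) * (3 - 1)
p_smoking   <- pchisq(8.4538, df = dof_smoking, lower.tail = FALSE)

# Gender: 2 gender levels by 3 eGFR groups
dof_gender <- (2 - 1) * (3 - 1)
p_gender   <- pchisq(25.158, df = dof_gender, lower.tail = FALSE)

c(smoking = p_smoking, gender = p_gender)
```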

Linear Regression Model

We fit a linear model with eGFRDifference as the outcome and age, gender, BMI, and smoking status as predictors. Results: All predictors are significant at the 5% level. However, the model fit is poor (R-squared = 0.086, so the predictors explain under 9% of the variance in eGFRDifference), so we cannot rely on this model.

## 
## Call:
## lm(formula = eGFRDifference ~ correctedage...6 + gender + BMI + 
##     Smoking, data = reg_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -53.783  -8.359  -1.788   7.710  87.023 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       4.35051    4.42195   0.984 0.325513    
## correctedage...6  0.11357    0.04272   2.658 0.008025 ** 
## genderM          -6.24007    1.86541  -3.345 0.000864 ***
## BMI               0.49240    0.06717   7.331    6e-13 ***
## SmokingFORMER    -4.05322    1.64302  -2.467 0.013853 *  
## SmokingNEVER     -5.19722    1.68242  -3.089 0.002082 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.13 on 741 degrees of freedom
## Multiple R-squared:  0.08589,    Adjusted R-squared:  0.07972 
## F-statistic: 13.92 on 5 and 741 DF,  p-value: 5.024e-13

term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	4.351	4.422	0.984	0.326	-4.331	13.032
correctedage...6	0.114	0.043	2.658	0.008	0.030	0.197
genderM	-6.240	1.865	-3.345	0.001	-9.902	-2.578
BMI	0.492	0.067	7.331	0.000	0.361	0.624
SmokingFORMER	-4.053	1.643	-2.467	0.014	-7.279	-0.828
SmokingNEVER	-5.197	1.682	-3.089	0.002	-8.500	-1.894
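
To see how the coefficients combine, the sketch below computes the fitted eGFRDifference for one hypothetical patient: a 70-year-old male former smoker with a BMI of 30. The patient is invented for illustration. The reference levels (female, current smoker) are read off the coefficient names in the output.

```r
# Coefficients copied from the lm() output above
coefs <- c("(Intercept)"      =  4.35051,
           "correctedage...6" =  0.11357,
           "genderM"          = -6.24007,
           "BMI"              =  0.49240,
           "SmokingFORMER"    = -4.05322,
           "SmokingNEVER"     = -5.19722)

# Hypothetical patient: age 70, male, BMI 30, former smoker
x <- c("(Intercept)"      = 1,
       "correctedage...6" = 70,
       "genderM"          = 1,
       "BMI"              = 30,
       "SmokingFORMER"    = 1,
       "SmokingNEVER"     = 0)

# Fitted value is the sum of coefficient times covariate
predicted <- sum(coefs * x[names(coefs)])
predicted
```

With a residual standard error of 14.13, a single fitted value like this is very imprecise, which matches the poor fit noted above.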

Multinomial Logistic Regression

In this model, the outcome variable is eGFRGroup: Group 1: <10, Group 2: 11-30, Group 3: >30.
Unlike the linear regression, this model treats the outcome as categorical. Results: all predictors are significant at the 5% level except gender in the Group 2 versus Group 1 comparison (p = 0.085). The reference category is Group 1.
We can examine the odds ratios, which are the exponentiated coefficients. For instance, for every 1-unit increase in BMI, the odds of being in Group 2 rather than Group 1 are estimated to increase by a factor of exp(0.0606) = 1.062, about 6.2%, holding all other variables fixed.

## # weights:  21 (12 variable)
## initial  value 788.803623 
## iter  10 value 696.704635
## final  value 694.569959 
## converged

## Call:
## multinom(formula = eGFRGroup ~ correctedage...6 + gender + BMI + 
##     Smoking, data = multi_data)
## 
## Coefficients:
##                (Intercept) correctedage...6    genderM        BMI SmokingFORMER
## Group 2: 11-30   -1.844526       0.03030899 -0.6820662 0.06058242    -0.9212411
## Group 3: >30     -2.875243       0.01877315 -1.1906104 0.09981461    -0.8150301
##                SmokingNEVER
## Group 2: 11-30   -0.9882738
## Group 3: >30     -1.0759661
## 
## Std. Errors:
##                (Intercept) correctedage...6   genderM        BMI SmokingFORMER
## Group 2: 11-30   0.8719064      0.007877506 0.3959867 0.01400878     0.3384408
## Group 3: >30     0.9946748      0.009155148 0.4207673 0.01612790     0.3930728
##                SmokingNEVER
## Group 2: 11-30    0.3446783
## Group 3: >30      0.4012563
## 
## Residual Deviance: 1389.14 
## AIC: 1413.14

y.level	term	estimate	std.error	statistic	p.value	conf.low	conf.high
Group 2: 11-30	(Intercept)	0.158	0.872	-2.116	0.034	0.029	0.873
Group 2: 11-30	correctedage...6	1.031	0.008	3.848	0.000	1.015	1.047
Group 2: 11-30	genderM	0.506	0.396	-1.722	0.085	0.233	1.099
Group 2: 11-30	BMI	1.062	0.014	4.325	0.000	1.034	1.092
Group 2: 11-30	SmokingFORMER	0.398	0.338	-2.722	0.006	0.205	0.773
Group 2: 11-30	SmokingNEVER	0.372	0.345	-2.867	0.004	0.189	0.731
Group 3: >30	(Intercept)	0.056	0.995	-2.891	0.004	0.008	0.396
Group 3: >30	correctedage...6	1.019	0.009	2.051	0.040	1.001	1.037
Group 3: >30	genderM	0.304	0.421	-2.830	0.005	0.133	0.694
Group 3: >30	BMI	1.105	0.016	6.189	0.000	1.071	1.140
Group 3: >30	SmokingFORMER	0.443	0.393	-2.073	0.038	0.205	0.956
Group 3: >30	SmokingNEVER	0.341	0.401	-2.681	0.007	0.155	0.749
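
The estimate column in this table is on the odds-ratio scale, while the multinom() output reports log-odds coefficients and their standard errors. The sketch below shows how one follows from the other for three of the predictors, using values copied from the output above. The Wald z statistics and two-sided normal p-values are one common way to test such coefficients, and they match the statistic and p.value columns here.

```r
# Log-odds coefficients copied from the multinom() output (reference: Group 1)
est <- rbind(
  "Group 2: 11-30" = c(age = 0.03030899, genderM = -0.6820662, BMI = 0.06058242),
  "Group 3: >30"   = c(age = 0.01877315, genderM = -1.1906104, BMI = 0.09981461)
)

# Matching standard errors from the same output
se <- rbind(
  "Group 2: 11-30" = c(age = 0.007877506, genderM = 0.3959867, BMI = 0.01400878),
  "Group 3: >30"   = c(age = 0.009155148, genderM = 0.4207673, BMI = 0.01612790)
)

odds_ratio <- exp(est)                            # odds ratios versus Group 1
z <- est / se                                     # Wald z statistics
p <- 2 * pnorm(abs(z), lower.tail = FALSE)        # two-sided p-values

round(odds_ratio, 3)
round(p, 3)
```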

Scatterplot: ScreGFR vs CysCeGFR

Boxplot: BMI by eGFR Group

Boxplot: Age by eGFR Group

Histogram of eGFR Difference

Summary

This analysis classifies patients into three groups based on the difference between creatinine-based eGFR and cystatin C-based eGFR:

Group 1: ScreGFR - CysCeGFR < 10
Group 2: ScreGFR - CysCeGFR between 11 and 30
Group 3: ScreGFR - CysCeGFR > 30

Missing values are removed separately for each analysis so that each test uses complete available data for the variables involved.
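
This per-analysis handling of missing values explains why sample sizes differ across sections (for example, 1621 patients in the frequency table but 718 in the summary table by group). A minimal sketch of the idea on a toy data frame follows. The helper complete_for() is hypothetical, and the toy values are invented.

```r
# Toy data with missing values scattered across columns
toy <- data.frame(
  eGFRDifference = c(12, 25, NA, 40, 8),
  BMI            = c(31.2, NA, 27.5, 35.0, NA),
  Smoking        = c("FORMER", "NEVER", "CURRENT", NA, "NEVER")
)

# Hypothetical helper: keep only rows complete for the variables one analysis needs
complete_for <- function(data, vars) {
  keep <- complete.cases(data[, vars, drop = FALSE])
  data[keep, vars, drop = FALSE]
}

anova_data <- complete_for(toy, "eGFRDifference")
reg_data   <- complete_for(toy, c("eGFRDifference", "BMI", "Smoking"))

c(anova_rows = nrow(anova_data), regression_rows = nrow(reg_data))
```

An analysis that needs more variables keeps fewer rows, so results from different sections describe overlapping but not identical sets of patients.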

ScreGFR and CysCeGFR Analysis

Steve Chung, Ph.D

2026-05-12