CHRONIC KIDNEY DISEASE ANALYSIS

Author

OJALA BRIAN OLOO

Overview

I downloaded this dataset from Kaggle to practice and advance my skills in data analysis, data science and machine learning for medical health research. The aim of this project was to strengthen my skills in various statistical tools (e.g., Stata, SPSS, Excel, R and Power BI) for analyzing epidemiological research data, and to build dashboards using Power BI, Excel and an R Shiny flexdashboard.

About the data set

The dataset provides a comprehensive collection of patient clinical information, drug exposure profiles, and drug-related biochemical characteristics to support research on the early identification of Chronic Kidney Disease (CKD). It combines real-world–style patient health indicators with detailed properties of nephrotoxic and non-nephrotoxic medications that may influence kidney function.

The data set contains:

Patient Clinical Information

Includes age, gender, blood pressure, blood urea, serum creatinine, albumin levels, random blood glucose, and health conditions such as diabetes and hypertension. These features reflect common clinical factors associated with kidney health.

Drug Exposure Profiles

Each patient was linked to a drug, along with its dosage and duration of use. A separate label indicates whether the drug is considered nephrotoxic (i.e., whether it has kidney-damaging effects).

 CKD Risk Classification

Each record includes a CKD risk label derived from clinical biomarkers, health conditions, and drug-related toxicity indicators.

Purpose of the Dataset

- To understand how clinical and drug-specific factors together influence kidney health

- To develop data-driven healthcare applications and decision-support tools

- To evaluate drugs that are associated with kidney stress

DATA ANALYSIS PLAN FOR THIS DATA SET

Data Management

- Handling missing data and outliers

- Converting character variables into factors for categorical variables

Data Manipulation

- Subsetting and filtering the data

- Grouping patient age into categories

Data visualization

- Bar graphs only, for categorical variables

- Histograms and the Shapiro-Wilk test to check the normality assumption of continuous variables

Statistical data analysis

Descriptive statistics

- Frequencies and percentages for qualitative variables

- Mean and standard deviation for normally distributed continuous variables

- Median and interquartile range for skewed variables
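The rule above (mean and SD when a continuous variable looks normal, median and IQR when it is skewed) can be automated. The sketch below is a minimal base-R illustration, assuming a Shapiro-Wilk p-value of 0.05 as the cut-off; the helper name `summarise_continuous()` is hypothetical and not part of any package, and the inputs are illustrative rather than the CKD data.

```r
# Hypothetical helper: summarise a continuous variable as mean (SD) if the
# Shapiro-Wilk test does not reject normality, otherwise as median (IQR).
summarise_continuous <- function(x, alpha = 0.05, digits = 2) {
  x <- x[!is.na(x)]
  fmt <- function(v) formatC(v, format = "f", digits = digits)
  # shapiro.test() accepts between 3 and 5000 non-missing values
  p <- shapiro.test(x)$p.value
  if (p >= alpha) {
    paste0("Mean (SD): ", fmt(mean(x)), " (", fmt(sd(x)), ")")
  } else {
    q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
    paste0("Median (IQR): ", fmt(median(x)), " (", fmt(q[1]), "-", fmt(q[2]), ")")
  }
}

# Illustrative inputs: evenly spaced normal and exponential quantiles,
# i.e. a symmetric variable and a right-skewed variable
normal_x <- qnorm(ppoints(200), mean = 130, sd = 20)
skewed_x <- qexp(ppoints(200), rate = 1 / 10)
summarise_continuous(normal_x)
summarise_continuous(skewed_x)
```

On the real data the helper could be applied column-wise, for example with `sapply(CDK[c("patient_age", "bp_systolic")], summarise_continuous)`.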

Inferential statistics

To understand how clinical and drug-specific factors together influence kidney health, I employed:

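i. Bivariate analysis: Pearson's chi-square test of association for categorical variables, and a two-sample test for continuous variables (the gtsummary table below reports the Wilcoxon rank-sum test; Welch's t-test is the parametric alternative)

ii. Multivariable analysis: logistic regression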
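Both bivariate tests named above are available in base R. The sketch below runs them on a small simulated data set: the column names mirror the CKD variables, but the values are random and purely illustrative. `chisq.test()` is applied to a cross-tabulation of two categorical variables, and `t.test()`, whose default `var.equal = FALSE` gives Welch's t-test, compares a continuous variable between two groups. `wilcox.test()` is included because it is the rank-based test gtsummary reports by default.

```r
set.seed(42)
n <- 200
# Simulated, illustrative data only
sim <- data.frame(
  nephrotoxic  = factor(sample(c("non-nephrotoxic", "nephrotoxic"), n, replace = TRUE)),
  hypertension = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  bp_systolic  = rnorm(n, mean = 130, sd = 20)
)

# Categorical vs categorical: Pearson's chi-square test of association
chi <- chisq.test(table(sim$hypertension, sim$nephrotoxic))

# Continuous vs binary group: Welch's t-test (unequal variances allowed)
welch <- t.test(bp_systolic ~ nephrotoxic, data = sim)

# Rank-based alternative for skewed continuous variables
wilcox <- wilcox.test(bp_systolic ~ nephrotoxic, data = sim)

c(chi_square = chi$p.value, welch = welch$p.value, wilcoxon = wilcox$p.value)
```

For a 2 x 2 table, `chisq.test()` applies Yates' continuity correction by default; pass `correct = FALSE` for the uncorrected Pearson statistic.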

Note: I fitted a multiple logistic regression model to control for confounding variables instead of using Mantel-Haenszel statistics.
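Both approaches give an odds ratio adjusted for a confounder. The sketch below uses simulated data with a hypothetical binary confounder `stratum` (standing in for a variable such as hypertension) and compares the Mantel-Haenszel common odds ratio from base R's `mantelhaen.test()` with the adjusted odds ratio from `glm(..., family = binomial())`. When the odds ratio is the same in every stratum, as simulated here, the two estimates should be close; logistic regression has the advantage of adjusting for several confounders, including continuous ones, at once.

```r
set.seed(2024)
n <- 4000
# Simulated data: 'stratum' affects both exposure and outcome;
# the true exposure odds ratio is 2 in both strata
stratum  <- rbinom(n, 1, 0.5)
exposure <- rbinom(n, 1, ifelse(stratum == 1, 0.6, 0.3))
outcome  <- rbinom(n, 1, plogis(-1 + log(2) * exposure + 1 * stratum))

# Mantel-Haenszel common odds ratio from a 2 x 2 x K table
tab <- table(exposure = factor(exposure, levels = c(1, 0)),
             outcome  = factor(outcome,  levels = c(1, 0)),
             stratum  = stratum)
or_mh <- unname(mantelhaen.test(tab)$estimate)

# Adjusted odds ratio from logistic regression with the confounder as a covariate
fit    <- glm(outcome ~ exposure + stratum, family = binomial())
or_glm <- unname(exp(coef(fit)["exposure"]))

c(mantel_haenszel = or_mh, logistic = or_glm)
```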

The statistical package used to analyze this dataset was R.

Why R programming

- Open-source software

- Simple and easy to use

- Widely used for analyzing epidemiological research data

PART 3: To develop data-driven healthcare applications and decision-support tools

In this part, I demonstrate my skills in supervised machine learning.

Types of algorithms used:

Linear regression

Logistic regression
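As a minimal sketch of logistic regression used as a supervised classifier, the code below splits a simulated data set into training and test sets, fits `glm()` on the training part and evaluates predictions on the unseen part. The variables, the 70/30 split and the 0.5 probability threshold are illustrative assumptions, not choices taken from this project.

```r
set.seed(123)
n <- 1000
# Simulated, illustrative predictors and outcome (not the CKD data)
sim <- data.frame(
  blood_urea   = rnorm(n, mean = 35, sd = 15),
  hypertension = rbinom(n, 1, 0.55)
)
sim$high_risk <- rbinom(n, 1, plogis(-3 + 0.06 * sim$blood_urea + 0.8 * sim$hypertension))

# 70/30 split into training and test sets
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Fit on the training data only
fit <- glm(high_risk ~ blood_urea + hypertension, data = train, family = binomial())

# Predicted probabilities for unseen patients, classified at 0.5
prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob >= 0.5)

confusion <- table(predicted = pred, observed = test$high_risk)
accuracy  <- mean(pred == test$high_risk)
confusion
accuracy
```

Accuracy alone can be misleading when the classes are imbalanced, so sensitivity, specificity or a ROC curve are usually reported alongside it.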

# CLEAR WORKING SPACE
rm(list = ls(all.names = TRUE))
#========================================-
# SET WD
setwd("C:/CDK") 
#==================================-
# LOAD PACKAGES
#===================================-
library(tidyverse)
library(expss)
library(table1)
library(gtsummary)
library(flextable)
library(officer)
library(broom)
library(gt)
library(readxl)
library(writexl)
library(finalfit)
library(ggplot2)
library(e1071)
library(psych)
library(summarytools)
library(broom.helpers)
#=====================================- 
# LOAD DATA SET
CDK <- read_excel("CDK.xlsx",sheet = "CDK")
#======================================-
# View Data set
#=======================================-
# summarytools::view() masks tibble::view() and only accepts summarytools
# objects, so use utils::View() to open the data viewer
View(CDK)

Section A: Data Processing

#1.1 DATA CLEANING 
#1.1.1 Keeping variables
CDK<-CDK|>
  select(patient_age,gender,bp_systolic,bp_diastolic,
         blood_urea,blood_glucose_random,diabetes,hypertension,
         drug_name,drug_dosage_mg,exposure_days,nephrotoxic_label,
         ckd_risk_label)
#==================================================-
CDK<-CDK|>
  mutate(
    diabetes= factor(diabetes,
                     levels = c(0,1),
                     labels = c("No","Yes"),
                     exclude = NA),
    hypertension=factor(hypertension,
                        levels = c(0,1),
                        labels = c("No","Yes"),
                        exclude = NA),
    nephrotoxic_label=factor(nephrotoxic_label,
                             levels = c(0,1),
                             labels = c("non-nephrotoxic","nephrotoxic"),
                             exclude = NA),
    ckd_risk_label=factor(ckd_risk_label,
                          levels = c(0,1,2),
                          labels = c("Low risk","Moderate risk",
                                     "High risk"),
                          exclude = NA),
    gender=factor(gender,
                  labels = c("Female","Male"),
                  exclude = NA),
    drug_name=factor(drug_name,
                     # labels are matched to the sorted unique values of drug_name
                     labels = c("Amphotericin-B","Aspirin","Cisplatin",
                                "Gentamicin","Ibuprofen","Paracetamol",
                                "Tobramycin","Vancomycin"),
                     exclude = NA))|>
  apply_labels(
    patient_age=    "Patient Age(Years)" ,
    gender= "sex",
    bp_systolic="Systolic blood pressure(mmHg)",
    bp_diastolic="Diastolic blood pressure(mmHg)",
    blood_urea= "Blood urea(mmol/L)",
    drug_dosage_mg= "Drug dosage(mg)",
    exposure_days=  "Days of exposure" ,
    drug_name="Drug Type",
    blood_glucose_random="Blood glucose",
    nephrotoxic_label=  "nephrotoxic medication" ,
    ckd_risk_label= "Risk of chronic Kidney disease ")|>
  mutate(
    Age_cat=case_when(
      patient_age>=18& patient_age<=22~1,
      patient_age>=23 & patient_age<=27~2,
      patient_age>=28 & patient_age<=32~3,
      patient_age>=33 & patient_age<=37~4,
      patient_age>=38 & patient_age<=42~5,
      patient_age>=43 & patient_age<=47~6,
      patient_age>=48 & patient_age<=52~7,
      patient_age>=53 & patient_age<=57 ~8,
      patient_age>=58 & patient_age<=62~9,
      patient_age>=63 & patient_age<=67~10,
      patient_age>=68 & patient_age<=72~11,
      patient_age>=73 & patient_age<=77~12,
      patient_age>=78 & patient_age<=82~13,
      patient_age>=83 & patient_age<=87~14,
      patient_age>=88 & patient_age<=92~15
    ))%>%
  mutate(
    Age_cat=factor(Age_cat,
                   levels = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15),
                   labels = c("18-22","23-27","28-32","33-37",
                              "38-42","43-47","48-52","53-57","58-62",
                              "63-67","68-72","73-77","78-82","83-87",
                              "88-92")))|>
  apply_labels(
    Age_cat=    "Patients Age group(years)")
  
#==================================================-
#  Adding another column-> Grouping exposure days of drug use
CDK<-within(CDK,{
    exposure_cat<-NA
    exposure_cat[exposure_days>=1 & exposure_days<=4]<-"1-4"
    exposure_cat[exposure_days>=5 & exposure_days<=9]<-"5-9"
    exposure_cat[exposure_days>=10 & exposure_days<=14]<-"10-14"
    exposure_cat[exposure_days>=15 & exposure_days<=19]<-"15-19"
    exposure_cat[exposure_days>=20 & exposure_days<=24]<-"20-24"
    exposure_cat[exposure_days>=25 & exposure_days<=29]<-"25-29"
    exposure_cat[exposure_days>=30 & exposure_days<=34]<-"30-34"
    # Order the groups chronologically (not alphabetically) and drop empty ones
    exposure_cat<-droplevels(factor(exposure_cat,
                                    levels = c("1-4","5-9","10-14","15-19",
                                               "20-24","25-29","30-34")))
  }) 
#=======label exposure_cat to exposure days
CDK<-apply_labels(CDK,
                  exposure_cat="exposure days")
#==========================================================-
#1.1.2 Save  Data set as CDK.RData
save(CDK,file="C:/CDK/CDK.RData")
#=================================================- 
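The age and exposure groupings above can also be built with base R's `cut()`, which returns a factor whose levels are already in numeric order. The sketch below assumes whole-number ages and exposure days, as the `case_when()` and `within()` rules above do; with right-closed intervals, `(17, 22]` holds ages 18-22, `(4, 9]` holds days 5-9, and so on. The input values are illustrative.

```r
# Illustrative values (not the CKD data)
patient_age   <- c(18, 22, 23, 57, 89)
exposure_days <- c(1, 4, 5, 9, 10, 29)

# 5-year age groups: breaks 17, 22, ..., 92 give intervals (17,22], (22,27], ...
age_breaks <- seq(17, 92, by = 5)
age_labels <- paste(head(age_breaks, -1) + 1, tail(age_breaks, -1), sep = "-")
age_cat <- cut(patient_age, breaks = age_breaks, labels = age_labels)

# Exposure-day groups: (0,4], (4,9], (9,14], ...
exposure_cat <- cut(exposure_days,
                    breaks = c(0, 4, 9, 14, 19, 24, 29, 34),
                    labels = c("1-4", "5-9", "10-14", "15-19",
                               "20-24", "25-29", "30-34"))
age_cat
exposure_cat
```

Unlike the `case_when()` version, `cut()` would also place a non-integer value such as 22.5 into a group (here "23-27") instead of leaving it `NA`.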

Section B: Data Visualization

# 2.0 DATA VISUALIZATION----
# 2.1 Checking the distribution of CKD with female data set----
# Filter female data set
#=============================================- 
# Explore gender
CDK%>%count(gender,sort = TRUE)
# A tibble: 2 × 2
  gender     n
  <fct>  <int>
1 Female   776
2 Male     724
CDKf<-CDK|>
  filter(gender=="Female")
#========================================-
View(CDKf)
#=========================================- 
# 2.2 Visualize categorical variables
#++++++++++++++++++++++++++++++++++++++++++
# 2.2.1 Patient Age categories-------- 
# Summarized using count()
df<-CDKf%>%
  select(Age_cat,gender)%>%
  count(Age_cat)|>
ggplot(aes(x=reorder(Age_cat,n),y=n))+
  geom_bar(stat = "identity",fill="violet",color="white")+
  geom_text(aes(label = n),hjust=1.45)+coord_flip()+
  theme_classic()+labs(x="Patients Age(Years)",y="Count",
                       title = "Female Patients Age Distribution")
df

#=============================================================-
# 2.2.2 Patients Health Condition-----
# 2.2.2.1 Proportion of female with Hypertension-----
df1<-CDKf%>%
  select(hypertension)%>%
  count(hypertension)%>%
  mutate(Percentage=n/sum(n),
         perce_label=paste0(round(Percentage*100),"%"))%>%
  ggplot(aes(x=reorder(hypertension,Percentage),
                 y=Percentage))+
  geom_bar(stat="identity",fill="pink",color="black")+
  geom_text(aes(label=perce_label),vjust=-0.25)+
  labs(x="Hypertension status",y="Percent",
       title = "% of female patients with hypertension")+
  scale_y_continuous(labels = scales::percent)+
  theme_bw() 
df1

#====================================================-
# 2.2.2.2 Proportion of female patients with diabetes-----
df2<-CDKf%>%
  select(diabetes)%>%
  count(diabetes)%>%
  mutate(Percentage=n/sum(n),
         perce_label=paste0(round(Percentage*100),"%"))%>%
  ggplot(aes(x=reorder(diabetes,Percentage),
             y=Percentage))+
  geom_bar(stat="identity",fill="purple",color="black")+
  geom_text(aes(label=perce_label),vjust=-0.25)+
  labs(x="Diabetes status",y="Percent",
       title = "% of female patients with diabetes")+
  scale_y_continuous(labels = scales::percent)+
  theme_classic() 
df2

#=================================================- 
#2.2.2.3 Proportion of female patients by drug type-----
df3<-CDKf%>%
  select(drug_name)%>%
  count(drug_name)%>%
  mutate(Percentage=n/sum(n),
         perce_label=paste0(round(Percentage*100),"%"))%>%
  ggplot(aes(x=reorder(drug_name,Percentage),
             y=Percentage))+
  geom_bar(stat="identity",fill="skyblue",color="black")+
  geom_text(aes(label=perce_label),vjust=-0.25)+
  labs(x="Drug type",y="Percent",
       title = "% of female patients by drug type")+
  scale_y_continuous(labels = scales::percent)+
  theme_classic() 
df3

#=======================================================- 
#2.2.2.4 Proportion of exposure days----
#============================================-
# Compute frequencies and percentages from the data instead of typing them in
Exposure<-CDKf|>
  count(exposure_cat)|>
  mutate(Percent=sprintf("%.1f%%",n/sum(n)*100),
         # order bars by the first day of each range, not alphabetically
         exposure_cat=reorder(exposure_cat,
                              as.numeric(sub("-.*","",exposure_cat))))
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Exposure|>
ggplot(aes(x=exposure_cat, y=n)) + 
  geom_bar(stat = "identity", color = "black", fill = "dodgerblue1")+
  geom_text(aes(label=paste0(n," (",Percent,")")), vjust=-1) +
  ylim(0, 200)+
  labs(title = "Days of drug consumption by female patients",
       y="Female patients",
       x="Days of drug consumption")

#=============================================================- 
# 2.3 Visualize continuous variables
# 2.3.1 Distribution of Patient Age (Years)
#2.3.1.1: Normality Assumption
CDKf%>%
  ggplot(aes(x=patient_age))+
  # 5-year bins, the same width as the age groups
  geom_histogram(binwidth = 5,fill="blue",color="white")+
  theme_classic()+
  labs(title = "Age distribution of female patients",
       y="Counts",
       x= "Patients Age(Years)")

# Test normality formally using the Shapiro-Wilk test
shapiro.test(CDKf$patient_age)

    Shapiro-Wilk normality test

data:  CDKf$patient_age
W = 0.952, p-value = 3.319e-15
# Note: the normality assumption for age is violated
# Reason: p-value < 0.05, so we reject H0 (age is normally distributed)
# 2.3.1.2: Identify outliers in patient age----
boxplot(CDKf$patient_age,col = "violet")
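A box plot flags as outliers the points lying more than 1.5 times the interquartile range beyond the hinges. The sketch below makes that rule explicit in base R so the flagged values can be listed rather than only seen on the plot. It reproduces `boxplot.stats()`, which takes Tukey's hinges from `fivenum()` (these can differ slightly from `quantile()`); the input vector is illustrative.

```r
# Illustrative ages with one extreme value (not the CKD data)
x <- c(seq(20, 80, by = 5), 150)

# Tukey's rule, as used by boxplot(): hinges from fivenum(), fences at 1.5 * IQR
hinges <- fivenum(x)[c(2, 4)]
iqr    <- diff(hinges)
lower  <- hinges[1] - 1.5 * iqr
upper  <- hinges[2] + 1.5 * iqr
outliers <- x[x < lower | x > upper]

outliers
boxplot.stats(x)$out
```

On the real data, `boxplot.stats(CDKf$patient_age)$out` would list any ages flagged by the box plot above.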

#==========================================-
# 2.3.2 TEST NORMALITY ASSUMPTION USING SHAPIRO-WILK TEST----
# The remaining continuous variables were tested on the full data set (CDK)----
#==========================================-
shapiro.test(CDK$bp_systolic)

    Shapiro-Wilk normality test

data:  CDK$bp_systolic
W = 0.99903, p-value = 0.6182
shapiro.test(CDK$bp_diastolic)

    Shapiro-Wilk normality test

data:  CDK$bp_diastolic
W = 0.9989, p-value = 0.4988
shapiro.test(CDK$blood_urea)

    Shapiro-Wilk normality test

data:  CDK$blood_urea
W = 0.99862, p-value = 0.2819
shapiro.test(CDK$blood_glucose_random)

    Shapiro-Wilk normality test

data:  CDK$blood_glucose_random
W = 0.99846, p-value = 0.1973
shapiro.test(CDK$drug_dosage_mg)

    Shapiro-Wilk normality test

data:  CDK$drug_dosage_mg
W = 0.95146, p-value < 2.2e-16
#===========================================-
#N/B: All variables met the normality assumption except----
#patient age and drug dosage (mg)
#============================================-
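The `describe()` output below shows skewness close to 0 for both patient age (0.01) and drug dosage (-0.06), but kurtosis around -1.2. That pattern points to a flat, roughly uniform distribution rather than an asymmetric one: Shapiro-Wilk rejects normality because of the light tails, not because of skew. The sketch below illustrates this on evenly spaced uniform values over the observed age range (18 to 89); it uses simple moment formulas, which can differ slightly from psych's default estimators.

```r
# Evenly spaced values from a uniform distribution over 18-89 (illustrative)
x <- qunif(ppoints(1000), min = 18, max = 89)

# Simple moment-based skewness and excess kurtosis
m  <- mean(x)
m2 <- mean((x - m)^2)
skewness        <- mean((x - m)^3) / m2^1.5
excess_kurtosis <- mean((x - m)^4) / m2^2 - 3

p_value <- shapiro.test(x)$p.value
c(skewness = skewness, excess_kurtosis = excess_kurtosis, shapiro_p = p_value)
```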
#2.3.2.1 Describe continuous variables----
CSV<-CDK|>
  select(patient_age,bp_diastolic,drug_dosage_mg,bp_systolic,
         blood_urea,blood_glucose_random)
describe(CSV)
                     vars    n   mean     sd median trimmed    mad   min   max
patient_age             1 1500  52.87  20.86  53.00   52.84  26.69  18.0  89.0
bp_diastolic            2 1500  84.59  12.08  84.85   84.67  12.38  48.9 123.9
drug_dosage_mg          3 1500 426.61 217.44 434.50  428.25 282.44  50.0 798.0
bp_systolic             4 1500 130.31  19.65 130.10  130.34  19.13  69.6 208.5
blood_urea              5 1500  34.56  15.17  34.70   34.53  14.83 -12.7  81.7
blood_glucose_random    6 1500 149.41  38.16 150.25  149.67  37.51  -6.9 266.6
                     range  skew kurtosis   se
patient_age           71.0  0.01    -1.22 0.54
bp_diastolic          75.0 -0.04    -0.17 0.31
drug_dosage_mg       748.0 -0.06    -1.23 5.61
bp_systolic          138.9  0.02     0.16 0.51
blood_urea            94.4  0.04     0.14 0.39
blood_glucose_random 273.5 -0.09    -0.11 0.99
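The `describe()` output also shows negative minimum values for blood urea (-12.7) and random blood glucose (-6.9), which are not physiologically possible and fall under the missing-data and outlier handling in the analysis plan. A minimal sketch for counting such values and recoding them to `NA` is below; the lower bound of 0, the helper name `flag_implausible()` and the toy data are assumptions for illustration.

```r
# Hypothetical helper: count and recode values outside plausibility limits
flag_implausible <- function(data, limits) {
  counts <- integer(0)
  for (v in names(limits)) {
    bad <- !is.na(data[[v]]) &
      (data[[v]] < limits[[v]][1] | data[[v]] > limits[[v]][2])
    counts[v] <- sum(bad)
    data[[v]][bad] <- NA          # treat implausible values as missing
  }
  list(data = data, counts = counts)
}

# Toy data (not the CKD data set)
toy <- data.frame(blood_urea = c(34.7, -12.7, 50.1),
                  blood_glucose_random = c(150.2, 149.0, -6.9))

# Assumed limits: both measurements must be non-negative
limits <- list(blood_urea = c(0, Inf), blood_glucose_random = c(0, Inf))
res <- flag_implausible(toy, limits)
res$counts
```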

Section: DESCRIPTIVE STATISTICS

# 3.1 Descriptive Statistics----
Table<-CDK|>
  select(gender,diabetes,hypertension,drug_name,exposure_cat,ckd_risk_label,
         bp_systolic,bp_diastolic,blood_urea,blood_glucose_random,
         exposure_days,nephrotoxic_label)

Mystat<-list(all_continuous()~"{mean} ± {sd}",
             all_categorical()~"{n} ({p})")
MyDigit<-list(all_continuous()~c(2,2),all_categorical()~c(0,2))
Table1<-Table%>%
  tbl_summary(by=nephrotoxic_label,missing = "no",statistic = Mystat,digits = MyDigit)%>%
  bold_labels()
Table1
Characteristic                     non-nephrotoxic      nephrotoxic
                                   N = 578¹             N = 922¹
sex
    Female                         286 (49.48)          490 (53.15)
    Male                           292 (50.52)          432 (46.85)
diabetes                           253 (43.77)          402 (43.60)
hypertension                       335 (57.96)          500 (54.23)
Drug Type
    Amphotericin-B                 0 (0.00)             169 (18.33)
    Aspirin                        187 (32.35)          0 (0.00)
    Cisplatin                      0 (0.00)             193 (20.93)
    Gentamicin                     0 (0.00)             185 (20.07)
    Ibuprofen                      211 (36.51)          0 (0.00)
    Paracetamol                    180 (31.14)          0 (0.00)
    Tobramycin                     0 (0.00)             178 (19.31)
    Vancomycin                     0 (0.00)             197 (21.37)
exposure days
    1-4                            82 (14.19)           137 (14.86)
    10-14                          102 (17.65)          157 (17.03)
    15-19                          96 (16.61)           151 (16.38)
    20-24                          96 (16.61)           180 (19.52)
    25-29                          99 (17.13)           159 (17.25)
    5-10                           103 (17.82)          138 (14.97)
Risk of chronic Kidney disease
    Low risk                       336 (58.13)          43 (4.66)
    Moderate risk                  222 (38.41)          506 (54.88)
    High risk                      20 (3.46)            373 (40.46)
Systolic blood pressure(mm/Hg)     131.12 ± 19.91       129.81 ± 19.48
Diastolic blood pressure(mm/HG)    84.62 ± 12.04        84.58 ± 12.11
Blood urea(mmol/L)                 34.64 ± 14.89        34.51 ± 15.35
Blood glucose                      149.33 ± 39.21       149.46 ± 37.50
Days of exposure                   14.86 ± 8.35         15.24 ± 8.48
¹ n (%); Mean ± SD
#========================================================-
# Reporting patient_age and drug_dosage_mg using table1 function
Table2<-CDK%>%
  select(patient_age,drug_dosage_mg,nephrotoxic_label)
table1(~patient_age+drug_dosage_mg|nephrotoxic_label,data=Table2)
                     non-nephrotoxic      nephrotoxic          Overall
                     (N=578)              (N=922)              (N=1500)
Patient Age(Years)
  Mean (SD)          53.5 (21.0)          52.5 (20.8)          52.9 (20.9)
  Median [Min, Max]  53.0 [18.0, 89.0]    52.0 [18.0, 89.0]    53.0 [18.0, 89.0]
Drug dosage(mg)
  Mean (SD)          426 (221)            427 (215)            427 (217)
  Median [Min, Max]  430 [50.0, 796]      436 [51.0, 798]      435 [50.0, 798]

Section C: Test of hypothesis

# 3.2: Inferential Statistics
# 3.2.1 Bivariate Analysis 
Tab_1 <-CDK|>
  tbl_summary(
    by =nephrotoxic_label,  # summarise separately for each nephrotoxic group
    statistic = list(
      all_continuous() ~ "{mean} ± {sd}",
      all_categorical() ~ "{n} ({p}%)"
    ),
    percent = "column",
    missing = "no"
  ) |>
  add_overall() |>
  add_p(pvalue_fun = ~style_pvalue(.x, digits = 2)) |>
  modify_footnote(all_stat_cols() ~ "n (%); Mean ± SD") |>
  modify_spanning_header(c("stat_1", "stat_2") ~ "**nephrotoxic**") |>
  modify_caption("Table 1: Characteristics of patient information") |>
  bold_labels() |>
  add_n() |>
  as_flex_table()
sect_properties <- prop_section(page_size = page_size(orient = "portrait"))#, width = 8.3, height = 11.7)
save_as_docx(Tab_1,path="Table2c.docx", pr_section = sect_properties)
Tab_1

                                                        nephrotoxic
Characteristic                   N      Overall         non-nephrotoxic    nephrotoxic       p-value²
                                        N = 1,500¹      N = 578¹           N = 922¹
Patient Age(Years)               1,500  53 ± 21         54 ± 21            52 ± 21           0.32
sex                              1,500                                                       0.17
    Female                              776 (52%)       286 (49%)          490 (53%)
    Male                                724 (48%)       292 (51%)          432 (47%)
Systolic blood pressure(mm/Hg)   1,500  130 ± 20        131 ± 20           130 ± 19          0.19
Diastolic blood pressure(mm/HG)  1,500  85 ± 12         85 ± 12            85 ± 12           0.88
Blood urea(mmol/L)               1,500  35 ± 15         35 ± 15            35 ± 15           0.61
Blood glucose                    1,500  149 ± 38        149 ± 39           149 ± 38          0.83
diabetes                         1,500  655 (44%)       253 (44%)          402 (44%)         0.95
hypertension                     1,500  835 (56%)       335 (58%)          500 (54%)         0.16
Drug Type                        1,500                                                       <0.001
    Amphotericin-B                      169 (11%)       0 (0%)             169 (18%)
    Aspirin                             187 (12%)       187 (32%)          0 (0%)
    Cisplatin                           193 (13%)       0 (0%)             193 (21%)
    Gentamicin                          185 (12%)       0 (0%)             185 (20%)
    Ibuprofen                           211 (14%)       211 (37%)          0 (0%)
    Paracetamol                         180 (12%)       180 (31%)          0 (0%)
    Tobramycin                          178 (12%)       0 (0%)             178 (19%)
    Vancomycin                          197 (13%)       0 (0%)             197 (21%)
Drug dosage(mg)                  1,500  427 ± 217       426 ± 221          427 ± 215         0.98
Days of exposure                 1,500  15 ± 8          15 ± 8             15 ± 8            0.40
Risk of chronic Kidney disease   1,500                                                       <0.001
    Low risk                            379 (25%)       336 (58%)          43 (4.7%)
    Moderate risk                       728 (49%)       222 (38%)          506 (55%)
    High risk                           393 (26%)       20 (3.5%)          373 (40%)
Patients Age group(years)        1,500                                                       0.32
    18-22                               118 (7.9%)      46 (8.0%)          72 (7.8%)
    23-27                               101 (6.7%)      35 (6.1%)          66 (7.2%)
    28-32                               110 (7.3%)      40 (6.9%)          70 (7.6%)
    33-37                               106 (7.1%)      41 (7.1%)          65 (7.0%)
    38-42                               103 (6.9%)      40 (6.9%)          63 (6.8%)
    43-47                               106 (7.1%)      31 (5.4%)          75 (8.1%)
    48-52                               105 (7.0%)      47 (8.1%)          58 (6.3%)
    53-57                               114 (7.6%)      49 (8.5%)          65 (7.0%)
    58-62                               80 (5.3%)       34 (5.9%)          46 (5.0%)
    63-67                               99 (6.6%)       34 (5.9%)          65 (7.0%)
    68-72                               107 (7.1%)      41 (7.1%)          66 (7.2%)
    73-77                               109 (7.3%)      37 (6.4%)          72 (7.8%)
    78,82                               117 (7.8%)      50 (8.7%)          67 (7.3%)
    83-87                               88 (5.9%)       32 (5.5%)          56 (6.1%)
    88-92                               37 (2.5%)       21 (3.6%)          16 (1.7%)
exposure days                    1,500                                                       0.60
    1-4                                 219 (15%)       82 (14%)           137 (15%)
    10-14                               259 (17%)       102 (18%)          157 (17%)
    15-19                               247 (16%)       96 (17%)           151 (16%)
    20-24                               276 (18%)       96 (17%)           180 (20%)
    25-29                               258 (17%)       99 (17%)           159 (17%)
    5-10                                241 (16%)       103 (18%)          138 (15%)
¹ Mean (SD)
² Wilcoxon rank sum test; Pearson's Chi-squared test

#3.2 Logistic Regression----
#3.2.1 Unadjusted Odds Ratios----
OR1<-CDK|>
  select(gender,diabetes,hypertension,exposure_cat,ckd_risk_label,
         bp_systolic,bp_diastolic,blood_urea,blood_glucose_random,
         exposure_days,nephrotoxic_label)|>
  tbl_uvregression(method = glm,y=nephrotoxic_label,
                   method.args = list(family=binomial()),
                   exponentiate = TRUE,pvalue_fun = ~style_pvalue(.x,digits = 3))|>
  modify_column_merge(pattern = "{estimate} ({ci})",rows = ! is.na(estimate))|>
  modify_header(estimate~"**OR(95%C.I)**")|>
  bold_labels()
OR1
Characteristic                   N      OR(95%C.I)          95% CI       p-value
sex                              1,500
    Female
    Male                                0.86 (0.70, 1.06)   0.70, 1.06   0.167
diabetes                         1,500
    No
    Yes                                 0.99 (0.81, 1.23)   0.81, 1.23   0.948
hypertension                     1,500
    No
    Yes                                 0.86 (0.70, 1.06)   0.70, 1.06   0.157
exposure days                    1,500
    1-4
    10-14                               0.92 (0.64, 1.33)   0.64, 1.33   0.664
    15-19                               0.94 (0.65, 1.37)   0.65, 1.37   0.752
    20-24                               1.12 (0.78, 1.62)   0.78, 1.62   0.540
    25-29                               0.96 (0.66, 1.39)   0.66, 1.39   0.835
    5-10                                0.80 (0.55, 1.17)   0.55, 1.17   0.248
Risk of chronic Kidney disease   1,500
    Low risk
    Moderate risk                       17.8 (12.6, 25.7)   12.6, 25.7   <0.001
    High risk                           146 (86.0, 259)     86.0, 259    <0.001
Systolic blood pressure(mm/Hg)   1,500  1.00 (0.99, 1.00)   0.99, 1.00   0.210
Diastolic blood pressure(mm/HG)  1,500  1.00 (0.99, 1.01)   0.99, 1.01   0.955
Blood urea(mmol/L)               1,500  1.00 (0.99, 1.01)   0.99, 1.01   0.874
Blood glucose                    1,500  1.00 (1.00, 1.00)   1.00, 1.00   0.947
Days of exposure                 1,500  1.01 (0.99, 1.02)   0.99, 1.02   0.392
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
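For a single binary predictor, the unadjusted odds ratio from a univariable logistic regression equals the cross-product ratio of the 2 x 2 table. The sketch below checks this on simulated data (the variable names are illustrative) and adds a Wald 95% confidence interval from `confint.default()`. The gtsummary intervals above may instead be profile-likelihood intervals, so their bounds can differ slightly from Wald bounds.

```r
set.seed(99)
n <- 1500
# Simulated, illustrative data
hypertension <- factor(sample(c("No", "Yes"), n, replace = TRUE), levels = c("No", "Yes"))
nephrotoxic  <- rbinom(n, 1, ifelse(hypertension == "Yes", 0.58, 0.62))

# Cross-product ratio from the 2 x 2 table: odds(Yes) / odds(No)
tab <- table(hypertension, nephrotoxic)
or_table <- (tab["Yes", "1"] * tab["No", "0"]) / (tab["Yes", "0"] * tab["No", "1"])

# Same odds ratio from a univariable logistic regression
fit     <- glm(nephrotoxic ~ hypertension, family = binomial())
or_glm  <- exp(coef(fit)[["hypertensionYes"]])
ci_wald <- exp(confint.default(fit)["hypertensionYes", ])

c(or_table = unname(or_table), or_glm = or_glm)
ci_wald
```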
#============================================
#3.2.2 Adjusted Odds ratio----
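# Note: exposure_cat is derived from exposure_days, so including both makes the
# exposure terms collinear and widens their confidence intervals
OR2<- glm(nephrotoxic_label~gender+diabetes+hypertension+exposure_cat+ckd_risk_label+
          bp_systolic+bp_diastolic+blood_urea+blood_glucose_random+
          exposure_days,data = CDK,family = binomial())|>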
  tbl_regression(
    exponentiate = TRUE,pvalue_fun = ~style_pvalue(.x,digits = 3))|>
  modify_column_merge(pattern = "{estimate} ({ci})",rows = ! is.na(estimate))|>
  modify_header(estimate~"**OR(95%C.I)**")|>
  bold_labels()
OR2
Characteristic                   OR(95%C.I)          95% CI       p-value
sex
    Female
    Male                         0.89 (0.66, 1.20)   0.66, 1.20   0.439
diabetes
    No
    Yes                          0.26 (0.19, 0.37)   0.19, 0.37   <0.001
hypertension
    No
    Yes                          0.22 (0.16, 0.31)   0.16, 0.31   <0.001
exposure days
    1-4
    10-14                        0.59 (0.18, 1.95)   0.18, 1.95   0.389
    15-19                        0.69 (0.13, 3.71)   0.13, 3.71   0.667
    20-24                        0.77 (0.08, 7.06)   0.08, 7.06   0.818
    25-29                        0.95 (0.06, 15.1)   0.06, 15.1   0.969
    5-10                         0.70 (0.33, 1.46)   0.33, 1.46   0.341
Risk of chronic Kidney disease
    Low risk
    Moderate risk                71.0 (44.3, 118)    44.3, 118    <0.001
    High risk                    1,596 (764, 3,528)  764, 3,528   <0.001
Systolic blood pressure(mm/Hg)   1.00 (0.99, 1.00)   0.99, 1.00   0.299
Diastolic blood pressure(mm/HG)  1.00 (0.99, 1.01)   0.99, 1.01   0.957
Blood urea(mmol/L)               0.95 (0.93, 0.96)   0.93, 0.96   <0.001
Blood glucose                    1.00 (0.99, 1.00)   0.99, 1.00   0.405
Days of exposure                 1.00 (0.90, 1.12)   0.90, 1.12   0.935
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
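The adjusted model contains both `exposure_cat` and `exposure_days`, and `exposure_cat` is a grouping of `exposure_days`; the exposure-group intervals widen sharply from the unadjusted to the adjusted model (for 25-29 days, from 0.66-1.39 to 0.06-15.1). The simulated sketch below shows the mechanism: the number of days is almost fully explained by its own groups, so putting both in a logistic model inflates the standard errors of the group terms. The data are random and the outcome is deliberately unrelated to exposure.

```r
set.seed(7)
n <- 1500
days <- sample(1:29, n, replace = TRUE)
days_cat <- cut(days, breaks = c(0, 4, 9, 14, 19, 24, 29),
                labels = c("1-4", "5-9", "10-14", "15-19", "20-24", "25-29"))
y <- rbinom(n, 1, 0.6)   # outcome unrelated to exposure (illustrative)

# Share of the variance in days explained by its own groups
r2 <- summary(lm(days ~ days_cat))$r.squared

# Standard error of one group coefficient, without and with days in the model
fit_cat  <- glm(y ~ days_cat, family = binomial())
fit_both <- glm(y ~ days_cat + days, family = binomial())
se_cat  <- summary(fit_cat)$coefficients["days_cat25-29", "Std. Error"]
se_both <- summary(fit_both)$coefficients["days_cat25-29", "Std. Error"]

c(r_squared = r2, se_without_days = se_cat, se_with_days = se_both)
```

Keeping only one representation of exposure (either the grouped or the continuous variable) avoids this redundancy.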
# Merging Tables----
Table_2<-tbl_merge(
  tbls = list(OR1,OR2),
  tab_spanner = c("**Unadjusted**","**Adjusted**")
)
Table_2
                                        Unadjusted                                Adjusted
Characteristic                   N      OR(95%C.I)          95% CI       p-value  OR(95%C.I)          95% CI       p-value
sex                              1,500
    Female
    Male                                0.86 (0.70, 1.06)   0.70, 1.06   0.167    0.89 (0.66, 1.20)   0.66, 1.20   0.439
diabetes                         1,500
    No
    Yes                                 0.99 (0.81, 1.23)   0.81, 1.23   0.948    0.26 (0.19, 0.37)   0.19, 0.37   <0.001
hypertension                     1,500
    No
    Yes                                 0.86 (0.70, 1.06)   0.70, 1.06   0.157    0.22 (0.16, 0.31)   0.16, 0.31   <0.001
exposure days                    1,500
    1-4
    10-14                               0.92 (0.64, 1.33)   0.64, 1.33   0.664    0.59 (0.18, 1.95)   0.18, 1.95   0.389
    15-19                               0.94 (0.65, 1.37)   0.65, 1.37   0.752    0.69 (0.13, 3.71)   0.13, 3.71   0.667
    20-24                               1.12 (0.78, 1.62)   0.78, 1.62   0.540    0.77 (0.08, 7.06)   0.08, 7.06   0.818
    25-29                               0.96 (0.66, 1.39)   0.66, 1.39   0.835    0.95 (0.06, 15.1)   0.06, 15.1   0.969
    5-10                                0.80 (0.55, 1.17)   0.55, 1.17   0.248    0.70 (0.33, 1.46)   0.33, 1.46   0.341
Risk of chronic Kidney disease   1,500
    Low risk
    Moderate risk                       17.8 (12.6, 25.7)   12.6, 25.7   <0.001   71.0 (44.3, 118)    44.3, 118    <0.001
    High risk                           146 (86.0, 259)     86.0, 259    <0.001   1,596 (764, 3,528)  764, 3,528   <0.001
Systolic blood pressure(mm/Hg)   1,500  1.00 (0.99, 1.00)   0.99, 1.00   0.210    1.00 (0.99, 1.00)   0.99, 1.00   0.299
Diastolic blood pressure(mm/HG)  1,500  1.00 (0.99, 1.01)   0.99, 1.01   0.955    1.00 (0.99, 1.01)   0.99, 1.01   0.957
Blood urea(mmol/L)               1,500  1.00 (0.99, 1.01)   0.99, 1.01   0.874    0.95 (0.93, 0.96)   0.93, 0.96   <0.001
Blood glucose                    1,500  1.00 (1.00, 1.00)   1.00, 1.00   0.947    1.00 (0.99, 1.00)   0.99, 1.00   0.405
Days of exposure                 1,500  1.01 (0.99, 1.02)   0.99, 1.02   0.392    1.00 (0.90, 1.12)   0.90, 1.12   0.935
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
# Reporting in word document
Table_2|>
  as_gt()|>
  gtsave(filename = "Table_2.docx",path = "C:/CDK")
#============================================-

Section D: Forest Plot

# Forest plot
# Final fit plot 
# Outcome 
dependent <- "nephrotoxic_label"

# Explanatory variables
explanatory <- c(
  "gender",
  "diabetes",
  "hypertension",
  "exposure_cat",
  "ckd_risk_label",
  "bp_systolic",
  "bp_diastolic",
  "blood_urea",
  "blood_glucose_random",
  "exposure_days")
#3.2.3 Forest plot---- 
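# The outcome of this model is nephrotoxic_label, so the title describes nephrotoxic medication use
plot1<-CDK %>% or_plot(dependent, explanatory, remove_ref = TRUE, table_text_size = 4, title_text_size = 18,
                        dependent_label = "Factors associated with nephrotoxic medication use", prefix = "Forest Plot: - ", suffix = "")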
plot1

ggsave("forest1.png", plot = plot1, width = 10, height = 6, dpi = 300)
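If finalfit is unavailable, a basic forest plot can be drawn with base graphics. The sketch below plots four adjusted odds ratios copied from the adjusted model output above on a log scale, with a dashed reference line at OR = 1. The choice of terms, the labels and the temporary PDF file are illustrative; this is not how `or_plot()` works internally.

```r
# Adjusted ORs and 95% CIs copied from the adjusted model output above
or_tab <- data.frame(
  term  = c("Male (vs Female)", "Diabetes: Yes", "Hypertension: Yes", "Blood urea (per mmol/L)"),
  or    = c(0.89, 0.26, 0.22, 0.95),
  lower = c(0.66, 0.19, 0.16, 0.93),
  upper = c(1.20, 0.37, 0.31, 0.96)
)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 7, height = 4)
op <- par(mar = c(4, 12, 2, 1))          # wide left margin for term labels
y <- rev(seq_len(nrow(or_tab)))          # first term at the top
plot(or_tab$or, y, log = "x", pch = 15,
     xlim = range(or_tab$lower, or_tab$upper, 1), ylim = c(0.5, nrow(or_tab) + 0.5),
     yaxt = "n", ylab = "", xlab = "Adjusted odds ratio (log scale)")
segments(or_tab$lower, y, or_tab$upper, y)   # confidence intervals
abline(v = 1, lty = 2)                        # no-effect reference line
axis(2, at = y, labels = or_tab$term, las = 1)
par(op)
invisible(dev.off())
```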