CHRONIC KIDNEY DISEASE ANALYSIS

Author

OJALA BRIAN OLOO

Overview

I downloaded this dataset from Kaggle to practice and advance my skills in data analysis, data science and machine learning for medical and health research. The aim of this project was to advance my skills with various statistical tools (e.g. STATA, SPSS, Excel, R, Power BI) in analyzing epidemiological research data and in building dashboards using Power BI, Excel and R Shiny flexdashboard.

About the data set

The dataset provides a comprehensive collection of patient clinical information, drug exposure profiles, and drug-related biochemical characteristics to support research on the early identification of Chronic Kidney Disease (CKD). It combines real-world–style patient health indicators with detailed properties of nephrotoxic and non-nephrotoxic medications that may influence kidney function.

The data set contains:

Patient Clinical Information

Includes age, gender, blood pressure, blood urea, serum creatinine, albumin levels, random blood glucose, and health conditions such as diabetes and hypertension. These features reflect common clinical factors associated with kidney health.

Drug Exposure Profiles

Each patient is linked to a drug, along with its dosage and duration of use. A separate label indicates whether the drug is considered nephrotoxic.

CKD Risk Classification

Each record includes a CKD risk label derived from clinical biomarkers, health conditions, and drug-related toxicity indicators.

Purpose of the Dataset

• To understand how clinical and drug-specific factors together influence kidney health

• To develop data-driven healthcare applications and decision-support tools

• To evaluate drugs that are related to kidney stress

DATA ANALYSIS PLAN FOR THIS DATA SET

Data Management

• Handling missing data and outliers

• Converting character variables to factors for categorical variables

Data Manipulation

• Subsetting data, filtering, etc.

• Recoding age into categories

Data visualization

• Bar graphs only, for categorical variables

• Histograms and the Shapiro-Wilk test to check the normality assumption for continuous variables

Statistical data analysis

Descriptive statistics

• Frequencies and percentages for qualitative variables

• Mean and standard deviation for normally distributed continuous variables

• Median and interquartile range for skewed variables

Inferential statistics

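To understand how clinical and drug-specific factors together influence kidney health, I employed:

i. Bivariate Analysis: Chi-square test of association for categorical variables and a two-sample test for continuous variables (the Welch test was planned; the p-values reported in the bivariate table come from the Wilcoxon rank sum test)

ii. Multivariate Analysis: Logistic Regression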

Please note: I fitted a multiple logistic regression model to control for confounding variables instead of using Mantel-Haenszel statistics.
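
To illustrate the note above, here is a minimal sketch comparing the Mantel-Haenszel common odds ratio with the odds ratio from a logistic regression that adjusts for the same confounder. The data are simulated and the variable names (`nephrotoxic`, `high_risk`, `diabetes`) are hypothetical stand-ins, not the columns of the real dataset.

```r
# Simulated, hypothetical data: one binary exposure, one binary confounder
set.seed(42)
n <- 1500
diabetes    <- rbinom(n, 1, 0.44)   # confounder, used as the stratum
nephrotoxic <- rbinom(n, 1, 0.60)   # exposure
high_risk   <- rbinom(n, 1, plogis(-2 + 1.2 * nephrotoxic + 0.8 * diabetes))

# Mantel-Haenszel: common odds ratio of exposure vs outcome across diabetes strata
mh    <- mantelhaen.test(table(nephrotoxic, high_risk, diabetes))
mh_or <- unname(mh$estimate)

# Logistic regression: the confounder enters the model as a covariate
fit   <- glm(high_risk ~ nephrotoxic + diabetes, family = binomial())
lr_or <- unname(exp(coef(fit)["nephrotoxic"]))

round(c(mantel_haenszel = mh_or, logistic = lr_or), 2)
```

With a single categorical confounder the two estimates are usually very close. Logistic regression becomes the more practical choice once several confounders, or continuous ones, need to be controlled at the same time.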

The statistical package used to analyze this dataset was R.

Why R programming

• Open-source software

• Simple and easy to use

• Well suited to the epidemiological research dataset used here

PART 3: To develop data-driven healthcare applications and decision-support tools

In this part, I demonstrate my skills in supervised machine learning.

Type of algorithm used:

Linear regression

Logistic regression

# CLEAR WORKING SPACE
rm(list = ls(all.names = TRUE))
#========================================-
# SET WD
setwd("C:/CDK") 
#==================================-
# LOAD PACKAGES
#===================================-
library(tidyverse)
library(expss)
library(table1)
library(gtsummary)
library(flextable)
library(officer)
library(broom)
library(gt)
library(readxl)
library(writexl)
library(finalfit)
library(ggplot2)
library(e1071)
library(psych)
library(summarytools)
library(broom.helpers)

#=====================================- 
# LOAD DATA SET
CDK <- read_excel("CDK.xlsx",sheet = "CDK")
#======================================-
# View Data set
#=======================================-
# Use View(): the lowercase view() is masked by summarytools and fails on a data frame
View(CDK)

Section A: Data Processing

#1.1 DATA CLEANING 
#1.1.1 Keeping variables
CDK<-CDK|>
  select(patient_age,gender,bp_systolic,bp_diastolic,
         blood_urea,blood_glucose_random,diabetes,hypertension,
         drug_name,drug_dosage_mg,exposure_days,nephrotoxic_label,
         ckd_risk_label)
#==================================================-
CDK<-CDK|>
  mutate(
    diabetes= factor(diabetes,
                     levels = c(0,1),
                     labels = c("No","Yes"),
                     exclude = NA),
    hypertension=factor(hypertension,
                        levels = c(0,1),
                        labels = c("No","Yes"),
                        exclude = NA),
    nephrotoxic_label=factor(nephrotoxic_label,
                             levels = c(0,1),
                             labels = c("non-nephrotoxic","nephrotoxic"),
                             exclude = NA),
    ckd_risk_label=factor(ckd_risk_label,
                          levels = c(0,1,2),
                          labels = c("Low risk","Moderate risk",
                                     "High risk"),
                          exclude = NA),
    gender=factor(gender,
                  labels = c("Female","Male"),
                  exclude = NA),
    drug_name=factor(drug_name,
                     labels = c("Amphotericin-B","Aspirin","Cisplatin",
                                "Gentamicin","Ibuprofen","Paracetamol",
                                "Tobramycin","Vancomycin"),
                     exclude = NA))|>
  apply_labels(
    patient_age=    "Patient Age(Years)" ,
    gender= "sex",
    bp_systolic="Systolic blood pressure(mm/Hg)",
    bp_diastolic="Diastolic blood pressure(mm/HG)",
    blood_urea= "Blood urea(mmol/L)",
    drug_dosage_mg= "Drug dosage(mg)",
    exposure_days=  "Days of exposure" ,
    drug_name="Drug Type",
    blood_glucose_random="Blood glucose",
    nephrotoxic_label=  "nephrotoxic medication" ,
    ckd_risk_label= "Risk of chronic Kidney disease ")|>
  mutate(
    Age_cat=case_when(
      patient_age>=18& patient_age<=22~1,
      patient_age>=23 & patient_age<=27~2,
      patient_age>=28 & patient_age<=32~3,
      patient_age>=33 & patient_age<=37~4,
      patient_age>=38 & patient_age<=42~5,
      patient_age>=43 & patient_age<=47~6,
      patient_age>=48 & patient_age<=52~7,
      patient_age>=53 & patient_age<=57 ~8,
      patient_age>=58 & patient_age<=62~9,
      patient_age>=63 & patient_age<=67~10,
      patient_age>=68 & patient_age<=72~11,
      patient_age>=73 & patient_age<=77~12,
      patient_age>=78 & patient_age<=82~13,
      patient_age>=83 & patient_age<=87~14,
      patient_age>=88 & patient_age<=92~15
    ))%>%
  mutate(
    Age_cat=factor(Age_cat,
                   levels = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15),
                   labels = c("18-22","23-27","28-32","33-37",
                              "38-42","43-47","48-52","53-57","58-62",
                              "63-67","68-72","73-77","78-82","83-87",
                              "88-92")))|>
  apply_labels(
    Age_cat=    "Patients Age group(years)")
  
#==================================================-
# Adding another column: grouping exposure days of drug use
# Note: the bin labelled "5-10" covers days 5-9; day 10 falls in "10-14"
CDK<-within(CDK,{
    exposure_cat<-NA
    exposure_cat[exposure_days>=1 & exposure_days<=4]<-"1-4"
    exposure_cat[exposure_days>=5 & exposure_days<=9]<-"5-10"
    exposure_cat[exposure_days>=10 & exposure_days<=14]<-"10-14"
    exposure_cat[exposure_days>=15 & exposure_days<=19]<-"15-19"
    exposure_cat[exposure_days>=20 & exposure_days<=24]<-"20-24"
    exposure_cat[exposure_days>=25 & exposure_days<=29]<-"25-29"
    exposure_cat[exposure_days>=30 & exposure_days<=34]<-"30-34"
  }) 
#=======label exposure_cat to exposure days
CDK<-apply_labels(CDK,
                  exposure_cat="exposure days")
#==========================================================-
#1.1.2 Save  Data set as CDK.RData
save(CDK,file="C:/CDK/CDK.RData")
#=================================================-
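
As an alternative to the age and exposure-day groupings above, here is a minimal sketch that builds the same kind of bins with `cut()`. The input vectors are hypothetical integer values, and the second bin is labelled "5-9" here because that is the range it actually covers.

```r
# Hypothetical integer values standing in for patient_age and exposure_days
patient_age   <- c(18, 22, 23, 57, 78, 82, 89)
exposure_days <- c(1, 4, 5, 9, 10, 29)

# Age: 5-year bands starting at 18; right = FALSE gives [18,23), [23,28), ...
age_breaks <- seq(18, 93, by = 5)
age_labels <- paste(head(age_breaks, -1), head(age_breaks, -1) + 4, sep = "-")
Age_cat <- cut(patient_age, breaks = age_breaks, labels = age_labels,
               right = FALSE, ordered_result = TRUE)

# Exposure days: bands [1,5), [5,10), ... with labels in their natural order
exp_breaks <- c(1, 5, 10, 15, 20, 25, 30, 35)
exp_labels <- c("1-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34")
exposure_cat <- cut(exposure_days, breaks = exp_breaks, labels = exp_labels,
                    right = FALSE, ordered_result = TRUE)

data.frame(patient_age, Age_cat)
data.frame(exposure_days, exposure_cat)
```

Because the result is an ordered factor, tables and plots list the bins in numeric order rather than alphabetically. Unlike the `case_when()` version, a non-integer value such as 22.5 would fall into a bin instead of becoming `NA`.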

Section B: Data Visualization

# 2.0 DATA VISUALIZATION----
# 2.1 Checking the distribution of CKD with female data set----
# Filter female data set
#=============================================- 
# Explore gender
CDK%>%count(gender,sort = TRUE)

# A tibble: 2 × 2
  gender     n
  <fct>  <int>
1 Female   776
2 Male     724

CDKf<-CDK|>
  filter(gender=="Female")
#========================================-
View(CDKf)
#=========================================- 
# 2.2 Visualize categorical variables
#++++++++++++++++++++++++++++++++++++++++++
# 2.2.1 Patient Age categories-------- 
# Summarized using count()
df<-CDKf%>%
  select(Age_cat,gender)%>%
  count(Age_cat)|>
ggplot(aes(x=reorder(Age_cat,n),y=n))+
  geom_bar(stat = "identity",fill="violet",color="white")+
  geom_text(aes(label = n),hjust=1.45)+coord_flip()+
  theme_classic()+labs(x="Patients Age(Years)",y="Count",
                       title = "Female Patients Age Distribution")
df

#=============================================================-
# 2.2.2 Patients Health Condition-----
# 2.2.2.1 Proportion of female with Hypertension-----
df1<-CDKf%>%
  select(hypertension)%>%
  count(hypertension)%>%
  mutate(Percentage=n/sum(n),
         perce_label=paste0(round(Percentage*100),"%"))%>%
  ggplot(aes(x=reorder(hypertension,Percentage),
                 y=Percentage))+
  geom_bar(stat="identity",fill="pink",color="black")+
  geom_text(aes(label=perce_label),vjust=-0.25)+
  labs(x="Hypertension status",y="Percent",
       title = "% of female patients with hypertension")+
  scale_y_continuous(labels = scales::percent)+
  theme_bw() 
df1

#====================================================-
# 2.2.2.2 Proportion of female patients with diabetes-----
df2<-CDKf%>%
  select(diabetes)%>%
  count(diabetes)%>%
  mutate(Percentage=n/sum(n),
         perce_label=paste0(round(Percentage*100),"%"))%>%
  ggplot(aes(x=reorder(diabetes,Percentage),
             y=Percentage))+
  geom_bar(stat="identity",fill="purple",color="black")+
  geom_text(aes(label=perce_label),vjust=-0.25)+
  labs(x="Diabetes status",y="Percent",
       title = "% of female patients with diabetes")+
  scale_y_continuous(labels = scales::percent)+
  theme_classic() 
df2

#=================================================- 
#2.2.2.3 Proportion of female patients by drug type-----
df3<-CDKf%>%
  select(drug_name)%>%
  count(drug_name)%>%
  mutate(Percentage=n/sum(n),
         perce_label=paste0(round(Percentage*100),"%"))%>%
  ggplot(aes(x=reorder(drug_name,Percentage),
             y=Percentage))+
  geom_bar(stat="identity",fill="skyblue",color="black")+
  geom_text(aes(label=perce_label),vjust=-0.25)+
  labs(x="Drug type",y="Percent",
       title = "% of female patients by drug used")+
  scale_y_continuous(labels = scales::percent)+
  theme_classic() 
df3

#=======================================================- 
#2.2.2.4 Proportion of exposure days----
#============================================-
Exposure<-data.frame("exposurecat"= c("1-4","5-10","10-14","15-19",
                                      "20-24","25-29"),
                   "Freq" = c(97,132,131,127,140,149),
                   "Percent" = c("12.5%","17.0%", "16.9%",
                                 "16.4%","18.0%","19.2%"))
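# Set the levels explicitly so the bins plot in numeric order, not alphabetical order
Exposure$exposurecat<-factor(Exposure$exposurecat,
                             levels = unique(as.character(Exposure$exposurecat)))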
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Exposure|>
ggplot(aes(x =exposurecat, y = as.numeric(Freq))) + 
  geom_bar(stat = "identity", color = "black", fill = "dodgerblue1")+
  geom_text(label= with(Exposure, paste(Freq, paste0('(', Percent, ')'))), 
            vjust=-1) +
  ylim(0, 200)+
  labs(title = "Days of drug consumption by female patients",
       y="Female patients",
       x="Days of consumption drug")
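
The frequencies and percentages in the `Exposure` data frame above are typed in by hand. A minimal sketch of deriving them from the data instead is shown here; the `exposure_cat` vector is a small hypothetical example, not the real column.

```r
# Hypothetical exposure categories for a handful of patients
exposure_cat <- factor(
  c("1-4", "5-9", "5-9", "10-14", "10-14", "10-14", "15-19", "15-19"),
  levels = c("1-4", "5-9", "10-14", "15-19")   # explicit levels keep the bins in order
)

# Count each bin, then express the counts as percentages of the total
freq <- table(exposure_cat)
Exposure <- data.frame(
  exposurecat = factor(names(freq), levels = levels(exposure_cat)),
  Freq        = as.vector(freq),
  Percent     = sprintf("%.1f%%", 100 * as.vector(prop.table(freq)))
)
Exposure
```

Computing the table this way keeps the plot in step with the data if the filtering or the bin definitions change.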

#=============================================================- 
# 2.3 Visualize continuous variables
# 2.3.1 Distribution of Patient Age (Years)
#2.3.1.1: Normality Assumption
CDKf%>%
  ggplot(aes(x=patient_age))+
  geom_histogram(bins = 30,fill="blue",color="white")+  # bins set explicitly
  theme_classic()+
  labs(title = "Age distribution of female patients",
       y="Counts",
       x= "Patients Age(Years)")

# Testing normality for clarity using shapiro wilk test
shapiro.test(CDKf$patient_age)


    Shapiro-Wilk normality test

data:  CDKf$patient_age
W = 0.952, p-value = 3.319e-15

# Note: the normality assumption for age is violated
# Reason: p-value < 0.05, thus we reject Ho (normality)
# 2.3.1.2: Identify outliers in patient age----
boxplot(CDKf$patient_age,col = "violet")

#==========================================-
# 2.3.2 TEST NORMALITY ASSUMPTION USING SHAPIRO-WILK TEST----
# Please note: for the remaining continuous variables I used only
# the Shapiro-Wilk test (on the full data set)----
#==========================================-
shapiro.test(CDK$bp_systolic)


    Shapiro-Wilk normality test

data:  CDK$bp_systolic
W = 0.99903, p-value = 0.6182

shapiro.test(CDK$bp_diastolic)


    Shapiro-Wilk normality test

data:  CDK$bp_diastolic
W = 0.9989, p-value = 0.4988

shapiro.test(CDK$blood_urea)


    Shapiro-Wilk normality test

data:  CDK$blood_urea
W = 0.99862, p-value = 0.2819

shapiro.test(CDK$blood_glucose_random)


    Shapiro-Wilk normality test

data:  CDK$blood_glucose_random
W = 0.99846, p-value = 0.1973

shapiro.test(CDK$drug_dosage_mg)


    Shapiro-Wilk normality test

data:  CDK$drug_dosage_mg
W = 0.95146, p-value < 2.2e-16
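
The repeated `shapiro.test()` calls above can be collected into one table. The sketch below uses simulated columns whose names only mimic the dataset. The null hypothesis of the test is that the data are normally distributed, so a p-value below 0.05 leads to rejecting normality. `shapiro.test()` accepts between 3 and 5000 non-missing values.

```r
# Simulated, hypothetical columns
set.seed(1)
dat <- data.frame(
  bp_systolic    = rnorm(500, mean = 130, sd = 20),   # drawn from a normal distribution
  drug_dosage_mg = runif(500, min = 50, max = 800)    # flat distribution, not normal
)

# Apply shapiro.test() to every column and collect W and the p-value
normality <- t(sapply(dat, function(x) {
  res <- shapiro.test(x)
  c(W = unname(res$statistic), p_value = res$p.value)
}))

# Flag the columns for which normality is not rejected at the 5% level
normality <- data.frame(normality,
                        normal_at_5pct = normality[, "p_value"] >= 0.05)
normality
```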

#===========================================-
# N/B: All the variables met the normality assumption except
# patient age and drug dosage (mg)
#============================================-
#2.3.2.1 Describe continuous variables----
CSV<-CDK|>
  select(patient_age,bp_diastolic,drug_dosage_mg,bp_systolic,
         blood_urea,blood_glucose_random)
describe(CSV)

                     vars    n   mean     sd median trimmed    mad   min   max
patient_age             1 1500  52.87  20.86  53.00   52.84  26.69  18.0  89.0
bp_diastolic            2 1500  84.59  12.08  84.85   84.67  12.38  48.9 123.9
drug_dosage_mg          3 1500 426.61 217.44 434.50  428.25 282.44  50.0 798.0
bp_systolic             4 1500 130.31  19.65 130.10  130.34  19.13  69.6 208.5
blood_urea              5 1500  34.56  15.17  34.70   34.53  14.83 -12.7  81.7
blood_glucose_random    6 1500 149.41  38.16 150.25  149.67  37.51  -6.9 266.6
                     range  skew kurtosis   se
patient_age           71.0  0.01    -1.22 0.54
bp_diastolic          75.0 -0.04    -0.17 0.31
drug_dosage_mg       748.0 -0.06    -1.23 5.61
bp_systolic          138.9  0.02     0.16 0.51
blood_urea            94.4  0.04     0.14 0.39
blood_glucose_random 273.5 -0.09    -0.11 0.99
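
The `describe()` output above reports negative minima for blood urea (-12.7) and random blood glucose (-6.9), which cannot be real concentrations. A minimal sketch of one way to flag such values before analysis follows; the data frame is a hypothetical four-row example, and setting the values to `NA` is only one possible handling rule.

```r
# Hypothetical rows; the negative entries mimic the minima reported by describe()
lab_values <- data.frame(
  blood_urea           = c(34.7, -12.7, 81.7, 20.1),
  blood_glucose_random = c(150.2, 149.0, -6.9, 266.6)
)

# A concentration cannot be negative: count such values per column
n_implausible <- colSums(lab_values < 0, na.rm = TRUE)

# Replace the implausible values with NA so they are treated as missing
lab_values_clean <- lab_values
lab_values_clean[lab_values_clean < 0] <- NA

n_implausible
colSums(is.na(lab_values_clean))
```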

Section: DESCRIPTIVE STATISTICS

# 3.1 Descriptive Statistics----
Table<-CDK|>
  select(gender,diabetes,hypertension,drug_name,exposure_cat,ckd_risk_label,
         bp_systolic,bp_diastolic,blood_urea,blood_glucose_random,
         exposure_days,nephrotoxic_label)

Mystat<-list(all_continuous()~"{mean} ± {sd}",
             all_categorical()~"{n} ({p})")
MyDigit<-list(all_continuous()~c(2,2),all_categorical()~c(0,2))
Table1<-Table%>%
  tbl_summary(by=nephrotoxic_label,missing = "no",statistic = Mystat,digits = MyDigit)%>%
  bold_labels()
Table1

Characteristic	non-nephrotoxic N = 578¹	nephrotoxic N = 922¹
sex
Female	286 (49.48)	490 (53.15)
Male	292 (50.52)	432 (46.85)
diabetes	253 (43.77)	402 (43.60)
hypertension	335 (57.96)	500 (54.23)
Drug Type
Amphotericin-B	0 (0.00)	169 (18.33)
Aspirin	187 (32.35)	0 (0.00)
Cisplatin	0 (0.00)	193 (20.93)
Gentamicin	0 (0.00)	185 (20.07)
Ibuprofen	211 (36.51)	0 (0.00)
Paracetamol	180 (31.14)	0 (0.00)
Tobramycin	0 (0.00)	178 (19.31)
Vancomycin	0 (0.00)	197 (21.37)
exposure days
1-4	82 (14.19)	137 (14.86)
10-14	102 (17.65)	157 (17.03)
15-19	96 (16.61)	151 (16.38)
20-24	96 (16.61)	180 (19.52)
25-29	99 (17.13)	159 (17.25)
5-10	103 (17.82)	138 (14.97)
Risk of chronic Kidney disease
Low risk	336 (58.13)	43 (4.66)
Moderate risk	222 (38.41)	506 (54.88)
High risk	20 (3.46)	373 (40.46)
Systolic blood pressure(mm/Hg)	131.12 ± 19.91	129.81 ± 19.48
Diastolic blood pressure(mm/HG)	84.62 ± 12.04	84.58 ± 12.11
Blood urea(mmol/L)	34.64 ± 14.89	34.51 ± 15.35
Blood glucose	149.33 ± 39.21	149.46 ± 37.50
Days of exposure	14.86 ± 8.35	15.24 ± 8.48
¹ n (%); Mean ± SD

#========================================================-
# Reporting patient_age and drug_dosage_mg using table1 function
Table2<-CDK%>%
  select(patient_age,drug_dosage_mg,nephrotoxic_label)
table1(~patient_age+drug_dosage_mg|nephrotoxic_label,data=Table2)

	non-nephrotoxic (N=578)	nephrotoxic (N=922)	Overall (N=1500)
Patient Age(Years)
Mean (SD)	53.5 (21.0)	52.5 (20.8)	52.9 (20.9)
Median [Min, Max]	53.0 [18.0, 89.0]	52.0 [18.0, 89.0]	53.0 [18.0, 89.0]
Drug dosage(mg)
Mean (SD)	426 (221)	427 (215)	427 (217)
Median [Min, Max]	430 [50.0, 796]	436 [51.0, 798]	435 [50.0, 798]
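
The analysis plan calls for the median and interquartile range for skewed variables, while the table above shows the median with minimum and maximum. A minimal base R sketch of a median (Q1, Q3) summary by group is given here, using simulated data with hypothetical values. In gtsummary the corresponding statistic string is `"{median} ({p25}, {p75})"`.

```r
# Simulated, hypothetical data
set.seed(7)
d <- data.frame(
  nephrotoxic_label = rep(c("non-nephrotoxic", "nephrotoxic"), each = 50),
  drug_dosage_mg    = round(runif(100, 50, 800))
)

# Format the median and the 25th/75th percentiles as "median (Q1, Q3)"
median_iqr <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
  sprintf("%.0f (%.0f, %.0f)", q[2], q[1], q[3])
}

# One summary string per group
summary_tab <- aggregate(drug_dosage_mg ~ nephrotoxic_label, data = d,
                         FUN = median_iqr)
summary_tab
```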

Section C: Test of hypothesis

# 3.2: Inferential Statistics
# 3.2.1 Bivariate Analysis 
Tab_1 <-CDK|>
  tbl_summary(
    by =nephrotoxic_label,  # group-wise summary by nephrotoxic label
    statistic = list(
      all_continuous() ~ "{mean} ± {sd}",
      all_categorical() ~ "{n} ({p}%)"
    ),
    percent = "column",
    missing = "no"
  ) |>
  add_overall() |>
  add_p(pvalue_fun = ~style_pvalue(.x, digits = 2)) |>
  modify_footnote(all_stat_cols() ~ "Mean (SD)") |>
  modify_spanning_header(c("stat_1", "stat_2") ~ "**nephrotoxic**") |>
  modify_caption("Table 1: Characteristics of patient information") |>
  bold_labels() |>
  add_n() |>
  as_flex_table()
sect_properties <- prop_section(page_size = page_size(orient = "portrait"))#, width = 8.3, height = 11.7)
save_as_docx(Tab_1,path="Table2c.docx", pr_section = sect_properties)
Tab_1

			nephrotoxic
Characteristic	N	Overall N = 1,500¹	non-nephrotoxic N = 578¹	nephrotoxic N = 922¹	p-value²
Patient Age(Years)	1,500	53 ± 21	54 ± 21	52 ± 21	0.32
sex	1,500				0.17
Female		776 (52%)	286 (49%)	490 (53%)
Male		724 (48%)	292 (51%)	432 (47%)
Systolic blood pressure(mm/Hg)	1,500	130 ± 20	131 ± 20	130 ± 19	0.19
Diastolic blood pressure(mm/HG)	1,500	85 ± 12	85 ± 12	85 ± 12	0.88
Blood urea(mmol/L)	1,500	35 ± 15	35 ± 15	35 ± 15	0.61
Blood glucose	1,500	149 ± 38	149 ± 39	149 ± 38	0.83
diabetes	1,500	655 (44%)	253 (44%)	402 (44%)	0.95
hypertension	1,500	835 (56%)	335 (58%)	500 (54%)	0.16
Drug Type	1,500				<0.001
Amphotericin-B		169 (11%)	0 (0%)	169 (18%)
Aspirin		187 (12%)	187 (32%)	0 (0%)
Cisplatin		193 (13%)	0 (0%)	193 (21%)
Gentamicin		185 (12%)	0 (0%)	185 (20%)
Ibuprofen		211 (14%)	211 (37%)	0 (0%)
Paracetamol		180 (12%)	180 (31%)	0 (0%)
Tobramycin		178 (12%)	0 (0%)	178 (19%)
Vancomycin		197 (13%)	0 (0%)	197 (21%)
Drug dosage(mg)	1,500	427 ± 217	426 ± 221	427 ± 215	0.98
Days of exposure	1,500	15 ± 8	15 ± 8	15 ± 8	0.40
Risk of chronic Kidney disease	1,500				<0.001
Low risk		379 (25%)	336 (58%)	43 (4.7%)
Moderate risk		728 (49%)	222 (38%)	506 (55%)
High risk		393 (26%)	20 (3.5%)	373 (40%)
Patients Age group(years)	1,500				0.32
18-22		118 (7.9%)	46 (8.0%)	72 (7.8%)
23-27		101 (6.7%)	35 (6.1%)	66 (7.2%)
28-32		110 (7.3%)	40 (6.9%)	70 (7.6%)
33-37		106 (7.1%)	41 (7.1%)	65 (7.0%)
38-42		103 (6.9%)	40 (6.9%)	63 (6.8%)
43-47		106 (7.1%)	31 (5.4%)	75 (8.1%)
48-52		105 (7.0%)	47 (8.1%)	58 (6.3%)
53-57		114 (7.6%)	49 (8.5%)	65 (7.0%)
58-62		80 (5.3%)	34 (5.9%)	46 (5.0%)
63-67		99 (6.6%)	34 (5.9%)	65 (7.0%)
68-72		107 (7.1%)	41 (7.1%)	66 (7.2%)
73-77		109 (7.3%)	37 (6.4%)	72 (7.8%)
78-82		117 (7.8%)	50 (8.7%)	67 (7.3%)
83-87		88 (5.9%)	32 (5.5%)	56 (6.1%)
88-92		37 (2.5%)	21 (3.6%)	16 (1.7%)
exposure days	1,500				0.60
1-4		219 (15%)	82 (14%)	137 (15%)
10-14		259 (17%)	102 (18%)	157 (17%)
15-19		247 (16%)	96 (17%)	151 (16%)
20-24		276 (18%)	96 (17%)	180 (20%)
25-29		258 (17%)	99 (17%)	159 (17%)
5-10		241 (16%)	103 (18%)	138 (15%)
¹ Mean (SD)
² Wilcoxon rank sum test; Pearson's Chi-squared test
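
The bivariate tests named in this section can also be run directly in base R. The sketch below uses simulated data with hypothetical values. It shows Pearson's chi-squared test for two categorical variables, and both the Welch t-test and the Wilcoxon rank sum test for a continuous variable across two groups.

```r
# Simulated, hypothetical data: one grouping factor, one categorical and one continuous variable
set.seed(123)
n <- 400
group       <- factor(rep(c("non-nephrotoxic", "nephrotoxic"), each = n / 2))
gender      <- factor(sample(c("Female", "Male"), n, replace = TRUE))
bp_systolic <- rnorm(n, mean = 130, sd = 20)

# Categorical vs categorical: Pearson's chi-squared test
# (R applies Yates' continuity correction by default for 2 x 2 tables)
chi <- chisq.test(table(gender, group))

# Continuous vs two groups: t.test() uses var.equal = FALSE by default, i.e. the Welch test
welch <- t.test(bp_systolic ~ group)

# Rank-based alternative that does not assume normality
wilc <- wilcox.test(bp_systolic ~ group)

c(chi_square = chi$p.value, welch = welch$p.value, wilcoxon = wilc$p.value)
```

The Welch test compares means and suits roughly normal variables. The Wilcoxon rank sum test is the safer choice for variables that failed the normality check.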

#3.2 Logistic Regression----
#3.2.1 Unadjusted Odds Ratios----
OR1<-CDK|>
  select(gender,diabetes,hypertension,exposure_cat,ckd_risk_label,
         bp_systolic,bp_diastolic,blood_urea,blood_glucose_random,
         exposure_days,nephrotoxic_label)|>
  tbl_uvregression(method = glm,y=nephrotoxic_label,
                   method.args = list(family=binomial()),
                   exponentiate = TRUE,pvalue_fun = ~style_pvalue(.x,digits = 3))|>
  modify_column_merge(pattern = "{estimate} ({ci})",rows = ! is.na(estimate))|>
  modify_header(estimate~"**OR(95%C.I)**")|>
  bold_labels()
OR1

Characteristic	N	OR(95%C.I)	95% CI	p-value
sex	1,500
Female		—	—
Male		0.86 (0.70, 1.06)	0.70, 1.06	0.167
diabetes	1,500
No		—	—
Yes		0.99 (0.81, 1.23)	0.81, 1.23	0.948
hypertension	1,500
No		—	—
Yes		0.86 (0.70, 1.06)	0.70, 1.06	0.157
exposure days	1,500
1-4		—	—
10-14		0.92 (0.64, 1.33)	0.64, 1.33	0.664
15-19		0.94 (0.65, 1.37)	0.65, 1.37	0.752
20-24		1.12 (0.78, 1.62)	0.78, 1.62	0.540
25-29		0.96 (0.66, 1.39)	0.66, 1.39	0.835
5-10		0.80 (0.55, 1.17)	0.55, 1.17	0.248
Risk of chronic Kidney disease	1,500
Low risk		—	—
Moderate risk		17.8 (12.6, 25.7)	12.6, 25.7	<0.001
High risk		146 (86.0, 259)	86.0, 259	<0.001
Systolic blood pressure(mm/Hg)	1,500	1.00 (0.99, 1.00)	0.99, 1.00	0.210
Diastolic blood pressure(mm/HG)	1,500	1.00 (0.99, 1.01)	0.99, 1.01	0.955
Blood urea(mmol/L)	1,500	1.00 (0.99, 1.01)	0.99, 1.01	0.874
Blood glucose	1,500	1.00 (1.00, 1.00)	1.00, 1.00	0.947
Days of exposure	1,500	1.01 (0.99, 1.02)	0.99, 1.02	0.392
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

#============================================
#3.2.2 Adjusted Odds Ratios----
OR2<- glm(nephrotoxic_label~gender+diabetes+hypertension+exposure_cat+ckd_risk_label+
          bp_systolic+bp_diastolic+blood_urea+blood_glucose_random+
          exposure_days,data = CDK,family = binomial())|>
  tbl_regression(
    exponentiate = TRUE,pvalue_fun = ~style_pvalue(.x,digits = 3))|>
  modify_column_merge(pattern = "{estimate} ({ci})",rows = ! is.na(estimate))|>
  modify_header(estimate~"**OR(95%C.I)**")|>
  bold_labels()
OR2

Characteristic	OR(95%C.I)	95% CI	p-value
sex
Female	—	—
Male	0.89 (0.66, 1.20)	0.66, 1.20	0.439
diabetes
No	—	—
Yes	0.26 (0.19, 0.37)	0.19, 0.37	<0.001
hypertension
No	—	—
Yes	0.22 (0.16, 0.31)	0.16, 0.31	<0.001
exposure days
1-4	—	—
10-14	0.59 (0.18, 1.95)	0.18, 1.95	0.389
15-19	0.69 (0.13, 3.71)	0.13, 3.71	0.667
20-24	0.77 (0.08, 7.06)	0.08, 7.06	0.818
25-29	0.95 (0.06, 15.1)	0.06, 15.1	0.969
5-10	0.70 (0.33, 1.46)	0.33, 1.46	0.341
Risk of chronic Kidney disease
Low risk	—	—
Moderate risk	71.0 (44.3, 118)	44.3, 118	<0.001
High risk	1,596 (764, 3,528)	764, 3,528	<0.001
Systolic blood pressure(mm/Hg)	1.00 (0.99, 1.00)	0.99, 1.00	0.299
Diastolic blood pressure(mm/HG)	1.00 (0.99, 1.01)	0.99, 1.01	0.957
Blood urea(mmol/L)	0.95 (0.93, 0.96)	0.93, 0.96	<0.001
Blood glucose	1.00 (0.99, 1.00)	0.99, 1.00	0.405
Days of exposure	1.00 (0.90, 1.12)	0.90, 1.12	0.935
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
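
To make clear where the odds ratios in such a table come from, here is a minimal sketch that fits a logistic regression and exponentiates the coefficients by hand. The data are simulated and the variable names are hypothetical. Wald intervals are used here for simplicity; intervals from other methods (for example profile likelihood) can differ slightly.

```r
# Simulated, hypothetical data
set.seed(2024)
n <- 1000
hypertension <- rbinom(n, 1, 0.55)
blood_urea   <- rnorm(n, mean = 35, sd = 15)
outcome      <- rbinom(n, 1, plogis(0.3 - 0.5 * hypertension + 0.01 * blood_urea))

fit <- glm(outcome ~ hypertension + blood_urea, family = binomial())

# Coefficients are log odds ratios; exponentiate estimate and Wald limits
est <- coef(summary(fit))
z   <- qnorm(0.975)
or_table <- data.frame(
  term    = rownames(est),
  OR      = exp(est[, "Estimate"]),
  lower   = exp(est[, "Estimate"] - z * est[, "Std. Error"]),
  upper   = exp(est[, "Estimate"] + z * est[, "Std. Error"]),
  p_value = est[, "Pr(>|z|)"],
  row.names = NULL
)

# The intercept is not an odds ratio, so it is dropped from the report
or_table[or_table$term != "(Intercept)", ]
```

An odds ratio below 1 means lower odds of the outcome for that level or per unit increase, and an interval that includes 1 corresponds to a non-significant Wald test at the 5% level.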

# Merging Tables----
Table_2<-tbl_merge(
  tbls = list(OR1,OR2),
  tab_spanner = c("**Unadjusted**","**Adjusted**")
)
Table_2

Characteristic	Unadjusted				Adjusted
Characteristic	N	OR(95%C.I)	95% CI	p-value	OR(95%C.I)	95% CI	p-value
sex	1,500
Female		—	—		—	—
Male		0.86 (0.70, 1.06)	0.70, 1.06	0.167	0.89 (0.66, 1.20)	0.66, 1.20	0.439
diabetes	1,500
No		—	—		—	—
Yes		0.99 (0.81, 1.23)	0.81, 1.23	0.948	0.26 (0.19, 0.37)	0.19, 0.37	<0.001
hypertension	1,500
No		—	—		—	—
Yes		0.86 (0.70, 1.06)	0.70, 1.06	0.157	0.22 (0.16, 0.31)	0.16, 0.31	<0.001
exposure days	1,500
1-4		—	—		—	—
10-14		0.92 (0.64, 1.33)	0.64, 1.33	0.664	0.59 (0.18, 1.95)	0.18, 1.95	0.389
15-19		0.94 (0.65, 1.37)	0.65, 1.37	0.752	0.69 (0.13, 3.71)	0.13, 3.71	0.667
20-24		1.12 (0.78, 1.62)	0.78, 1.62	0.540	0.77 (0.08, 7.06)	0.08, 7.06	0.818
25-29		0.96 (0.66, 1.39)	0.66, 1.39	0.835	0.95 (0.06, 15.1)	0.06, 15.1	0.969
5-10		0.80 (0.55, 1.17)	0.55, 1.17	0.248	0.70 (0.33, 1.46)	0.33, 1.46	0.341
Risk of chronic Kidney disease	1,500
Low risk		—	—		—	—
Moderate risk		17.8 (12.6, 25.7)	12.6, 25.7	<0.001	71.0 (44.3, 118)	44.3, 118	<0.001
High risk		146 (86.0, 259)	86.0, 259	<0.001	1,596 (764, 3,528)	764, 3,528	<0.001
Systolic blood pressure(mm/Hg)	1,500	1.00 (0.99, 1.00)	0.99, 1.00	0.210	1.00 (0.99, 1.00)	0.99, 1.00	0.299
Diastolic blood pressure(mm/HG)	1,500	1.00 (0.99, 1.01)	0.99, 1.01	0.955	1.00 (0.99, 1.01)	0.99, 1.01	0.957
Blood urea(mmol/L)	1,500	1.00 (0.99, 1.01)	0.99, 1.01	0.874	0.95 (0.93, 0.96)	0.93, 0.96	<0.001
Blood glucose	1,500	1.00 (1.00, 1.00)	1.00, 1.00	0.947	1.00 (0.99, 1.00)	0.99, 1.00	0.405
Days of exposure	1,500	1.01 (0.99, 1.02)	0.99, 1.02	0.392	1.00 (0.90, 1.12)	0.90, 1.12	0.935
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

# Reporting in word document
Table_2|>
  as_gt()|>
  gtsave(filename = "TABle_2.docx",path = "C:/CDK")
#============================================-

Section D: Forest Plot

# Forest plot
# Final fit plot 
# Outcome 
dependent <- "nephrotoxic_label"

# Explanatory variables
explanatory <- c(
  "gender",
  "diabetes",
  "hypertension",
  "exposure_cat",
  "ckd_risk_label",
  "bp_systolic",
  "bp_diastolic",
  "blood_urea",
  "blood_glucose_random",
  "exposure_days")
#3.2.3 Forest plot---- 
plot1 <- CDK %>% or_plot(dependent, explanatory, remove_ref = TRUE, table_text_size = 4, title_text_size = 18,
                         dependent_label = "Predictors of chronic kidney disease", prefix = "Forest Plot: - ", suffix = "")


plot1

ggsave("forest1.png", plot = plot1, width = 10, height = 6, dpi = 300)
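
For readers who want to see what a forest plot consists of, here is a minimal base R sketch that draws one from a small table of odds ratios. The three rows reuse adjusted estimates reported in the regression table of this document. The helper `draw_forest()` is a hypothetical name introduced for this illustration and is not part of finalfit.

```r
# Adjusted odds ratios and 95% limits taken from the regression table in this document
or_tab <- data.frame(
  term  = c("Male vs Female", "Diabetes: Yes vs No", "Hypertension: Yes vs No"),
  OR    = c(0.89, 0.26, 0.22),
  lower = c(0.66, 0.19, 0.16),
  upper = c(1.20, 0.37, 0.31)
)

# Hypothetical helper: one row per term, point = OR, horizontal bar = 95% CI
draw_forest <- function(tab) {
  y  <- rev(seq_len(nrow(tab)))                 # first term is drawn at the top
  op <- par(mar = c(4, 12, 2, 1))               # wide left margin for the labels
  on.exit(par(op))
  plot(tab$OR, y, log = "x", pch = 15,
       xlim = range(c(tab$lower, tab$upper, 1)),
       ylim = c(0.5, nrow(tab) + 0.5),
       yaxt = "n", ylab = "",
       xlab = "Odds ratio (95% CI, log scale)")
  segments(tab$lower, y, tab$upper, y)          # confidence intervals
  abline(v = 1, lty = 2)                        # line of no effect
  axis(2, at = y, labels = tab$term, las = 1)
  invisible(y)
}

# draw_forest(or_tab)   # renders the plot on the active graphics device
```

The log scale on the x-axis makes intervals symmetric around the estimate, and any bar crossing the dashed line at 1 corresponds to a non-significant association.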