Very few cases have a year before 2017, and these may have to be removed from the dataset (decision pending). There are also 935 (~43%) cases with missing year information. Both groups are treated as invalid dates.
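The rule above can be written as a validity flag, similar to the age_valid and state_valid columns in Table 1. This is a minimal base-R sketch. The `flag_year` helper is hypothetical, and treating 2017 as a hard cut-off is an assumption taken from the sentence above. Neither comes from the registry's actual cleaning code.

```r
# Hypothetical helper: label admission years as "valid" or "invalid".
# Assumption: a year is invalid if it is missing or earlier than min_year.
flag_year <- function(year, min_year = 2017) {
  # is.na(year) is TRUE for missing years, so TRUE | NA evaluates to TRUE
  ifelse(is.na(year) | year < min_year, "invalid", "valid")
}

# Toy vector for illustration only (not registry data)
years <- c(2008, 2016, 2017, 2019, NA)
flag_year(years)
```

The threshold is a parameter, so the flag can be recomputed easily if a different cut-off is chosen later.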
Next, we clean the diagnosis/etiology variables BasicDiagnosis, Etiologynew and broadetiologycategory, and create a new variable, diagnosis_new.
Etiologynew is the most complete and cleanest of these variables, so we use it for further analysis. When this variable is categorized into GBD-relevant etiologies, the most common group is “Other”, and we can examine the distribution of etiologies within this group. There were no cases with hypertension (HTN) or diabetes (DM) as the etiology of CKD.
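The categorization can be sketched as a lookup table. The mapping below is inferred from matching counts in Table 1. For example, there are 333 CGN cases and 333 Glomerulonephritis cases, and 329 Undetermined and 329 Unknown cases. It is therefore an assumption, not the registry's cleaning script, and `recode_diagnosis` is a hypothetical helper name.

```r
# Hypothetical recode of Etiologynew into diagnosis_new.
# Assumption: mapping inferred from matching counts in Table 1.
recode_diagnosis <- function(etiology) {
  lookup <- c(
    "CGN"                   = "Glomerulonephritis",
    "Cystic kidney disease" = "Polycystic Kidney Disease",
    "Other CAKUT"           = "CAKUT",
    "Undetermined"          = "Unknown"
  )
  # Labels not in the lookup (and NA inputs) come back as NA here
  out <- unname(lookup[etiology])
  # Every other non-missing etiology falls into the "Other" group
  out[is.na(out) & !is.na(etiology)] <- "Other"
  out
}

recode_diagnosis(c("CGN", "Reflux nephropathy", "Undetermined", NA))
```

Missing etiologies stay missing, so they are not silently counted as “Other”.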
Next, we clean the CKD stage variable.
# A tibble: 4 × 3
CKDstagenew n freq
<dbl> <int> <dbl>
1 5 1063 45.9
2 4 731 31.6
3 3 508 21.9
4 NA 13 0.562
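The n and freq columns above are counts and percentages of CKDstagenew, with missing stages kept as a separate row. The base-R sketch below rebuilds the same summary. The `stage` vector is reconstructed from the counts shown, not read from the registry file.

```r
# Reconstruct the stage vector from the counts in the tibble above
stage <- c(rep(5, 1063), rep(4, 731), rep(3, 508), rep(NA, 13))

# useNA = "ifany" keeps missing stages as their own category
n <- table(stage, useNA = "ifany")
freq <- round(100 * prop.table(n), 1)

stage_summary <- data.frame(
  CKDstagenew = names(n),
  n = as.integer(n),
  freq = as.numeric(freq)
)
# Sort from most to least common stage
stage_summary[order(-stage_summary$n), ]
```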
For the CKD stage, the CKDstagenew variable is the best one to use because it has the most complete information. Only 13 cases have missing stage information.

#### Table 1: Summary of key variables in the paediatric CKD registry

| Characteristic | N = 2,315¹ |
|---|---|
| **age_new** | 8.0 (4.0, 11.0) |
| Unknown | 11 |
| **age_valid** | |
| invalid | 11 (0.5%) |
| valid | 2,304 (100%) |
| **Gender** | |
| Female | 555 (24%) |
| Male | 1,760 (76%) |
| **state_new** | |
| Andhra Pradesh | 58 (2.5%) |
| Assam | 19 (0.8%) |
| Bihar | 67 (2.9%) |
| Chhattisgarh | 17 (0.7%) |
| Gujarat | 2 (<0.1%) |
| Haryana | 105 (4.5%) |
| Himachal Pradesh | 1 (<0.1%) |
| Jammu and Kashmir | 8 (0.3%) |
| Jharkhand | 19 (0.8%) |
| Karnataka | 232 (10%) |
| Kerala | 51 (2.2%) |
| Madhya Pradesh | 92 (4.0%) |
| Maharashtra | 302 (13%) |
| Manipur | 1 (<0.1%) |
| New Delhi | 251 (11%) |
| Odisha | 7 (0.3%) |
| Other Union Territories | 12 (0.5%) |
| Punjab | 8 (0.3%) |
| Rajasthan | 62 (2.7%) |
| Tamil Nadu | 259 (11%) |
| Telangana | 3 (0.1%) |
| Tripura | 1 (<0.1%) |
| Unknown | 360 (16%) |
| Uttar Pradesh | 188 (8.1%) |
| Uttarakhand | 4 (0.2%) |
| West Bengal | 186 (8.0%) |
| **state_valid** | |
| invalid | 360 (16%) |
| valid | 1,955 (84%) |
| **year_new** | |
| 2008 | 1 (<0.1%) |
| 2013 | 2 (<0.1%) |
| 2014 | 2 (<0.1%) |
| 2015 | 4 (0.2%) |
| 2016 | 8 (0.3%) |
| 2017 | 243 (10%) |
| 2018 | 433 (19%) |
| 2019 | 318 (14%) |
| 2020 | 238 (10%) |
| Missing | 1,066 (46%) |
| **diagnosis_new** | |
| CAKUT | 27 (1.2%) |
| Glomerulonephritis | 333 (14%) |
| Other | 1,502 (65%) |
| Polycystic Kidney Disease | 124 (5.4%) |
| Unknown | 329 (14%) |
| **Etiologynew** | |
| AKIsequelae | 9 (0.4%) |
| CGN | 333 (14%) |
| Chronic pyelonephritis | 36 (1.6%) |
| Cystic kidney disease | 124 (5.4%) |
| Hemolytic uremic syndrome | 41 (1.8%) |
| Hypoplasia-dysplasia | 306 (13%) |
| Inherited tubular disease | 32 (1.4%) |
| Nephrolithiasis | 18 (0.8%) |
| Neurogenic bladder | 131 (5.7%) |
| Obstructive uropathy | 515 (22%) |
| Other CAKUT | 27 (1.2%) |
| Reflux nephropathy | 290 (13%) |
| Renovascular disorders | 12 (0.5%) |
| Tubulointerstitial disease | 112 (4.8%) |
| Undetermined | 329 (14%) |
| **CKDstagenew** | |
| 3 | 508 (22%) |
| 4 | 731 (32%) |
| 5 | 1,063 (46%) |
| Missing | 13 (0.6%) |

¹ Median (IQR); n (%)
This plot of the age distribution shows that most patients are between 1 and 12 years old. The distribution does not look normal.
The 0-5 year age group contains the largest number of cases.
As seen in Table 1, the majority of patients (76%) are boys. This points to a possible major selection bias.
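An exact binomial test can quantify how far the sex ratio in Table 1 (1,760 boys out of 2,315) departs from an even split. In this sketch, p = 0.5 is an assumed reference value. A census-based child sex ratio would be a more appropriate null. A departure from 0.5 alone also cannot distinguish selection bias from a true difference in disease occurrence between the sexes.

```r
# Exact binomial test of the proportion of boys in Table 1.
# Assumption: p = 0.5 is used as a simple reference proportion.
sex_test <- binom.test(x = 1760, n = 2315, p = 0.5)

sex_test$estimate   # observed proportion of boys
sex_test$conf.int   # 95% confidence interval
sex_test$p.value    # evidence against a 50/50 split
```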
This plot shows that the largest numbers of patients come from Maharashtra (13%), Tamil Nadu (11%), New Delhi (11%) and Karnataka (10%), together about 45% of all cases. About 16% (360 cases) have no state information.
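The shares quoted above can be checked directly against Table 1. The sketch includes only the largest states and the “Unknown” group. The vector is transcribed from Table 1.

```r
# State counts transcribed from Table 1 (largest states plus "Unknown")
state_n <- c(
  "Maharashtra" = 302, "Tamil Nadu" = 259, "New Delhi" = 251,
  "Karnataka" = 232, "Uttar Pradesh" = 188, "West Bengal" = 186,
  "Unknown" = 360
)
total <- 2315  # N in Table 1

# Rank the known states and keep the four largest
known <- state_n[names(state_n) != "Unknown"]
top4 <- head(sort(known, decreasing = TRUE), 4)

round(100 * top4 / total, 1)                   # share of each top state
round(100 * sum(top4) / total, 1)              # combined share of the top four
round(100 * state_n[["Unknown"]] / total, 1)   # share with no state recorded
```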
Among cases with a recorded year, nearly all (1,232 of 1,249) were admitted between 2017 and 2020.
There are more cases in Stage 5 than in any other stage. There are no cases in Stages 1-2.
The majority of cases are in the “Other” category.
This plot shows the distribution of etiologies within the “Other” category. Most of these etiologies relate to structural abnormalities of the urinary tract, such as obstructive uropathy, hypoplasia-dysplasia, reflux nephropathy and neurogenic bladder.
There is no apparent variation in the distribution of CKD stages by age group.
Across all age groups, most cases have “Other” as the etiology, followed by GN and unknown. As age increases, the proportion of cases with “Other” or GN as the etiology decreases, and the proportion with unknown etiology increases.
There is no apparent difference in the distribution of diagnoses by year of admission.
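A claim of “no variation” in stage distribution by age group could be checked formally with a chi-square test of independence on the age group × stage contingency table. With the registry data, this would be `chisq.test(table(age_group, stage))`. The sketch below instead uses a small hypothetical table whose rows are exactly proportional. That is the pattern “no variation” describes. The counts and age-group labels are illustrative, not registry data.

```r
# Hypothetical counts: rows = age groups, columns = CKD stages 3-5.
# The second row is exactly twice the first, so both age groups have
# the same stage distribution.
toy <- matrix(
  c(10, 20, 30,
    20, 40, 60),
  nrow = 2, byrow = TRUE,
  dimnames = list(age_group = c("0-5", "6-10"), stage = c("3", "4", "5"))
)

res <- chisq.test(toy)
res$statistic  # 0 when the row profiles are identical
res$p.value    # 1 in that case; small values indicate variation by age
```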
# A tibble: 59 × 2
state n
<chr> <int>
1 Uttar Pradesh 7549
2 West Bengal 7202
3 Maharashtra 5221
4 Tamil Nadu 4832
5 Odisha 2768
6 Gujarat 2729
7 Andhra Pradesh 2622
8 Jharkhand 2019
9 Delhi 1715
10 Bihar 1709
# ℹ 49 more rows
    library(dplyr)

    # Check the missing data in Etiology_new in the main registry
    combined_data %>%
      filter(dataset == "Main Registry", is.na(Etiology_new)) %>%
      count(Etiologynew)
# A tibble: 1 × 2
Etiologynew n
<chr> <int>
1 <NA> 32
To make the two datasets easier to compare, the etiologies were recoded into fewer, broader categories: CAKUT, Chronic glomerulonephritis, Diabetes, Hypertension, Tubulo-interstitial disease, and Other and unspecified. The recoding was done as follows:

| Characteristic | CAKUT, N = 1,456¹ | Chronic glomerulonephritis, N = 969¹ | Diabetes, N = 200¹ | Hypertension, N = 123¹ | Tubulo-interstitial disease, N = 358¹ | Other and unspecified, N = 1,619¹ |
|---|---|---|---|---|---|---|
| **Etiologynew** | | | | | | |
| AKIsequelae | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 9 (0.6%) |
| Chronic Glomerulo-nephritis | 0 (0%) | 969 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Chronic pyelonephritis | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 36 (2.2%) |
| Congenital disease | 7 (0.5%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Cystic disease | 160 (11%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Diabetic Nephropathy | 0 (0%) | 0 (0%) | 200 (100%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Dysplastic kidneys | 26 (1.8%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Hemolytic uremic syndrome | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 41 (2.5%) |
| Heredofamilial | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 4 (0.2%) |
| Hypertensive Nephrosclerosis | 0 (0%) | 0 (0%) | 0 (0%) | 123 (100%) | 0 (0%) | 0 (0%) |
| Hypoplasia-dysplasia | 306 (21%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Inherited tubular disease | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 32 (8.9%) | 0 (0%) |
| Metabolic diseases | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 11 (0.7%) |
| Nephrolithiasis | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 18 (1.1%) |
| Neurogenic bladder | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 131 (8.1%) |
| Obstructive uropathy | 640 (44%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Other | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 901 (56%) |
| Other CAKUT | 27 (1.9%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Reflux nephropathy | 290 (20%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Renovascular disease | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 17 (1.1%) |
| Tubulo interstitial disease | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 326 (91%) | 0 (0%) |
| Undetermined | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 451 (28%) |

¹ n (%)
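The recoding can be expressed as a single named lookup vector. The mapping below is transcribed from the cross-tabulation of Etiologynew against the broad categories. The names `broad_map` and `recode_broad` are hypothetical, and this is a sketch rather than the code actually used. Labels that are not in the lookup come back as NA so they can be reviewed by hand rather than silently assigned.

```r
# Harmonised Etiologynew label -> broad category (from the cross-tabulation)
broad_map <- c(
  "Congenital disease"           = "CAKUT",
  "Cystic disease"               = "CAKUT",
  "Dysplastic kidneys"           = "CAKUT",
  "Hypoplasia-dysplasia"         = "CAKUT",
  "Obstructive uropathy"         = "CAKUT",
  "Other CAKUT"                  = "CAKUT",
  "Reflux nephropathy"           = "CAKUT",
  "Chronic Glomerulo-nephritis"  = "Chronic glomerulonephritis",
  "Diabetic Nephropathy"         = "Diabetes",
  "Hypertensive Nephrosclerosis" = "Hypertension",
  "Inherited tubular disease"    = "Tubulo-interstitial disease",
  "Tubulo interstitial disease"  = "Tubulo-interstitial disease",
  "AKIsequelae"                  = "Other and unspecified",
  "Chronic pyelonephritis"       = "Other and unspecified",
  "Hemolytic uremic syndrome"    = "Other and unspecified",
  "Heredofamilial"               = "Other and unspecified",
  "Metabolic diseases"           = "Other and unspecified",
  "Nephrolithiasis"              = "Other and unspecified",
  "Neurogenic bladder"           = "Other and unspecified",
  "Other"                        = "Other and unspecified",
  "Renovascular disease"         = "Other and unspecified",
  "Undetermined"                 = "Other and unspecified"
)

# Unlisted labels return NA so they can be reviewed by hand
recode_broad <- function(etiology) unname(broad_map[etiology])

recode_broad(c("Reflux nephropathy", "Diabetic Nephropathy", "Neurogenic bladder"))
```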
Ages 0 and >15 are over-represented in the main registry.
The 15-18 age group is over-represented in the main registry.
Sex distribution is the same in both registries.
Uttar Pradesh (UP), Gujarat (GJ), West Bengal (WB) and Andhra Pradesh (AP) are over-represented in the main registry. The top states in the main registry are UP, WB, Maharashtra (MH) and Tamil Nadu (TN). The top states in the paediatric registry are MH, TN, New Delhi (ND) and Karnataka (KA), and a large proportion of cases (16%) have an unknown state.
The years of admission overlap only minimally between the two registries. In addition, the paediatric registry has a high proportion of missing years (46% in Table 1).
The paediatric registry has no cases in Stages 1-2. Could this be real, or are we missing something? Could the cases with missing stage be in Stages 1-2? Only 13 cases have a missing stage, so at most they could account for very few such cases. In both datasets, Stage 5 is the most common.
The most common diagnosis in the main registry is Other and unspecified, whereas in the paediatric registry it is CAKUT. Diabetes and hypertension appear as causes only in the main registry.
Across most CKD stages, the most common etiology was Other and unspecified in the main registry but CAKUT in the paediatric registry.
In the paediatric registry, the proportion of CAKUT causes declines with increasing age. In the main registry, the proportion of Other and unspecified causes increases with age. DM and HTN are present only in the 0-5 age group in the main registry.
In the main registry, the distribution of etiologies within CKD stages does not seem to differ across age groups. In the paediatric registry, the proportion of CAKUT is higher in the 0-5 age group than in the other age groups.
In the paediatric registry, the proportion of CAKUT decreases with age.
In both registries, the proportion of Other and unspecified causes has increased over the years, while the proportion of CAKUT has decreased in the paediatric registry. In the main registry, DM and HTN appear to increase over the years.
Table of prevalence fractions (%) across age groups

| age_new | Diabetes | Hypertension | Chronic glomerulonephritis | Other and unspecified |
|---|---|---|---|---|
| <1 year | 6.78 | 0.00 | 27.73 | 65.50 |
| 1 year | 7.47 | 0.00 | 25.73 | 66.80 |
| 2-4 years | 7.84 | 0.00 | 23.96 | 68.19 |
| 5-9 years | 9.07 | 0.00 | 17.11 | 73.82 |
| 10-14 years | 8.94 | 0.00 | 7.61 | 83.45 |
| 15-19 years | 10.75 | 0.83 | 3.72 | 84.70 |
| 20-24 years | 12.02 | 0.78 | 2.76 | 84.44 |
| 25-29 years | 13.63 | 0.92 | 2.50 | 82.95 |
Table of YLD fractions (%) across age groups

| age_new | Diabetes | Hypertension | Chronic glomerulonephritis | Other and unspecified |
|---|---|---|---|---|
| <1 year | 1.67 | 0.00 | 18.42 | 79.91 |
| 1 year | 1.73 | 0.00 | 18.26 | 80.01 |
| 2-4 years | 1.84 | 0.00 | 18.63 | 79.53 |
| 5-9 years | 2.98 | 0.00 | 20.83 | 76.19 |
| 10-14 years | 2.68 | 0.00 | 19.95 | 77.37 |
| 15-19 years | 1.55 | 13.38 | 18.95 | 66.12 |
| 20-24 years | 2.20 | 14.91 | 21.42 | 61.47 |
| 25-29 years | 3.66 | 16.84 | 23.00 | 56.50 |
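Each row of the prevalence and YLD tables is a set of etiology fractions. Each etiology's count (or YLD) is divided by the total across the four etiologies in that age group and multiplied by 100, so every row sums to about 100. A minimal base-R sketch of this calculation follows. It uses hypothetical counts, because the underlying GBD numbers are not reproduced in this report.

```r
# Hypothetical counts by age group (rows) and etiology (columns);
# illustrative numbers only, not GBD estimates.
counts <- matrix(
  c(12, 0, 45, 143,
    30, 3,  9, 258),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    age_group = c("<1 year", "15-19 years"),
    etiology = c("Diabetes", "Hypertension",
                 "Chronic glomerulonephritis", "Other and unspecified")
  )
)

# margin = 1 divides each cell by its row total (within-age fractions)
fractions <- round(100 * prop.table(counts, margin = 1), 2)
fractions
rowSums(fractions)  # each row sums to 100, up to rounding
```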
Comparison of GBD prevalence fractions with registry data - overall
This graph compares the etiology fractions of GBD prevalence numbers for 2020, grouped by age group, with those from the two registry data sources.
The registry data are not consistent with the GBD data. The registry data have a higher fraction of hypertension and diabetes than the GBD data in the <1 year age group. Both registries show an increasing fraction of chronic glomerulonephritis with age, whereas the GBD fraction decreases. The fraction of other and unspecified etiologies moves in the opposite direction. In the registry data, DM and HTN become noticeable only after age 15. In GBD, by contrast, diabetes is present from the youngest age group (mostly type 1 DM).
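Grouping by age requires assigning each registry age to the same age bands used in the prevalence and YLD tables. A minimal sketch with base R's `cut()` follows. It assumes ages are recorded in years. The helper name `gbd_age_group` is hypothetical.

```r
# Assign ages (in years) to the age bands used in the tables above.
# right = FALSE makes each interval include its lower bound, e.g. [2, 5)
# becomes "2-4 years"; ages of 30 or more fall outside the bands (NA).
gbd_age_group <- function(age) {
  cut(
    age,
    breaks = c(0, 1, 2, 5, 10, 15, 20, 25, 30),
    labels = c("<1 year", "1 year", "2-4 years", "5-9 years",
               "10-14 years", "15-19 years", "20-24 years", "25-29 years"),
    right = FALSE
  )
}

gbd_age_group(c(0, 1, 4, 5, 14, 18))
```

Using `right = FALSE` matches the way these bands are labelled. An age of exactly 5 belongs to "5-9 years", not "2-4 years".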
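A rank correlation between age-group order and fraction gives “increasing” or “decreasing with age” a concrete meaning. The sketch applies Spearman's rho to two columns transcribed from the prevalence-fraction table. Running the same call on the registry fractions, which are not reproduced here, would show whether the registry trends run in the opposite direction. This is one simple way to summarize direction, not the method used for the graphs.

```r
# Prevalence fractions (%) transcribed from the prevalence table,
# ordered from "<1 year" to "25-29 years"
gn    <- c(27.73, 25.73, 23.96, 17.11, 7.61, 3.72, 2.76, 2.50)
other <- c(65.50, 66.80, 68.19, 73.82, 83.45, 84.70, 84.44, 82.95)
age_order <- seq_along(gn)  # 1 = youngest group, 8 = oldest

# Spearman's rho: -1 = strictly falling with age, +1 = strictly rising
rho_gn    <- cor(age_order, gn, method = "spearman")
rho_other <- cor(age_order, other, method = "spearman")
c(gn = rho_gn, other = rho_other)
```

For these values, the glomerulonephritis fraction falls strictly with age (rho = -1). The other and unspecified fraction rises overall but not monotonically (rho of about 0.83), because it dips slightly in the two oldest groups.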
Comparison of GBD YLD fractions with registry data - overall
This graph compares the etiology fractions of GBD YLD numbers for 2020, grouped by age group, with those from the two registry data sources.
Interestingly, the YLD etiology fractions are similar to the registry etiology fractions, unlike the prevalence fractions.
Comparison of GBD prevalence fractions with registry data - statewise
The graphs above show the states with the most cases in the registry. There does not seem to be a consistent pattern within any single state. However, the state-level figures generally mirror the national figures.
Comparison of GBD YLD fractions with registry data - statewise
There does not seem to be a consistent pattern within any single state. However, the state-level figures generally mirror the national figures.