Abdul-Karim Kadiri - Student ID: 223009555
2023-05-19
Recently, the health landscape in Ghana has been marked by a disturbing trend: a rising incidence of kidney failure among the country’s youth. My colleague, a biostatistician with the Ghana Health Service (GHA), and I have taken a particular interest in this issue. We examined variables that could account for this trend, focusing on age, exercise habits, and alcohol intake among patients presenting with the condition at hospitals across the country.
Kidney failure, medically referred to as End-Stage Kidney Disease (ESKD) or End-Stage Renal Disease (ESRD), is a critical condition that occurs when the kidneys lose their filtering capacity (American Kidney Fund Medical Advisory Committee, 2022). The resulting inability to eliminate waste from the body produces severe symptoms, including itchy skin or rashes, muscle cramps, nausea, swelling in the extremities, changes in urination patterns, frothy urine, breathlessness, and disrupted sleep (American Kidney Fund Medical Advisory Committee, 2022).
This investigation stems from the concerning visual evidence that often appears on social media platforms and in news outlets. Images of young adults between the ages of 25 and 45, incapacitated by kidney disease, are far too common. The recurring appeals for funds to cover expenses such as dialysis treatment further underscore the severity of this health crisis.
Over the past year, the GHA has been collecting data on variables that might contribute to kidney failure among youth in Ghana: sex, age, exercise habits, alcohol intake, and kidney failure diagnoses among young people presenting at district and regional government hospitals across the country.
The data used in this model were collected by the GHA over a little more than one year, from a selection of district and regional government hospitals. The dataset contains the variables sex, age, exercise, alcohol_intake, and kidney_failure.
Exercise - The proportion of youth who reported to the hospitals with symptoms of kidney failure and indicated that they exercise regularly.
Alcohol_intake - The proportion of youth who reported to the hospitals with symptoms of kidney failure and indicated that they consume alcohol regularly.
Kidney_failure - The incidence of kidney failure: of the youth who reported to the hospitals, the proportion who were diagnosed with kidney failure or kidney problems.
# Test the relationship between exercise and alcohol intake to make sure they are not highly correlated
cor.test(kidney.data$exercise, kidney.data$alcohol_intake)
##
## Pearson's product-moment correlation
##
## data: kidney.data$exercise and kidney.data$alcohol_intake
## t = 0.33714, df = 496, p-value = 0.7362
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.07282732 0.10286603
## sample estimates:
## cor
## 0.01513618
The correlation between the proportion of youths who exercise and the proportion who take alcohol is very small and not statistically significant (r = 0.015, p = 0.74), so we can include both predictors in our model.
# Test the relationship between exercise and age to make sure they are not highly correlated
cor.test(kidney.data$exercise, kidney.data$age)
##
## Pearson's product-moment correlation
##
## data: kidney.data$exercise and kidney.data$age
## t = 0.93097, df = 496, p-value = 0.3523
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.04627127 0.12915786
## sample estimates:
## cor
## 0.04176519
The correlation between the proportion of youths who exercise and age is also small and not statistically significant (r = 0.04, p = 0.35), so we can include both predictors in our model.
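The pairwise tests above can be extended to all numeric predictors at once. The following is a minimal sketch rather than the analysis actually run: `sim` is a simulated, hypothetical stand-in for `kidney.data`, and the variance inflation factors (VIF) are computed by hand with base R. VIF values close to 1 would indicate little collinearity among the predictors.

```r
# Hypothetical stand-in for kidney.data (simulated; not the real hospital data)
set.seed(1)
n <- 200
sim <- data.frame(
  exercise       = runif(n, 0, 75),
  alcohol_intake = runif(n, 0, 30),
  age            = runif(n, 25, 45)
)

# Pairwise Pearson correlations among the predictors in one call
cor_mat <- cor(sim)
round(cor_mat, 3)

# Variance inflation factor for each predictor: 1 / (1 - R^2), where R^2
# comes from regressing that predictor on the remaining predictors
vif <- sapply(names(sim), function(v) {
  others <- setdiff(names(sim), v)
  r2 <- summary(lm(reformulate(others, response = v), data = sim))$r.squared
  1 / (1 - r2)
})
round(vif, 3)
```

With the real data, `sim` would be replaced by the predictor columns of `kidney.data`.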
There are different types of models: linear regression, exponential, logarithmic, and polynomial. To determine which one to use, we plot the dependent variable against each of the independent variables.
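plot(kidney_failure ~ exercise, data = kidney.data,
     main = "Kidney Failure vs. Exercise",
     xlab = "Exercise",
     ylab = "Kidney Failure")
# Quantify the strength of the linear relationship
cor.test(kidney.data$kidney_failure, kidney.data$exercise)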
##
## Pearson's product-moment correlation
##
## data: kidney.data$kidney_failure and kidney.data$exercise
## t = -25.435, df = 496, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7881167 -0.7115190
## sample estimates:
## cor
## -0.7523496
From the plot, we see that the relationship between kidney failure incidence and exercise is approximately linear and negative, with a correlation coefficient of -0.752. This means that higher levels of exercise are associated with a lower incidence of kidney problems.
plot(kidney_failure ~ alcohol_intake, data = kidney.data,
     main = "Kidney Failure vs. Alcohol Intake",
     xlab = "Alcohol Intake",
     ylab = "Kidney Failure")
##
## Pearson's product-moment correlation
##
## data: kidney.data$exercise and kidney.data$alcohol_intake
## t = 0.33714, df = 496, p-value = 0.7362
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.07282732 0.10286603
## sample estimates:
## cor
## 0.01513618
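The relationship between alcohol intake and kidney failure is less clear from the plot; it appears to be linear but weak. Note that the test output shown here is for exercise and alcohol_intake (r = 0.015), not for kidney_failure and alcohol_intake, so it does not measure the strength of this relationship.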
Given that the relationships appear to be linear, I proceed to fit a multiple linear regression model. I will also fit a model with a log-transformed response and then compare the two using AIC.
set.seed(123)
# As indicated last week, at least 75 percent of the data can be used for training the model
train_size <- floor(0.75 * nrow(kidney.data))
# Generating the sample for training
train_indices <- sample(seq_len(nrow(kidney.data)), size = train_size)
# Creating the training and test sets
train_data <- kidney.data[train_indices, ]
test_data <- kidney.data[-train_indices, ]
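The call that fitted the model summarized next is not shown in this report. Assuming it follows the formula printed in that summary, the split and fit could be sketched as below. The names `kidney.sim`, `train_sim`, `test_sim`, and `fit_lm` are hypothetical, the data are simulated with illustrative generating values, and the sample size of 498 is inferred from the 496 degrees of freedom in the correlation tests, so the estimates will not match the reported ones.

```r
# Simulated stand-in for kidney.data; generating values are illustrative only
set.seed(123)
n <- 498
kidney.sim <- data.frame(
  exercise       = runif(n, 1, 75),
  alcohol_intake = runif(n, 0.5, 30),
  age            = round(runif(n, 25, 45)),
  sex            = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
kidney.sim$kidney_failure <- 11 - 0.12 * kidney.sim$exercise +
  0.08 * kidney.sim$alcohol_intake + rnorm(n, sd = 2)

# 75/25 split, mirroring the code above
train_size    <- floor(0.75 * nrow(kidney.sim))
train_indices <- sample(seq_len(nrow(kidney.sim)), size = train_size)
train_sim     <- kidney.sim[train_indices, ]
test_sim      <- kidney.sim[-train_indices, ]

# Multiple linear regression using the formula printed in the summary output
fit_lm <- lm(kidney_failure ~ exercise + alcohol_intake + age + sex,
             data = train_sim)
summary(fit_lm)
```

With 498 rows, the split yields 373 training rows, which is consistent with the 368 residual degrees of freedom in the reported summary (373 observations minus 5 estimated coefficients).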
##
## Call:
## lm(formula = kidney_failure ~ exercise + alcohol_intake + age +
## sex, data = train_data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -10.3934  -0.9507   0.5415   1.4587   4.4154
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    11.292019   0.760749  14.843  < 2e-16 ***
## exercise       -0.122956   0.006888 -17.851  < 2e-16 ***
## alcohol_intake  0.075259   0.014851   5.068 6.39e-07 ***
## age             0.028917   0.019090   1.515    0.131
## sexMale         0.389293   0.302616   1.286    0.199
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.303 on 368 degrees of freedom
## Multiple R-squared: 0.5869, Adjusted R-squared: 0.5824
## F-statistic: 130.7 on 4 and 368 DF, p-value: < 2.2e-16
The estimated coefficient of exercise is -0.123, that of alcohol intake is 0.075, and that of age is 0.029.
Holding the other variables constant, a one-percentage-point increase in the exercise proportion is associated with a 0.123-point decrease in the incidence of kidney failure among the youth. Meanwhile, a one-point increase in alcohol consumption is associated with a 0.075-point increase in kidney failure among the youth.
The age coefficient of 0.029 means that each additional year of age is associated with a 0.029-point increase in kidney failure incidence, although this estimate is not statistically significant (p = 0.131).
The R-squared value of 0.5869 suggests that about 58.69% of the variability in kidney failure is explained by these four variables in this model.
As the p-values show, “exercise” and “alcohol_intake” are statistically significantly associated with kidney failure at the 5% level of significance. “age” and “sexMale” are not statistically significant at the 5% level, since their p-values (0.131 and 0.199) are greater than 0.05.
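To make the interpretation concrete, the fitted equation can be evaluated by hand. This is only an illustration: the coefficients are copied from the linear-model summary above, while the predictor values in `x` are hypothetical and were chosen purely for the arithmetic.

```r
# Coefficients copied from the linear-model summary
b <- c(intercept = 11.292019, exercise = -0.122956,
       alcohol_intake = 0.075259, age = 0.028917, sexMale = 0.389293)

# Hypothetical predictor values (illustration only): a 30-year-old male,
# exercise = 50, alcohol_intake = 20
x <- c(intercept = 1, exercise = 50, alcohol_intake = 20, age = 30, sexMale = 1)

# The fitted value is the sum of coefficient * predictor value
pred <- sum(b * x)
pred

# Raising exercise by 10 units, with everything else fixed, shifts the
# prediction by 10 times the exercise coefficient
pred_more_exercise <- pred + 10 * b[["exercise"]]
pred_more_exercise - pred
```

For these inputs the predicted kidney failure value is about 7.9, and ten more units of exercise lower it by about 1.23.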
##
## Call:
## lm(formula = log(kidney_failure) ~ exercise + alcohol_intake +
## age + sex, data = train_data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.72325 -0.11313  0.08284  0.23855  0.43179
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)     2.620660   0.113527  23.084  < 2e-16 ***
## exercise       -0.020157   0.001028 -19.610  < 2e-16 ***
## alcohol_intake  0.016686   0.002216   7.529 3.98e-13 ***
## age             0.002695   0.002849   0.946    0.345
## sexMale        -0.183891   0.045159  -4.072 5.71e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3436 on 368 degrees of freedom
## Multiple R-squared: 0.5674, Adjusted R-squared: 0.5627
## F-statistic: 120.7 on 4 and 368 DF, p-value: < 2.2e-16
## [1] "Mean Squared Error for the multiple regression is: 5.4881304361219"
## [1] "Mean Squared Error for the logarithmic model is: 54.5914662756264"
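The two mean squared errors are only comparable if both are measured on the original kidney_failure scale. The code that produced them is not shown, so the reason for the large gap cannot be confirmed here; one possible cause is comparing log-scale predictions with untransformed values. The sketch below, on simulated data with hypothetical names (`d`, `train`, `test`, `fit_lin`, `fit_log`), shows a back-transformed MSE and an AIC comparison in which the log model receives the Jacobian adjustment `2 * sum(log(y))` so that both AIC values refer to the same response scale.

```r
# Simulated stand-in for the training and test sets (illustrative values only)
set.seed(42)
n <- 400
d <- data.frame(exercise = runif(n, 1, 75), alcohol_intake = runif(n, 0.5, 30))
d$kidney_failure <- exp(2.6 - 0.02 * d$exercise + 0.015 * d$alcohol_intake +
                          rnorm(n, sd = 0.3))
idx   <- sample(seq_len(n), size = floor(0.75 * n))
train <- d[idx, ]
test  <- d[-idx, ]

fit_lin <- lm(kidney_failure ~ exercise + alcohol_intake, data = train)
fit_log <- lm(log(kidney_failure) ~ exercise + alcohol_intake, data = train)

# Test-set MSE, both measured on the original kidney_failure scale
mse_lin <- mean((test$kidney_failure - predict(fit_lin, newdata = test))^2)
# exp() undoes the log; a simple back-transform without bias correction
mse_log <- mean((test$kidney_failure - exp(predict(fit_log, newdata = test)))^2)
# Comparing log-scale predictions with raw values mixes scales and inflates the error
mse_log_wrong <- mean((test$kidney_failure - predict(fit_log, newdata = test))^2)

# AIC on a common scale: add the Jacobian term to the log-response model
aic_lin <- AIC(fit_lin)
aic_log <- AIC(fit_log) + 2 * sum(log(train$kidney_failure))

c(mse_lin = mse_lin, mse_log = mse_log, mse_log_wrong = mse_log_wrong)
c(aic_lin = aic_lin, aic_log = aic_log)
```

Without the adjustment, `AIC(fit_lin)` and `AIC(fit_log)` are computed for different response variables and should not be compared directly.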
The multiple regression analysis of the dataset has notable implications for kidney failure among Ghanaian youths.
The negative relationship between exercise and kidney failure points to a potential protective effect of physical activity on kidney health.
As the proportion exercising regularly increases by one percentage point, kidney failure incidence decreases by about 0.12 points. This finding underscores the importance of promoting regular exercise and physical activity among young people to help protect against kidney disease.
Conversely, the analysis shows a positive association between alcohol intake and kidney failure rates. Specifically, a one-point increase in alcohol consumption corresponds to an increase of about 0.08 points in kidney failure among the youth.
This indicates that alcohol consumption may have detrimental effects on kidney health, thereby increasing the risk of kidney failure. Public health efforts may be required to curb alcohol abuse among youths and raise awareness about the potential long-term health risks.
The low standard errors and high t-statistics for these coefficients suggest that the observed associations are unlikely to be due to chance alone, although the model describes association and does not by itself establish causation.
Thank you.