Introduction

An albumin test is a routine blood test, often conducted as part of a health examination, that gives an indication of a person's overall health status.
Albumin is a protein made in the liver. Its function is to transport vitamins, hormones and drugs around the body.
It also helps to keep fluid from leaking out of blood vessels and helps in maintaining tissues.
Albumin levels are measured through a blood test; low levels can be an indicator of liver disease.
The albumin levels were recorded for two groups: liver patients and people without any liver-related condition.

Problem Statement

The aim of this investigation is to determine whether there is a significant difference in albumin level between two groups: patients with and without a liver condition.
The null hypothesis is that the average albumin level is the same for both groups.
The hypothesis is tested using a two-sample t-test for independent samples.
Before conducting the t-test, the normality of the data and the equality of the group variances are checked.

Description of dataset

The dataset indian_liver_patient.csv was acquired from https://www.kaggle.com/jeevannagaraj/indian-liver-patient-dataset. The column fields and descriptions are listed below:

Age: of patients and non-patients
Gender: male, female
Blood test measurements: Total_Bilirubin, Direct_Bilirubin, Alkaline_Phosphotase, Alamine_Aminotransferase, Aspartate_Aminotransferase, Total_Protiens, Albumin, Albumin_and_Globulin_Ratio

Unit of measure for Albumin: grams per deciliter
Dataset: 1 = patient, 2 = non-patient

Descriptive statistics and boxplot

The following changes were made to the dataset before it was further processed:

The column Dataset was renamed Liver_patient, and its levels were relabelled: 1 became Yes and 2 became No.

library(readr)  # read_csv()
library(dplyr)  # %>%, group_by(), summarise(), filter()
library(car)    # qqPlot(), leveneTest()

indian_liver_patient <- read_csv("C:/Users/Jay Nugasur/Desktop/Datasets/indian-liver-patient-records/indian_liver_patient.csv")

names(indian_liver_patient)[names(indian_liver_patient) == "Dataset"] <- "Liver_patient"

indian_liver_patient$Liver_patient <- indian_liver_patient$Liver_patient %>% factor(levels = c(1, 2), labels = c("Yes", "No"), ordered = TRUE)

Below is a summary of the albumin level for the two groups: liver patients and non-patients.

table1 <- indian_liver_patient %>%
  group_by(Liver_patient) %>%
  summarise(Min = min(Albumin, na.rm = TRUE),
            Q1 = quantile(Albumin, probs = .25, na.rm = TRUE),
            Median = median(Albumin, na.rm = TRUE),
            Q3 = quantile(Albumin, probs = .75, na.rm = TRUE),
            Max = max(Albumin, na.rm = TRUE),
            Mean = mean(Albumin, na.rm = TRUE),
            SD = sd(Albumin, na.rm = TRUE),
            n = n(),
            Missing = sum(is.na(Albumin)))
knitr::kable(table1)

Liver_patient	Min	Q1	Median	Q3	Max	Mean	SD	n	Missing
Yes	0.9	2.5	3.0	3.625	5.5	3.060577	0.7865947	416	0
No	1.4	2.9	3.4	4.000	5.0	3.344311	0.7836895	167	0

Descriptive statistics and boxplot Cont.

Boxplot of the two groups

boxplot(Albumin~Liver_patient, data = indian_liver_patient, ylab = "Albumin Level", xlab= "Liver patient")

The boxplot shows a difference between the two groups: the median albumin level is lower for liver patients (3.0) than for non-patients (3.4).

Hypothesis Testing: Normality and variance homogeneity

Prior to conducting the t-test on the two independent groups, the normality and the variances of the two groups are checked.

Patient <- indian_liver_patient %>% filter(Liver_patient == "Yes")
Patient$Albumin %>% qqPlot(dist="norm")

Non_patient <- indian_liver_patient %>% filter(Liver_patient == "No")
Non_patient$Albumin %>% qqPlot(dist="norm")

The albumin values approximately follow a normal distribution in both groups. With samples this large (416 and 167), the t-test is also robust to moderate departures from normality.

Hypothesis Testing: Normality and variance homogeneity Cont.

Levene’s test for homogeneity

table2 <- leveneTest(Albumin ~ Liver_patient, data = indian_liver_patient)

knitr::kable(table2)

	Df	F value	Pr(>F)
group	1	0.235723	0.6274954
	581	NA	NA

The p-value from Levene's test (0.627) is greater than 0.05, so the null hypothesis of equal variances is not rejected and the variances are assumed to be equal.
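
To show what Levene's test computes, here is a minimal sketch on simulated data (not the liver dataset). It assumes the median-centred form of the test, which is understood to be the default in car::leveneTest(): take each observation's absolute deviation from its own group median, then run a one-way ANOVA on those deviations. All object names below are illustrative and not part of the original analysis.

```r
# Simulated example data only; group sizes and parameters are arbitrary
set.seed(1)
y <- c(rnorm(40, mean = 3.0, sd = 0.8), rnorm(30, mean = 3.3, sd = 0.8))
g <- factor(rep(c("Yes", "No"), times = c(40, 30)))

# Absolute deviation of each value from its own group's median
z <- abs(y - ave(y, g, FUN = median))

# One-way ANOVA on the deviations; its F test is the Levene statistic
fit <- anova(lm(z ~ g))
f_value <- fit[["F value"]][1]
p_value <- fit[["Pr(>F)"]][1]
print(fit)
```

A large p-value here, as in the table above, means there is no evidence that the spread differs between the groups.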

Hypothesis Testing

A two-sample t-test is used to compare the means of the two independent groups.
The null hypothesis states that the mean albumin levels of the two groups are equal.
The alternative hypothesis states that the mean albumin levels of the two groups differ.

\[H_0: \mu_1 = \mu_2 \]

\[H_A: \mu_1 \ne \mu_2\]
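
For reference, the equal-variance (pooled) form of the two-sample t statistic for these hypotheses can be written as follows. The notation is an assumption introduced here: \(\bar{x}_i\), \(s_i\) and \(n_i\) denote the sample mean, standard deviation and size of group \(i\), and \(s_p\) is the pooled standard deviation.

\[s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\]

\[t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\]

Under \(H_0\), \(t\) follows a t distribution with \(n_1 + n_2 - 2\) degrees of freedom.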

Hypothesis Testing Cont.

result<- t.test(Albumin ~ Liver_patient, data= indian_liver_patient, var.equal = TRUE,alternative = "two.sided")
result

## 
##  Two Sample t-test
## 
## data:  Albumin by Liver_patient
## t = -3.9418, df = 581, p-value = 9.074e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.4251106 -0.1423583
## sample estimates:
## mean in group Yes  mean in group No 
##          3.060577          3.344311
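
As a cross-check, the test statistic can be recomputed from the group summaries alone. The sketch below assumes the means, standard deviations and group sizes reported in the summary table, together with the standard pooled-variance formula; the variable names are illustrative and not from the original analysis.

```r
# Group summaries from the descriptive statistics table
m1 <- 3.060577; s1 <- 0.7865947; n1 <- 416  # liver patients ("Yes")
m2 <- 3.344311; s2 <- 0.7836895; n2 <- 167  # non-patients ("No")

df <- n1 + n2 - 2                                  # degrees of freedom
sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df    # pooled variance
se <- sqrt(sp2 * (1 / n1 + 1 / n2))                # standard error of the difference

t_stat <- (m1 - m2) / se                           # t statistic
p_value <- 2 * pt(-abs(t_stat), df)                # two-sided p-value
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, df) * se    # 95% confidence interval

print(c(t = t_stat, df = df, p = p_value))
print(ci)
```

Up to rounding of the inputs, these values should agree with the t.test() output above (t = -3.9418, df = 581).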

result$p.value->table3

knitr::kable(table3)

9.07e-05

result$conf.int->table4

knitr::kable(table4)

-0.4251106

-0.1423583

Result and discussion

T-test result:

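The t-test gives t = -3.94 with 581 degrees of freedom and p = 9.07e-05. Since p < 0.05, the null hypothesis H0 is rejected.
The 95% confidence interval for the difference in means, [-0.425, -0.142], does not contain 0, the value specified by H0.
In conclusion, the difference in mean albumin level between the two groups is statistically significant: liver patients averaged about 0.28 g/dL lower than non-patients.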

Major findings:

The albumin level differs significantly between people with a liver condition and those without. So, the albumin level in blood can be used as an indicator when assessing the health of a person's liver. For this investigation the dataset was large enough, but as an improvement and for further investigation, albumin data from other countries could be tested. The current dataset is for patients from India; findings may differ for people with a different diet.

Albumin Level T-test

For Liver patients and non-patients
