MATH 1324 ASSIGNMENT 3

Analysis of the Liver Condition of Indian Patients Based on Protein Levels

Shivaniben Nikunjkumar Prajapati (s3738826) and Santosh Kumaravel Sundaravadivelu (s3729461)

Last updated: 29 October, 2018

Introduction

Problem Statement

We are investigating the total protein levels of male and female patients and how protein supports the function of the liver and other organs in the long run. Specifically, we test whether there is an association between two variables, Total_Protiens and Gender. Protein levels may differ between males and females, and this test will help us understand whether protein level is associated with gender.

Data

library(tidyverse)  # provides read_csv(), %>%, filter(), group_by() and summarise()
core <- read_csv("D:/Intro to Stats/Assignment3/indian_liver_patient.csv")

As mentioned in the introduction, the dataset is taken from www.kaggle.com. It contains the following variables:

Data Cont.


Two variables are central to the further analysis: Total_Protiens (numeric) and Gender (categorical: Male/Female).

Numeric variables: measurement scale of each numeric variable

Data Cont.

Descriptive Statistics and Visualisation

# Boxplot of Total_Protiens before outlier capping
boxplot(core$Total_Protiens)

Descriptive Statistics and Visualisation Cont.

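# Cap outliers: values outside the Tukey fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR)
# are replaced by the 5th and 95th percentiles respectively.
cap <- function(x) {
  quantiles <- quantile(x, c(0.05, 0.25, 0.75, 0.95), na.rm = TRUE)
  iqr <- quantiles[3] - quantiles[2]  # compute the IQR once, before any values change
  x[x < quantiles[2] - 1.5 * iqr] <- quantiles[1]
  x[x > quantiles[3] + 1.5 * iqr] <- quantiles[4]
  x
}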
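The following toy example shows how the capping rule works. Values outside the Tukey fences Q1 - 1.5*IQR and Q3 + 1.5*IQR are flagged, and each flagged value is replaced with the 5th or 95th percentile. This is a minimal sketch: the vector `x` is hypothetical, not taken from the liver dataset. The same steps are repeated inline, rather than by calling `cap()`, so that the fences can be inspected.

```r
# Hypothetical toy data: 1, 2, ..., 19 plus one extreme value of 100
x <- c(1:19, 100)

# Default (type 7) quantiles, as used by quantile() inside cap()
q <- quantile(x, c(0.05, 0.25, 0.75, 0.95))
iqr <- q[[3]] - q[[2]]

lower_fence <- q[[2]] - 1.5 * iqr  # Q1 - 1.5*IQR
upper_fence <- q[[3]] + 1.5 * iqr  # Q3 + 1.5*IQR

# Values beyond the fences are replaced by the 5th / 95th percentiles
capped <- x
capped[capped < lower_fence] <- q[[1]]
capped[capped > upper_fence] <- q[[4]]

print(c(lower = lower_fence, upper = upper_fence))
print(tail(capped, 2))
```

For this vector the fences are -8.5 and 29.5. Only the value 100 lies outside them, and it is replaced by the 95th percentile, 23.05.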

# Cap outliers in the full dataset, then redraw the boxplot
core$Total_Protiens <- core$Total_Protiens %>% cap()
boxplot(core$Total_Protiens)

Descriptive Statistics Cont.

table1 <- core %>%
  group_by(Gender) %>%
  summarise(
    Min = min(Total_Protiens, na.rm = TRUE),
    Q1 = quantile(Total_Protiens, probs = 0.25, na.rm = TRUE),
    Median = median(Total_Protiens, na.rm = TRUE),
    Q3 = quantile(Total_Protiens, probs = 0.75, na.rm = TRUE),
    Max = max(Total_Protiens, na.rm = TRUE),
    Mean = mean(Total_Protiens, na.rm = TRUE),
    SD = sd(Total_Protiens, na.rm = TRUE),
    n = n(),
    Missing = sum(is.na(Total_Protiens))
  )
knitr::kable(table1)
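| Gender | Min |    Q1 | Median |  Q3 | Max |     Mean |       SD |   n | Missing |
|:-------|----:|------:|-------:|----:|----:|---------:|---------:|----:|--------:|
| Female | 4.1 | 5.925 |    6.8 | 7.5 | 9.2 | 6.660634 | 1.118123 | 142 |       0 |
| Male   | 3.7 | 5.700 |    6.5 | 7.1 | 9.2 | 6.438435 | 1.007471 | 441 |       0 |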

Hypothesis Testing

Hypothesis Testing Cont.

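Levene's test gives p = 0.09503. Because 0.09503 > 0.05, there is no significant evidence that the two variances differ. The Welch (unequal-variance) two-sample t-test is used anyway, because it does not rely on the equal-variance assumption.

* H0: male and female patients have equal mean total protein levels (μ_Female = μ_Male).
* HA: the mean total protein levels differ (μ_Female ≠ μ_Male). The sample means suggest that females are higher.
* The Welch two-sample t-test output below gives p = 0.03612.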

Since the p-value is less than 0.05, we reject H0.
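Levene's test is essentially a one-way ANOVA on each observation's absolute deviation from its group centre. As computed by `leveneTest()` in the car package with its default `center = median`, that centre is the group median. The following is a minimal base-R sketch of that computation. The vectors `y` and `g` are hypothetical toy data, not the liver dataset.

```r
# Hypothetical toy data: group B is more spread out than group A
y <- c(1, 2, 3, 4, 5, 2, 4, 6, 8, 10)
g <- factor(rep(c("A", "B"), each = 5))

# Absolute deviation of each value from its own group median
abs_dev <- abs(y - ave(y, g, FUN = median))

# One-way ANOVA on the absolute deviations gives the Levene F statistic
levene_table <- anova(lm(abs_dev ~ g))
f_stat <- levene_table[["F value"]][1]
p_value <- levene_table[["Pr(>F)"]][1]

print(c(F = f_stat, p = p_value))
```

The decision rule is the same one applied above. A p-value greater than 0.05 gives no significant evidence of unequal variances.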

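library(car)  # provides qqPlot() and leveneTest()
# Normal Q-Q plot of Total_Protiens for male patients
total_protein_male <- core %>% filter(Gender == "Male")
total_protein_male$Total_Protiens %>% qqPlot(dist = "norm")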

## [1] 391  77

Hypothesis Testing Cont.

# Normal Q-Q plot of Total_Protiens for female patients
total_protein_female <- core %>% filter(Gender == "Female")
total_protein_female$Total_Protiens %>% qqPlot(dist = "norm")

## [1] 111 129
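A normal Q-Q plot compares the sorted sample values with standard normal quantiles. Points lying close to a straight line support the normality assumption of the t-test. The sketch below uses base R's `qnorm()`, `ppoints()` and `qqnorm()` rather than `car::qqPlot()`. The sample is hypothetical and built to be exactly normal-shaped (mean 5, SD 2), so the fitted line is exact.

```r
# Hypothetical sample: exactly normal-shaped with mean 5 and SD 2
theoretical <- qnorm(ppoints(10))          # standard normal quantiles
sample_vals <- sort(5 + 2 * theoretical)   # sorted sample values

# On a straight Q-Q line, the intercept estimates the mean and the slope the SD
fit <- lm(sample_vals ~ theoretical)
print(round(coef(fit), 4))

# qqnorm() pairs the sample with the same theoretical quantiles
qq <- qqnorm(sample_vals, plot.it = FALSE)
```

With real data such as Total_Protiens, the points scatter around the line. Points in the tails that bend away from it indicate skewness or outliers.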

Hypothesis Testing Cont.

# Cap outliers separately within each gender group
total_protein_male$Total_Protiens <- total_protein_male$Total_Protiens %>% cap()
total_protein_female$Total_Protiens <- total_protein_female$Total_Protiens %>% cap()

# Recombine the two groups into one data frame
core_new <- rbind(total_protein_male, total_protein_female)

core_new %>% boxplot(Total_Protiens ~ Gender, data = ., ylab = "Protein Value", xlab = "Gender")

Hypothesis Testing Cont.

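# Levene's test for homogeneity of variance (car package)
leveneTest(Total_Protiens ~ Gender, data = core_new)

# Welch two-sample t-test (does not assume equal variances)
t.test(
  Total_Protiens ~ Gender,
  data = core_new,
  var.equal = FALSE,
  alternative = "two.sided"
)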
## 
##  Welch Two Sample t-test
## 
## data:  Total_Protiens by Gender
## t = 2.1085, df = 219.55, p-value = 0.03612
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.01450457 0.42989229
## sample estimates:
## mean in group Female   mean in group Male 
##             6.660634             6.438435

Hypothesis Testing Cont.

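Here \(\bar{x}_1, \bar{x}_2\) are the sample means, \(s_1, s_2\) the sample standard deviations and \(n_1, n_2\) the sample sizes. For the pooled (equal-variance) two-sample t-test, the test statistic is:

\(t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}\)

where \(s_p^2\) is the pooled variance:

\(s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\)

The Welch (unequal-variance) test reported above uses the same numerator with denominator \(\sqrt{s_1^2/n_1 + s_2^2/n_2}\). Its degrees of freedom are approximated by:

\(df' = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}\)

In the pooled test, the t-statistic is compared to a two-tailed critical value \(t^*\) with:

\(df = n_1 + n_2 - 2\)

For the pooled test, the 95% CI of the difference between the means is:

\(\bar{x}_1 - \bar{x}_2 \pm t_{n_1 + n_2 - 2,\,1 - \alpha/2}\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}\)

For the Welch CI reported by R, the square-root term is replaced by \(\sqrt{s_1^2/n_1 + s_2^2/n_2}\) and the degrees of freedom by \(df'\).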
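The Welch formulas can be checked against the `t.test()` output using only the group summaries. This is a minimal sketch. It assumes that the rounded means, SDs and sample sizes from the descriptive statistics table also describe `core_new`; the means match the t-test output. Because the inputs are rounded, the last digits may differ slightly from R's own result.

```r
# Group summaries copied from the descriptive statistics table
mean_f <- 6.660634  # Female mean
sd_f <- 1.118123    # Female SD
n_f <- 142          # Female sample size
mean_m <- 6.438435  # Male mean
sd_m <- 1.007471    # Male SD
n_m <- 441          # Male sample size

# Welch standard error: each group keeps its own variance (no pooling)
v_f <- sd_f^2 / n_f
v_m <- sd_m^2 / n_m
se <- sqrt(v_f + v_m)

t_stat <- (mean_f - mean_m) / se

# Welch-Satterthwaite degrees of freedom (df' above)
df_welch <- (v_f + v_m)^2 / (v_f^2 / (n_f - 1) + v_m^2 / (n_m - 1))

# Two-sided p-value and 95% confidence interval for the mean difference
p_value <- 2 * pt(-abs(t_stat), df_welch)
t_crit <- qt(0.975, df_welch)
ci <- (mean_f - mean_m) + c(-1, 1) * t_crit * se

print(round(c(t = t_stat, df = df_welch, p = p_value), 4))
print(round(ci, 4))
```

The results agree with the Welch output above: t of about 2.1085, df of about 219.55, p of about 0.0361, and a CI of about (0.0145, 0.4299).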

Discussion

Strength:

* The main finding is that, in this sample, female patients have a higher mean total protein level than male patients (6.66 vs 6.44). The difference is statistically significant at the 5% level, which challenges the common belief that males would have the higher level.
* If confirmed in further studies, this could change how people think about total protein levels in men and women.

Limitation:

* The main limitation of this study is that only a relatively small group of patients (583) and their liver condition was examined. More informative results could come from extending the analysis to other variables, such as age group in combination with gender, and the levels of other chemicals in the liver.

Conclusion

References