# EXAMPLE: Chi-Square Test for the Difference Between Two Proportions
# By Eralda Gjika (Dhamo)
# Linkedin: https://al.linkedin.com/in/eralda-dhamo-gjika-71879128
# Department of Applied Mathematics, Faculty of Natural Science, University of Tirana, Albania
#
# For this work I will use the Portuguese Bank Marketing Data,
# together with some illustrative examples.
#
#######################################################################################
# Chi-Square Test for the Difference Between Two Proportions
#######################################################################################
# If -1.96 < z < 1.96, the difference is not significant at the 5% level: fail to reject H0 (e.g. H0: p1 = p2).
# If z <= -1.96 or z >= 1.96 (i.e. |z| >= 1.96), the difference is significant at the 5% level: reject H0.
# The p-value corresponding to the z statistic can be read from the standard normal (z) table.
# Attention! The Z statistic (normal approximation) should only be used when n*p > 5 and n*(1-p) > 5.
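#
# A minimal sketch of the decision rule above for the one-sample case (H0: p = p0, two-sided Ha).
# z.one.prop() is a hypothetical helper written for illustration, not a base R function.
# The numbers passed to it are those of the bank example later in this script.
z.one.prop <- function(x, n, p0, alpha = 0.05) {
  # Check the normal-approximation conditions n*p0 > 5 and n*(1-p0) > 5
  if (n * p0 <= 5 || n * (1 - p0) <= 5) {
    warning("n*p0 or n*(1-p0) is not greater than 5; the normal approximation may be poor")
  }
  p.hat <- x / n                      # sample proportion
  se <- sqrt(p0 * (1 - p0) / n)       # standard error under H0
  z <- (p.hat - p0) / se              # z statistic
  z.crit <- qnorm(1 - alpha / 2)      # about 1.96 for alpha = 0.05
  p.value <- 2 * pnorm(-abs(z))       # two-sided p-value from the normal distribution
  list(z = z, z.crit = z.crit, p.value = p.value, reject.H0 = abs(z) >= z.crit)
}
z.one.test <- z.one.prop(x = 3800, n = 5000, p0 = 0.6)
z.one.test
# z is about 23.09, far beyond 1.96, so H0 is rejected at the 5% level.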
#
# binom.test(): computes an exact binomial test; recommended when the sample size is small.
# prop.test(): can be used when the sample size is large (n > 30); it uses a normal approximation to the binomial.
#
# binom.test(x, n, p = 0.5, alternative = "two.sided")
# prop.test(x, n, p = NULL, alternative = "two.sided",correct = TRUE)
# x: the number of successes
# n: the total number of trials
# p: the hypothesized probability of success under H0
# correct: a logical indicating whether the Yates continuity correction should be applied where possible
# The Yates continuity correction matters most when the expected number of successes or failures is small (< 5).
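#
# A minimal sketch comparing the exact binomial p-value with the normal-approximation
# p-values, with and without the Yates correction. The data (16 successes out of 27,
# p0 = 0.7) are taken from the desk-service example further below; the variable names
# are illustrative.
x.small <- 16; n.small <- 27; p0.small <- 0.7
p.exact  <- binom.test(x.small, n.small, p = p0.small)$p.value                  # exact test
p.normal <- prop.test(x.small, n.small, p = p0.small, correct = FALSE)$p.value  # no correction
p.yates  <- prop.test(x.small, n.small, p = p0.small, correct = TRUE)$p.value   # Yates correction
round(c(exact = p.exact, normal = p.normal, yates = p.yates), 4)
# The Yates correction shrinks |x - n*p0| by 0.5, which lowers the statistic and raises the p-value.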
#
#
# Example:
# A bank's dataset indicates that 60% of its clients are married. We draw a sample of n = 5000 individuals, of whom 3800 have a house loan (the remaining 1200 have another type of loan).
# We want to test whether the claimed proportion p = 0.6 is consistent with the sample.
Z.test.1 <- prop.test(x = 3800, n = 5000, p = 0.6, correct = FALSE) # n*p = 3000 > 5 and n*(1-p) = 2000 > 5
# Print the results of the test
Z.test.1
##
## 1-sample proportions test without continuity correction
##
## data: 3800 out of 5000, null probability 0.6
## X-squared = 533.33, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.6
## 95 percent confidence interval:
## 0.7479653 0.7716355
## sample estimates:
## p
## 0.76
# Output: sample estimate p = 0.76; p-value < 2.2e-16
# H0: p = 0.6
# Ha (alternative hypothesis): true p is not equal to 0.6
# Decision: p-value < 0.05, so we reject H0: the proportion of individuals married with a house loan differs significantly from 0.6.
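#
# A quick check, sketched here: for one proportion, the X-squared reported by prop.test()
# without continuity correction equals z^2, where z is the statistic from the decision
# rule at the top of this section. So the chi-square and Z tests agree in the two-sided case.
z.bank <- (3800 / 5000 - 0.6) / sqrt(0.6 * (1 - 0.6) / 5000)
x2.bank <- unname(prop.test(x = 3800, n = 5000, p = 0.6, correct = FALSE)$statistic)
c(z = z.bank, z.squared = z.bank^2, X.squared = x2.bank)
all.equal(z.bank^2, x2.bank)   # TRUE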
#
# To test the one-sided alternative Ha: p > 0.6:
Z.test.2<-prop.test(x = 3800, n = 5000, p = 0.6, correct = FALSE, alternative = "greater")
Z.test.2
##
## 1-sample proportions test without continuity correction
##
## data: 3800 out of 5000, null probability 0.6
## X-squared = 533.33, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is greater than 0.6
## 95 percent confidence interval:
## 0.7499264 1.0000000
## sample estimates:
## p
## 0.76
# Decision: p-value < 2.2e-16 < 0.05, so we reject H0 in favour of Ha: p > 0.6.
# To test the one-sided alternative Ha: p < 0.6:
Z.test.3 <- prop.test(x = 3800, n = 5000, p = 0.6, correct = FALSE, alternative = "less")
Z.test.3
##
## 1-sample proportions test without continuity correction
##
## data: 3800 out of 5000, null probability 0.6
## X-squared = 533.33, df = 1, p-value = 1
## alternative hypothesis: true p is less than 0.6
## 95 percent confidence interval:
## 0.0000000 0.7697924
## sample estimates:
## p
## 0.76
# Decision: p-value = 1 > 0.05, so we fail to reject H0; the data give no evidence for Ha: p < 0.6 at the 5% significance level.
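#
# A minimal sketch of where the one-sided p-values come from: prop.test() takes the signed
# statistic z = sign(p.hat - p0) * sqrt(X-squared) and reports a normal tail area
# (upper tail for "greater", lower tail for "less").
z.signed <- (3800 / 5000 - 0.6) / sqrt(0.6 * 0.4 / 5000)
p.greater <- pnorm(z.signed, lower.tail = FALSE)   # Ha: p > 0.6
p.less    <- pnorm(z.signed)                       # Ha: p < 0.6
p.greater.R <- prop.test(3800, 5000, p = 0.6, correct = FALSE, alternative = "greater")$p.value
p.less.R    <- prop.test(3800, 5000, p = 0.6, correct = FALSE, alternative = "less")$p.value
c(greater = p.greater, greater.prop.test = p.greater.R, less = p.less, less.prop.test = p.less.R)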
#
# Example:
# Clients entering a bank service center are classified by whether they use the service desk; the bank believes that 70% of them do.
# The branch manager counts the clients during one hour and notices that, of 27 clients, 16 asked for desk service.
# Is the manager justified in believing that fewer than 70% of clients use the desk service?
# With fewer than 30 observations (n = 27), we use the exact binomial test.
# Note: binom.test() uses a two-sided alternative by default; the one-sided question "fewer than 70%" corresponds to alternative = "less".
B.test.4 <- binom.test(x = 16, n = 27, p = 0.7)
B.test.4
##
## Exact binomial test
##
## data: 16 and 27
## number of successes = 16, number of trials = 27, p-value = 0.2924
## alternative hypothesis: true probability of success is not equal to 0.7
## 95 percent confidence interval:
## 0.3879839 0.7761027
## sample estimates:
## probability of success
## 0.5925926
# Decision: p-value = 0.2924 > 0.05, so we fail to reject H0: p = 0.7; the sample proportion (0.59) does not differ significantly from 0.7.
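#
# A minimal sketch of the one-sided version of the manager's question (Ha: p < 0.7).
# For alternative = "less", the exact p-value is P(X <= 16) for X ~ Binomial(27, 0.7),
# which pbinom() gives directly. B.test.5 is just an illustrative name.
B.test.5 <- binom.test(x = 16, n = 27, p = 0.7, alternative = "less")
B.test.5$p.value
pbinom(16, size = 27, prob = 0.7)   # same value
# The one-sided p-value is smaller than the two-sided 0.2924 but still above 0.05,
# so the conclusion (fail to reject H0) does not change.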
#
#######################################################################################
# Chi-square Goodness of Fit Test in R
#######################################################################################
# The chi-square goodness of fit test compares the observed distribution of a categorical
# variable (with two or more categories) to an expected distribution.
# Null hypothesis (H0): there is no significant difference between the observed and the expected counts (the data follow the expected distribution).
# Alternative hypothesis (Ha): there is a significant difference between the observed and the expected counts.
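#
# A minimal sketch of the statistic computed by hand: X^2 = sum((O - E)^2 / E), where
# E = n * p for each category, with k - 1 degrees of freedom (k = number of categories).
# It uses the civil-status counts from the example below; the variable names are illustrative.
observed <- c(single = 43, married = 80, divorced = 32)
p0 <- c(1, 3, 1) / 5                              # hypothesized 1:3:1 ratio
expected <- sum(observed) * p0                    # E = n * p
X2 <- sum((observed - expected)^2 / expected)     # chi-square statistic
df.gof <- length(observed) - 1                    # k - 1 degrees of freedom
p.gof <- pchisq(X2, df = df.gof, lower.tail = FALSE)
c(X.squared = X2, df = df.gof, p.value = p.gof)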
#
# Example:
# In a bank's dataset, individuals who have a loan are classified by civil status
# (single, married, divorced). The bank wants to allocate loans to its customers in the ratio 1:3:1,
# because it believes that married individuals are more likely to repay their loans on time.
# A sample of 155 individuals from the dataset gives the counts 43 single, 80 married and 32 divorced.
# Is there a significant difference between the observed and the expected counts?
civil.status <- c(43, 80, 32)  # observed counts from the sample (single, married, divorced)
probab <- c(1/5, 3/5, 1/5)     # expected proportions from the bank's 1:3:1 ratio
test.fit.1<-chisq.test(civil.status, p = probab)
test.fit.1
##
## Chi-squared test for given probabilities
##
## data: civil.status
## X-squared = 6.4946, df = 2, p-value = 0.03888
# Decision: p-value = 0.03888 < 0.05, so we reject H0: the observed counts differ significantly from the expected 1:3:1 ratio.
#
# Note: the chi-square test should only be used when all expected counts are at least 5.
# To see the expected counts for the example above:
test.fit.1$expected
## [1] 31 93 31
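#
# A minimal sketch of checking the expected-count condition programmatically. The
# expected counts are n * p = 155 * c(1/5, 3/5, 1/5). If the condition failed,
# chisq.test() can compute a Monte Carlo p-value instead, using simulate.p.value = TRUE
# (B = number of simulated samples). That is one option; it is not used in this analysis.
counts.gof <- c(43, 80, 32)
fit.gof <- chisq.test(counts.gof, p = c(1, 3, 1) / 5)
all(fit.gof$expected >= 5)    # TRUE: the expected counts are 31, 93 and 31
set.seed(1)                   # make the simulated p-value reproducible
fit.mc <- chisq.test(counts.gof, p = c(1, 3, 1) / 5, simulate.p.value = TRUE, B = 2000)
fit.mc$p.value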
# Let's try changing the bank's ratio to 1:2:1
probab.2<-c(1/4,2/4,1/4)# ratio 1:2:1
test.fit.2<-chisq.test(civil.status, p = probab.2)
test.fit.2
##
## Chi-squared test for given probabilities
##
## data: civil.status
## X-squared = 1.7226, df = 2, p-value = 0.4226
# Decision: p-value = 0.4226 > 0.05, so we fail to reject H0: the observed counts are consistent with the 1:2:1 ratio.
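#
# A minimal sketch that tests both hypothesized ratios in one pass. With rescale.p = TRUE,
# chisq.test() accepts the raw ratios (1:3:1, 1:2:1) and rescales them to sum to 1.
# The list and variable names are illustrative.
counts.gof <- c(43, 80, 32)
ratios <- list("1:3:1" = c(1, 3, 1), "1:2:1" = c(1, 2, 1))
p.by.ratio <- sapply(ratios, function(r) chisq.test(counts.gof, p = r, rescale.p = TRUE)$p.value)
round(p.by.ratio, 4)   # 0.0389 for 1:3:1, 0.4226 for 1:2:1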
#
# Thank you for reading, using and sharing!
# E. Gjika