Inference for Categorical Data

Question 1

The question of atheism was asked by WIN-Gallup International in a similar survey that was conducted in 2005. (We assume here that sample sizes have remained the same.) Table 4 on page 13 of the report summarizes survey results from 2005 and 2012 for 39 countries.

1. Answer the following two questions using the inference function. As always, write out the hypotheses for any tests you conduct and outline the status of the conditions for inference.

Is there convincing evidence that Spain has seen a change in its atheism index between 2005 and 2012? Hint: Create a new data set for respondents from Spain. Form confidence intervals for the true proportion of atheists in both years, and determine whether they overlap.

download.file("http://www.openintro.org/stat/data/atheism.RData", destfile = "atheism.RData")
load("atheism.RData")


spain2005 <- subset(atheism, nationality == "Spain" & year == "2005") 
spain2005$nationality <- as.factor(as.character(spain2005$nationality)) 
table(spain2005$nationality, spain2005$response)

##        
##         atheist non-atheist
##   Spain     115        1031

spain2012 <- subset(atheism, nationality == "Spain" & year == "2012")
spain2012$nationality <- as.factor(as.character(spain2012$nationality))
table(spain2012$nationality, spain2012$response)

##        
##         atheist non-atheist
##   Spain     103        1042

inference(spain2005$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.1003 ;  n = 1146 
## Check conditions: number of successes = 115 ; number of failures = 1031 
## Standard error = 0.0089 
## 95 % Confidence interval = ( 0.083 , 0.1177 )

inference(spain2012$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.09 ;  n = 1145 
## Check conditions: number of successes = 103 ; number of failures = 1042 
## Standard error = 0.0085 
## 95 % Confidence interval = ( 0.0734 , 0.1065 )

Conditions: We assume the observations are independent. The 2005 sample contains 115 atheists and 1031 non-atheists, and the 2012 sample contains 103 atheists and 1042 non-atheists. All four counts are at least 10, so the sampling distribution of each sample proportion is approximately normal. H0: The proportion of atheists in Spain did not change between 2005 and 2012, or p05 = p12. HA: The proportion of atheists in Spain changed between 2005 and 2012, or p05 ≠ p12. The 95% confidence intervals for 2005, (0.083, 0.1177), and for 2012, (0.0734, 0.1065), overlap substantially. Additionally, the 2012 estimate of 0.09 lies within the 2005 interval, so we fail to reject the null hypothesis. The data do not provide convincing evidence of a change; the observed difference is consistent with sampling variability.
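As a cross-check on the reported intervals, the sketch below recomputes them by hand from the counts in the tables above. It assumes the inference function uses the usual normal-approximation (Wald) interval, which is consistent with the printed standard errors; the helper name `wald_ci` is hypothetical and not part of the lab's code.

```r
# Hypothetical helper: normal-approximation (Wald) interval for one proportion.
wald_ci <- function(successes, n, conf = 0.95) {
  p_hat <- successes / n
  se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
  z <- qnorm(1 - (1 - conf) / 2)        # about 1.96 for 95% confidence
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}

# Counts of atheists and sample sizes for Spain, taken from the tables above
ci_2005 <- wald_ci(115, 1146)
ci_2012 <- wald_ci(103, 1145)

# Two intervals overlap when each lower bound is below the other's upper bound
overlap <- ci_2005[["lower"]] <= ci_2012[["upper"]] &&
  ci_2012[["lower"]] <= ci_2005[["upper"]]

print(round(ci_2005, 4))
print(round(ci_2012, 4))
print(overlap)
```

The rounded bounds should agree with the inference output to about three decimal places, and `overlap` indicates whether the two intervals share any values.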

Is there convincing evidence that the United States has seen a change in its atheism index between 2005 and 2012?

us2005 <- subset(atheism, nationality == "United States" & year == "2005")
us2005$nationality <- as.factor(as.character(us2005$nationality))
table(us2005$nationality, us2005$response)

##                
##                 atheist non-atheist
##   United States      10         992

us2012 <- subset(atheism, nationality == "United States" & year == "2012")
us2012$nationality <- as.factor(as.character(us2012$nationality))
table(us2012$nationality, us2012$response)

##                
##                 atheist non-atheist
##   United States      50         952

inference(us2005$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.01 ;  n = 1002 
## Check conditions: number of successes = 10 ; number of failures = 992 
## Standard error = 0.0031 
## 95 % Confidence interval = ( 0.0038 , 0.0161 )

inference(us2012$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0499 ;  n = 1002 
## Check conditions: number of successes = 50 ; number of failures = 952 
## Standard error = 0.0069 
## 95 % Confidence interval = ( 0.0364 , 0.0634 )

Conditions: We assume the observations are independent. The 2005 sample contains 10 atheists and 992 non-atheists, and the 2012 sample contains 50 atheists and 952 non-atheists. The 2005 count of 10 atheists only just meets the minimum of 10 successes, so the normal approximation is borderline for that year. H0: The proportion of atheists in the United States did not change between 2005 and 2012, or p05 = p12. HA: The proportion of atheists in the United States changed between 2005 and 2012, or p05 ≠ p12. The 95% confidence intervals for 2005, (0.0038, 0.0161), and for 2012, (0.0364, 0.0634), do not overlap. Additionally, the 2012 estimate of 0.05 lies well outside the 2005 interval, so we reject the null hypothesis. The data provide convincing evidence that the proportion of atheists changed; the difference is unlikely to be due to chance alone.
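Comparing two confidence intervals is an informal check. A direct two-sample test of the difference in proportions is one alternative, sketched here with base R's `prop.test`. This is a swapped-in technique, not the method the lab's inference function was used for above; the counts are the ones shown in the tables.

```r
# Two-sample test of equal proportions, without continuity correction.
# x = number of atheists in 2005 and 2012, n = sample sizes in each year.
us_test <- prop.test(x = c(10, 50), n = c(1002, 1002), correct = FALSE)
spain_test <- prop.test(x = c(115, 103), n = c(1146, 1145), correct = FALSE)

# A p-value below 0.05 rejects H0: p05 = p12 at the 5% significance level
print(us_test$p.value)
print(spain_test$p.value)
```

With these counts the test should agree with the interval comparison: a very small p-value for the United States and a p-value above 0.05 for Spain.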

Question 2

If in fact there has been no change in the atheism index in the countries listed in Table 4, in how many of those countries would you expect to detect a change (at a significance level of 0.05) simply by chance? Hint: Look in the textbook index under Type 1 error.

Rejecting a true null hypothesis is a Type I error. If the atheism index has not changed in any country, each test at a significance level of 0.05 still has a 5% chance of detecting a change by chance alone. Across the 39 countries, we therefore expect 39 * 0.05 = 1.95, or about 2, countries to show a change that is not real.
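The arithmetic can be sketched in R. The second quantity, the chance of at least one false detection, is an addition that assumes the 39 tests are independent of one another, which the report does not state.

```r
n_countries <- 39
alpha <- 0.05

# Expected number of Type I errors when every null hypothesis is true
expected_false_positives <- n_countries * alpha

# Assumption: the 39 tests are independent.
# Probability that at least one country shows a spurious change
p_at_least_one <- 1 - (1 - alpha)^n_countries

print(expected_false_positives)
print(round(p_at_least_one, 3))
```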

Question 3

Suppose you’re hired by the local government to estimate the proportion of residents that attend a religious service on a weekly basis. According to the guidelines, the estimate must have a margin of error no greater than 1% with 95% confidence. You have no idea what to expect for p. How many people would you have to sample to ensure that you are within the guidelines? Hint: Refer to your plot of the relationship between p and margin of error. Do not use the data set to answer this question.

At 95% confidence the margin of error is $ME = 1.96 \times SE$. For $ME = 0.01$, the standard error can be at most $SE = 0.01 / 1.96 \approx 0.0051$. Since the value of $p$ is unknown, we make the worst-case assumption $p = 0.5$, which maximizes $p(1-p)$. From $SE = \sqrt{p(1-p)/n}$ we get $n = p(1-p)/SE^2 = 0.5 \times 0.5 / (0.01/1.96)^2 = 9604$. Thus the sample must include at least 9,604 people. This is a worst-case figure: if a prior estimate of $p$ well away from 0.5 were available, $p(1-p)$ would be smaller and so would the required sample size.
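A minimal sketch of the sample size calculation in R follows. The function name `required_n` is hypothetical, and `qnorm(0.975)` is used in place of the rounded value 1.96.

```r
# Hypothetical helper: smallest n so that the margin of error for a single
# proportion is at most `me` at the given confidence level.
required_n <- function(p, me = 0.01, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)      # critical value, about 1.96
  ceiling(p * (1 - p) * (z / me)^2)   # round up to a whole person
}

# Worst case: p = 0.5 maximizes p * (1 - p)
n_worst <- required_n(0.5)
print(n_worst)

# Illustration only: a made-up prior guess of p = 0.2 needs fewer people
n_prior <- required_n(0.2)
print(n_prior)
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the 1% limit.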

Nabeel Mohammediliyas Sheikh

2024-03-09
