Exercise 1

In the first paragraph, several key findings are reported. Do these percentages appear to be sample statistics (derived from the data sample) or population parameters?

These are sample statistics. The percentages are computed from the survey respondents and serve as estimates of the unknown population parameters.

Exercise 2

The title of the report is “Global Index of Religiosity and Atheism”. To generalize the report’s findings to the global human population, what must we assume about the sampling method? Does that seem like a reasonable assumption?

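We must assume that respondents were selected by random sampling within each country, so that the samples are representative and the observations are independent. Each sample is well under 10% of its country's population, which supports independence, but truly random sampling across the global population is hard to achieve, so the assumption is reasonable only with some caution.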

Exercise 3

What does each row of Table 6 correspond to? What does each row of atheism correspond to?

load("more/atheism.RData")

Each row of Table 6 corresponds to a country, with its sample size and response percentages. Each row of atheism corresponds to one person who was polled (a case).

Exercise 4

Using the command below, create a new dataframe called us12 that contains only the rows in atheism associated with respondents to the 2012 survey from the United States. Next, calculate the proportion of atheist responses. Does it agree with the percentage in Table 6? If not, why?

The proportion of atheists in the US sample is 4.99%. This agrees with the percentage in Table 6 once rounded.

us12 <- subset(atheism, nationality == "United States" & year == "2012")

atheist <- subset(us12, response == "atheist")

prop <- nrow(atheist)/nrow(us12)
prop*100
## [1] 4.99002
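As a cross-check, the same proportion can be computed as the mean of a logical vector. The sketch below does not load the data set; it rebuilds a stand-in response vector from the counts that the inference output reports for this sample (50 successes, 952 failures). The vector name `response` and the label "non-atheist" are assumptions made here for illustration.

```r
# Stand-in for us12$response, rebuilt from the reported counts (assumption).
response <- rep(c("atheist", "non-atheist"), times = c(50, 952))

# The mean of a logical vector is the proportion of TRUE values.
prop <- mean(response == "atheist")
round(prop * 100, 2)
```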

Exercise 5

Write out the conditions for inference to construct a 95% confidence interval for the proportion of atheists in the United States in 2012. Are you confident all conditions are met?

Conditions:

  • Sample observations are independent
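  • We see at least 10 failures and 10 successes in our sample

With 50 successes and 952 failures, the success-failure condition is met. Independence is reasonable if the respondents were randomly sampled.
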
inference(us12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0499 ;  n = 1002 
## Check conditions: number of successes = 50 ; number of failures = 952 
## Standard error = 0.0069 
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
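The interval above can be reproduced by hand from the reported counts. This is a minimal sketch of the usual normal-approximation formula, not the internals of the `inference` function; the variable names are chosen here for illustration.

```r
successes <- 50
n <- 1002

p_hat <- successes / n
se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of the sample proportion
z_star <- qnorm(0.975)                # critical value for 95% confidence

ci <- p_hat + c(-1, 1) * z_star * se
round(c(p_hat = p_hat, se = se, lower = ci[1], upper = ci[2]), 4)
```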

Exercise 6

Based on the R output, what is the margin of error for the estimate of the proportion of atheists in the US in 2012?

\(ME = z^* \times SE\)

0.0069 * 1.96
## [1] 0.013524
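The margin of error is also half the width of the reported confidence interval, which gives a quick consistency check. The sketch below uses the rounded values printed in the output, so the two numbers agree only up to rounding.

```r
ci <- c(0.0364, 0.0634)        # interval reported by inference()
me_from_ci <- diff(ci) / 2     # half the interval width
me_from_se <- 1.96 * 0.0069    # z* times the reported standard error

round(c(from_ci = me_from_ci, from_se = me_from_se), 4)
```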

Exercise 7

Using the inference function, calculate confidence intervals for the proportion of atheists in 2012 in two other countries of your choice, and report the associated margins of error. Be sure to note whether the conditions for inference are met. It may be helpful to create new data sets for each of the two countries first, and then use these data sets in the inference function to construct the confidence intervals.

Netherlands

Conditions:

  • Sample observations are independent
  • We see at least 10 failures and 10 successes in our sample

With 71 successes and 438 failures, the success-failure condition is met.

neth12 <- subset(atheism, nationality == "Netherlands" & year == "2012")

inference(neth12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.1395 ;  n = 509 
## Check conditions: number of successes = 71 ; number of failures = 438 
## Standard error = 0.0154 
## 95 % Confidence interval = ( 0.1094 , 0.1696 )

Ireland

Conditions:

  • Sample observations are independent
  • We see at least 10 failures and 10 successes in our sample

With 100 successes and 910 failures, the success-failure condition is met.

ire12 <- subset(atheism, nationality == "Ireland" & year == "2012")

inference(ire12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.099 ;  n = 1010 
## Check conditions: number of successes = 100 ; number of failures = 910 
## Standard error = 0.0094 
## 95 % Confidence interval = ( 0.0806 , 0.1174 )
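The exercise also asks for the margins of error. A short sketch using the standard errors printed above and a critical value of 1.96 gives roughly 0.030 for the Netherlands and 0.018 for Ireland; the vector names are chosen here for illustration.

```r
# Standard errors copied from the inference() output above.
se <- c(Netherlands = 0.0154, Ireland = 0.0094)

me <- 1.96 * se   # margin of error at 95% confidence
round(me, 4)
```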

Exercise 8

Describe the relationship between p and me.

For a fixed sample size, the margin of error is proportional to sqrt(p(1 - p)). It is greatest when the proportion is 0.5 and shrinks symmetrically as p moves toward 0 or 1.
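The relationship can be seen by computing the margin of error over a grid of proportions. This is a minimal sketch that assumes a fixed sample size of n = 1000, chosen only for illustration.

```r
n <- 1000                                # assumed sample size
p <- seq(0, 1, by = 0.01)                # grid of possible proportions
me <- 1.96 * sqrt(p * (1 - p) / n)       # margin of error at each p

p[which.max(me)]                         # proportion where the margin of error peaks

plot(p, me, type = "l",
     xlab = "Population proportion", ylab = "Margin of error")
```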

Exercise 9

Describe the sampling distribution of sample proportions at n=1040 and p=0.1. Be sure to note the center, spread, and shape.

The sampling distribution is nearly normal: unimodal and symmetric, centered at about 0.1, with a narrow spread. The mean of the sample proportions is 9.97%, with a minimum of 7.0% and a maximum of 13.0%. The interquartile range runs from 9.3% to 10.6%.

p <- 0.1
n <- 1040
p_hats <- rep(0, 5000)

for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  p_hats[i] <- sum(samp == "atheist")/n
}

hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18))

summary(p_hats)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.07019 0.09327 0.09904 0.09969 0.10577 0.12981
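The spread of the simulated proportions can be compared with the theoretical standard error, sqrt(p(1 - p)/n). The sketch below swaps the loop for a vectorized draw with `rbinom`, which simulates the same binomial counts; the seed is an arbitrary choice, so the simulated values will differ slightly from the summary above.

```r
set.seed(1)   # arbitrary seed so the run is reproducible

p <- 0.1
n <- 1040

# Each draw is the number of "atheist" responses in one sample of size n.
p_hats <- rbinom(5000, size = n, prob = p) / n

se_theory <- sqrt(p * (1 - p) / n)
round(c(mean = mean(p_hats), sd = sd(p_hats), se_theory = se_theory), 4)
```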

Exercise 10

Repeat the above simulation three more times but with modified sample sizes and proportions: for n=400 and p=0.1, n=1040 and p=0.02, and n=400 and p=0.02. Plot all four histograms together by running the par(mfrow = c(2, 2)) command before creating the histograms. You may need to expand the plot window to accommodate the larger two-by-two plot. Describe the three new sampling distributions. Based on these limited plots, how does \(n\) appear to affect the distribution of \(\hat p\)? How does \(p\) affect the sampling distribution?

Plot p=.1 and n=400: The distribution is nearly normal and unimodal, centered near .1. Its spread is wider than at n=1040 (standard error about 0.015 versus 0.0093).

Plot p=.02 and n=1040: The distribution is unimodal and centered near .02 with a narrow spread (standard error about 0.0043). It is close to normal with a slight right skew, since the expected number of successes is only about 21.

Plot p=.02 and n=400: The distribution is centered near .02 with a standard error of about 0.007. With only 8 expected successes it is right-skewed and discrete, so the normal approximation is poor.

The larger the \(n\), the tighter the spread of \(\hat p\) around the true proportion. For a fixed \(n\), the spread grows as \(p\) moves toward 0.5, and when \(p\) is close to 0 the distribution becomes right-skewed unless \(n\) is large.

# Settings for the four simulations: the original plus the three from the exercise
p <- 0.1
n <- 1040
p_hats <- rep(0, 5000)

p1 <- 0.1
n1 <- 400
p_hats1 <- rep(0, 5000)

p2 <- 0.02
n2 <- 1040
p_hats2 <- rep(0, 5000)

p3 <- 0.02
n3 <- 400
p_hats3 <- rep(0, 5000)

# Draw 5000 samples per setting and record each sample proportion
for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  p_hats[i] <- sum(samp == "atheist")/n
}

for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n1, replace = TRUE, prob = c(p1, 1-p1))
  p_hats1[i] <- sum(samp == "atheist")/n1
}

for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n2, replace = TRUE, prob = c(p2, 1-p2))
  p_hats2[i] <- sum(samp == "atheist")/n2
}

for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n3, replace = TRUE, prob = c(p3, 1-p3))
  p_hats3[i] <- sum(samp == "atheist")/n3
}

# Plot the four histograms in a two-by-two grid
par(mfrow = c(2,2))
hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18))
hist(p_hats1, main = "p = 0.1, n = 400", xlim = c(0, 0.21))
hist(p_hats2, main = "p = 0.02, n = 1040", xlim = c(0, 0.05))
hist(p_hats3, main = "p = 0.02, n = 400", xlim = c(0, 0.06))
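The four loops repeat the same steps, so the simulation can also be written once as a function and applied to each setting named in the exercise. This is a sketch rather than the lab's prescribed approach: `simulate_p_hats` is a hypothetical helper, `rbinom` replaces `sample`, and the seed is arbitrary.

```r
# Hypothetical helper: simulate `reps` sample proportions for a given p and n.
simulate_p_hats <- function(p, n, reps = 5000) {
  rbinom(reps, size = n, prob = p) / n
}

set.seed(1)   # arbitrary seed so the run is reproducible

# The four (p, n) combinations listed in the exercise.
settings <- data.frame(p = c(0.1, 0.1, 0.02, 0.02),
                       n = c(1040, 400, 1040, 400))

sims <- Map(simulate_p_hats, settings$p, settings$n)

# Compare the simulated spread with the theoretical standard error.
settings$sd_sim <- sapply(sims, sd)
settings$se_theory <- sqrt(settings$p * (1 - settings$p) / settings$n)
print(round(settings, 4))

par(mfrow = c(2, 2))
for (i in seq_len(nrow(settings))) {
  hist(sims[[i]], xlab = "p_hat",
       main = sprintf("p = %s, n = %s", settings$p[i], settings$n[i]))
}
```

Comparing `sd_sim` with `se_theory` row by row shows how the spread shrinks with larger n and with p further from 0.5.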

Exercise 11

If you refer to Table 6, you’ll find that Australia has a sample proportion of 0.1 on a sample size of 1040, and that Ecuador has a sample proportion of 0.02 on 400 subjects. Let’s suppose for this exercise that these point estimates are actually the truth. Then given the shape of their respective sampling distributions, do you think it is sensible to proceed with inference and report margins of error, as the report does?

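The \(ME\) for Ecuador is 0.0137.

The \(ME\) for Australia is 0.0182.

For Australia the success-failure condition holds (104 expected successes and 936 expected failures), so the sampling distribution is nearly normal and it is sensible to proceed with inference and report the \(ME\). For Ecuador there are only 0.02 × 400 = 8 expected successes, fewer than 10, so the sampling distribution is right-skewed and a margin of error based on the normal approximation is questionable.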
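The success-failure check and the margins of error for both countries can be tabulated in a few lines. This sketch treats the Table 6 point estimates as the true proportions, as the exercise instructs; the data frame and column names are chosen here for illustration.

```r
countries <- data.frame(country = c("Australia", "Ecuador"),
                        p = c(0.1, 0.02),
                        n = c(1040, 400))

# Expected counts under the assumed true proportions.
countries$exp_successes <- countries$p * countries$n
countries$exp_failures  <- (1 - countries$p) * countries$n

# Normal approximation requires at least 10 of each.
countries$conditions_met <- countries$exp_successes >= 10 &
                            countries$exp_failures >= 10

countries$me <- round(1.96 * sqrt(countries$p * (1 - countries$p) / countries$n), 4)
print(countries)
```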

On Your Own

1A. Is there convincing evidence that Spain has seen a change in its atheism index between 2005 and 2012?

\(H_0:\) no change in the proportion of atheists from 2005 to 2012

\(H_a:\) a change in the proportion of atheists from 2005 to 2012

Conditions:

  • Samples are independent
  • There are at least 10 failures and 10 successes

There is not convincing evidence that Spain has seen a change in atheism from 2005 to 2012. The 95% confidence intervals for the two years overlap, and the 2005 sample proportion falls within the 2012 interval.
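Comparing two intervals is only a rough check, so a two-proportion z-test is sketched here as a more direct test of the hypotheses. The counts are those reported in the inference output for Spain in 2005 and 2012; `two_prop_z` is a hypothetical helper written for this illustration, using the pooled proportion under the null hypothesis.

```r
# Hypothetical helper: pooled two-proportion z-test, two-sided.
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  p_pool <- (x1 + x2) / (n1 + n2)                         # pooled estimate under H0
  se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  z <- (p1 - p2) / se
  c(z = z, p_value = 2 * pnorm(-abs(z)))
}

# Spain: 115 of 1146 in 2005, 103 of 1145 in 2012.
spain <- two_prop_z(115, 1146, 103, 1145)
round(spain, 3)
```

A p-value above 0.05 here is consistent with the conclusion drawn from the overlapping intervals.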

spain05 <- subset(atheism, nationality == "Spain" & year == "2005")
inference(spain05$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.1003 ;  n = 1146 
## Check conditions: number of successes = 115 ; number of failures = 1031 
## Standard error = 0.0089 
## 95 % Confidence interval = ( 0.083 , 0.1177 )
spain12 <- subset(atheism, nationality == "Spain" & year == "2012")
inference(spain12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.09 ;  n = 1145 
## Check conditions: number of successes = 103 ; number of failures = 1042 
## Standard error = 0.0085 
## 95 % Confidence interval = ( 0.0734 , 0.1065 )

1B. Is there convincing evidence that the United States has seen a change in its atheism index between 2005 and 2012?

\(H_0:\) no change in the proportion of atheists from 2005 to 2012

\(H_a:\) a change in the proportion of atheists from 2005 to 2012

Conditions:

  • Samples are independent
  • There are at least 10 failures and 10 successes

There is convincing evidence that the United States has seen a change in atheism between 2005 and 2012. The 95% confidence intervals for the two years do not overlap. The sample proportion increased from .01 in 2005 to .05 in 2012. Note that the 2005 sample has exactly 10 successes, so the success-failure condition is only just met.
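The same question can be put to a formal test with base R's `prop.test`, using the counts from the inference output for the United States (10 of 1002 in 2005, 50 of 1002 in 2012). This is a sketch of one reasonable check, not the method the lab prescribes; `correct = FALSE` turns off the continuity correction so the result matches the plain two-proportion z-test.

```r
# Two-sample test of equal proportions, without continuity correction.
us_test <- prop.test(x = c(10, 50), n = c(1002, 1002), correct = FALSE)

us_test$estimate   # sample proportions for 2005 and 2012
us_test$p.value    # two-sided p-value
```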

usa05 <- subset(atheism, nationality == "United States" & year == "2005")
inference(usa05$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.01 ;  n = 1002 
## Check conditions: number of successes = 10 ; number of failures = 992 
## Standard error = 0.0031 
## 95 % Confidence interval = ( 0.0038 , 0.0161 )
usa12 <- subset(atheism, nationality == "United States" & year == "2012")
inference(usa12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0499 ;  n = 1002 
## Check conditions: number of successes = 50 ; number of failures = 952 
## Standard error = 0.0069 
## 95 % Confidence interval = ( 0.0364 , 0.0634 )

  2. If in fact there has been no change in the atheism index in the countries listed in Table 4, in how many of those countries would you expect to detect a change (at a significance level of 0.05) simply by chance?

A significance level of 0.05 is the probability of a Type 1 error, that is, of rejecting the null hypothesis when it is true. If no country has actually changed, I would expect to detect a change in about 5% of the countries listed in Table 4 simply by chance, which is 0.05 times the number of countries in the table.
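At a significance level of 0.05, each test of a true null hypothesis has a 5% chance of a false detection, so the expected number of false detections is 0.05 times the number of countries tested. The sketch below uses a placeholder of 40 countries, which is an assumption for illustration only and should be replaced with the actual number of rows in Table 4.

```r
alpha <- 0.05
n_countries <- 40   # hypothetical placeholder, not the actual Table 4 count

expected_false <- alpha * n_countries

# Simulate many sets of tests in which every null hypothesis is true.
set.seed(1)   # arbitrary seed
false_detections <- rbinom(10000, size = n_countries, prob = alpha)

c(expected = expected_false, simulated_mean = mean(false_detections))
```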

  3. Suppose you’re hired by the local government to estimate the proportion of residents that attend a religious service on a weekly basis. According to the guidelines, the estimate must have a margin of error no greater than 1% with 95% confidence. You have no idea what to expect for \(p\). How many people would you have to sample to ensure that you are within the guidelines?

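\(1.96 \times \sqrt{\frac{p(1-p)}{n}} \le 0.01\)

\(n \ge \left(\frac{1.96}{0.01}\right)^2 p(1-p) = 38416 \times p(1-p)\)

Since the margin of error is greatest at a proportion of .5, we use \(p = 0.5\), which gives \(n \ge 38416 \times 0.25 = 9604\). To achieve an ME of no more than 0.01 we would need at least 9604 residents.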
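The sample size calculation can be checked numerically. This sketch solves the margin of error formula for n at the worst case p = 0.5, which gives 9604; the variable names are chosen here for illustration, and the `round` call only guards against floating-point noise before taking the ceiling.

```r
z_star <- 1.96
me_target <- 0.01
p_worst <- 0.5     # p(1 - p) is largest at 0.5

n_raw <- z_star^2 * p_worst * (1 - p_worst) / me_target^2
n_required <- ceiling(round(n_raw, 6))
n_required

# Margin of error actually achieved at that sample size.
z_star * sqrt(p_worst * (1 - p_worst) / n_required)
```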