Lab_8

1.) In the first paragraph, several key findings are reported. Do these percentages appear to be sample statistics (derived from the data sample) or population parameters?

13% of respondents are atheists, 23% are non-religious, and 59% are religious. These percentages are sample statistics: they come from a sample of 50,000 men and women across 57 countries, not from the whole population.

2.) The title of the report is “Global Index of Religiosity and Atheism”. To generalize the report’s findings to the global human population, what must we assume about the sampling method? Does that seem like a reasonable assumption?

We must assume that the respondents were randomly sampled and that the observations are independent. Requiring this is reasonable, because bias and other factors could otherwise easily sway the findings to fit one’s beliefs and expectations.

3.) What does each row of Table 6 correspond to? What does each row of atheism correspond to?

Each row of Table 6 corresponds to a country and its percentages in each category, while each row of atheism corresponds to a single respondent, classified as either atheist or non-atheist.

download.file("http://www.openintro.org/stat/data/atheism.RData", destfile = "atheism.RData")
load("atheism.RData")

4.) Using the command below, create a new dataframe called us12 that contains only the rows in atheism associated with respondents to the 2012 survey from the United States. Next, calculate the proportion of atheist responses. Does it agree with the percentage in Table 6? If not, why?

Yes, it does agree. The calculated proportion is not exactly 0.05 (5%), but at 0.0499 it is very close and rounds to it.

us12 <- subset(atheism, nationality == "United States" & year == "2012")

nrow(us12)

## [1] 1002

us12a <- subset(us12, response == "atheist")
nrow(us12a)

## [1] 50

us12n <- subset(us12, response == "non-atheist")
nrow(us12n)  # pass the data frame itself, not its name as a string

## [1] 952

usathp <- nrow(us12a) / nrow(us12)  # 50 / 1002
usathp

## [1] 0.0499002
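The same proportion can also be computed without hard-coding the counts. The sketch below is a self-contained illustration: rather than loading atheism.RData, it rebuilds a stand-in response vector from the counts reported above (50 atheists out of 1002 respondents). The name `responses` is hypothetical and only stands in for us12$response.

```r
# Stand-in for us12$response, rebuilt from the reported counts
# (assumption: 50 "atheist" rows out of 1002 in total).
responses <- rep(c("atheist", "non-atheist"), times = c(50, 1002 - 50))

# The mean of a logical vector is the proportion of TRUE values.
prop_atheist <- mean(responses == "atheist")
round(prop_atheist, 4)
```

With the real data loaded, `mean(us12$response == "atheist")` should give the same value as usathp.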

5.) Write out the conditions for inference to construct a 95% confidence interval for the proportion of atheists in the United States in 2012. Are you confident all conditions are met?

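The sample must be randomly selected, the observations within the US sample must be independent of one another, and the sampling distribution of the sample proportion must be approximately normal (np and n(1 - p) both greater than or equal to 10). The last condition is met, since the output below shows 50 successes and 952 failures. Random sampling and independence cannot be verified from the data and have to be assumed.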

inference(us12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0499 ;  n = 1002 
## Check conditions: number of successes = 50 ; number of failures = 952 
## Standard error = 0.0069 
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
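As a cross-check on the output above, the interval can be reproduced by hand from the normal approximation. This is a sketch of the standard formula p_hat ± z * sqrt(p_hat * (1 - p_hat) / n), not the internals of the inference function, and the variable names are illustrative.

```r
# Counts taken from the output above.
successes <- 50
n_obs <- 1002
failures <- n_obs - successes

# Success-failure condition: at least 10 of each outcome.
conditions_ok <- successes >= 10 && failures >= 10

# Point estimate, standard error, and 95% critical value.
p_hat <- successes / n_obs
se <- sqrt(p_hat * (1 - p_hat) / n_obs)
z_star <- qnorm(0.975)

# Confidence interval: estimate plus or minus the margin of error.
ci <- p_hat + c(-1, 1) * z_star * se
round(c(se = se, lower = ci[1], upper = ci[2]), 4)
```

Rounded to four decimals, these values should match the standard error and the interval reported by inference.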

6.) Based on the R output, what is the margin of error for the estimate of the proportion of atheists in the US in 2012?

The margin of error is about 0.0135 (1.96 × 0.0069 = 0.013524).
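The margin of error can be read off the R output in two ways, as sketched below: as half the width of the confidence interval, or as the critical value times the standard error. The inputs are the rounded values printed by inference, so the two results agree only up to rounding; the variable names are illustrative.

```r
# Rounded values copied from the inference output for the US.
ci_lower <- 0.0364
ci_upper <- 0.0634
se_reported <- 0.0069

# Route 1: half the width of the confidence interval.
me_from_ci <- (ci_upper - ci_lower) / 2

# Route 2: critical value (about 1.96 for 95%) times the standard error.
me_from_se <- 1.96 * se_reported

c(from_ci = me_from_ci, from_se = me_from_se)
```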

7.) Using the inference function, calculate confidence intervals for the proportion of atheists in 2012 in two other countries of your choice, and report the associated margins of error. Be sure to note whether the conditions for inference are met. It may be helpful to create new data sets for each of the two countries first, and then use these data sets in the inference function to construct the confidence intervals.

I chose Romania and Brazil because the conditions are met for both (10 and 20 successes, respectively). Romania had a margin of error of 0.00588, while Brazil had one of 0.004312.

Romania12 <- subset(atheism, nationality == "Romania" & year == "2012")
inference(Romania12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0096 ;  n = 1039 
## Check conditions: number of successes = 10 ; number of failures = 1029 
## Standard error = 0.003 
## 95 % Confidence interval = ( 0.0037 , 0.0156 )

margER <- 1.96 * 0.003  # standard error taken from the output above
margER

## [1] 0.00588

Brazil12 <- subset(atheism, nationality == "Brazil" & year == "2012")
inference(Brazil12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.01 ;  n = 2002 
## Check conditions: number of successes = 20 ; number of failures = 1982 
## Standard error = 0.0022 
## 95 % Confidence interval = ( 0.0056 , 0.0143 )

margEB <- 1.96 * 0.0022  # standard error taken from the output above
margEB

## [1] 0.004312
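Typing the rounded standard error by hand is error-prone, so a small helper that works from the counts may be safer. The function below is a hypothetical sketch (margin_of_error is not part of the lab's code) that applies the same normal-approximation formula to the success counts and sample sizes shown in the two outputs.

```r
# Hypothetical helper: margin of error for a single proportion,
# computed from the number of successes and the sample size.
margin_of_error <- function(successes, n, conf_level = 0.95) {
  p_hat <- successes / n
  z_star <- qnorm(1 - (1 - conf_level) / 2)  # about 1.96 for 95%
  z_star * sqrt(p_hat * (1 - p_hat) / n)
}

# Counts taken from the inference output for each country.
me_romania <- margin_of_error(successes = 10, n = 1039)
me_brazil <- margin_of_error(successes = 20, n = 2002)

round(c(Romania = me_romania, Brazil = me_brazil), 4)
```

Because this uses the unrounded standard error, the results differ slightly in the later decimals from the values computed by hand.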

8.) Describe the relationship between p and me.

The margin of error is largest at p = 0.5 and decreases symmetrically as p moves away from 0.5 toward 0 or 1. In other words, the closer p is to 0.5, the larger the ME.

n <- 1000
p <- seq(0, 1, 0.01)
me <- 2 * sqrt(p * (1 - p)/n)
plot(me ~ p, ylab = "Margin of Error", xlab = "Population Proportion")
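The shape of the curve can also be checked numerically. The sketch below repeats the calculation under different, illustrative names (n_demo, p_grid, me_grid) so that it does not overwrite the lab's variables, then locates the maximum and compares two values of p that are equally far from 0.5.

```r
n_demo <- 1000
p_grid <- seq(0, 1, 0.01)
me_grid <- 2 * sqrt(p_grid * (1 - p_grid) / n_demo)

# Proportion at which the margin of error peaks, and the peak value.
peak_index <- which.max(me_grid)
p_grid[peak_index]
me_grid[peak_index]

# Elements 11 and 91 correspond to p = 0.1 and p = 0.9,
# which are equally far from 0.5.
me_grid[c(11, 91)]
```

The two values in the last line should be equal up to floating-point error, and both should be smaller than the peak.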

9.) Describe the sampling distribution of sample proportions at n=1040 and p=0.1. Be sure to note the center, spread, and shape. Hint: Remember that R has functions such as mean to calculate summary statistics.

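Center: the mean (0.0997) and median (0.0990) of the distribution are nearly identical and both sit very close to the true p = 0.1. Spread: the spread is small, with a standard deviation of about 0.0093 and the middle half of the sample proportions falling between 0.0933 and 0.1058. Shape: the distribution is roughly symmetric and bell-shaped, consistent with the mean and median being so close.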

p <- 0.1
n <- 1040
p_hats <- rep(0, 5000)

for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  p_hats[i] <- sum(samp == "atheist")/n
}

hist(p_hats, main = "p = 0.1, n = 1040", col = "lavender", xlim = c(0, 0.18))

summary(p_hats)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.07019 0.09327 0.09904 0.09969 0.10577 0.12981

hist(p_hats, 
     main ="p = 0.1, n= 1040",
     xlim = c(0,0.18),
     col = "lavender")

sd(p_hats)

## [1] 0.009287382
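The simulated spread can be compared with the theoretical standard error sqrt(p * (1 - p) / n). The sketch below is self-contained and makes two assumptions that are not in the lab: it draws the success counts directly with rbinom instead of looping over sample, and it fixes an arbitrary seed so the run is repeatable. All names are illustrative.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

p_true <- 0.1
n_obs <- 1040
reps <- 5000

# rbinom returns the number of successes in n_obs trials;
# dividing by n_obs turns each count into a sample proportion.
p_hats_sim <- rbinom(reps, size = n_obs, prob = p_true) / n_obs

# Theoretical standard error of the sample proportion.
se_theory <- sqrt(p_true * (1 - p_true) / n_obs)

c(simulated_sd = sd(p_hats_sim), theoretical_se = se_theory)
```

The theoretical value is about 0.0093, which is in line with the simulated standard deviation reported above.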

10.) Repeat the above simulation three more times but with modified sample sizes and proportions: for n=400 and p=0.1, n=1040 and p=0.02, and n=400 and p=0.02. Plot all four histograms together by running the par(mfrow = c(2, 2)) command before creating the histograms. You may need to expand the plot window to accommodate the larger two-by-two plot. Describe the three new sampling distributions. Based on these limited plots, how does n appear to affect the distribution of p̂? How does p affect the sampling distribution?

The spread decreases as the sample size increases, so a larger sample usually gives an estimate closer to the population proportion. A larger sample size also makes the shape of the distribution more nearly normal. As for p, it sets the center of the distribution, and over the values tried here the larger p (0.1 rather than 0.02) gives the wider spread and therefore the larger margin of error.

n1 <- 400
p1 <- 0.1

n2 <- 1040
p2 <- 0.02

n3 <- 400
p3 <- 0.02

p_hats1 <- rep(0,5000)

for(i in 1:5000){
  samp1 <- sample(c("atheist","non_atheist"),n1,replace = TRUE,prob = c(p1, (1-p1)))
  p_hats1[i] <- sum(samp1 == "atheist")/n1
}

p_hats2 <- rep(0,5000)

for(i in 1:5000){
  samp2 <- sample(c("atheist","non_atheist"),n2,replace = TRUE,prob = c(p2, (1-p2)))
  p_hats2[i] <- sum(samp2 == "atheist")/n2
}

p_hats3 <- rep(0,5000)

for(i in 1:5000){
  samp3 <- sample(c("atheist","non_atheist"),n3,replace = TRUE,prob = c(p3, (1-p3)))
  p_hats3[i] <- sum(samp3 == "atheist")/n3
}

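par(mfrow = c(2, 2))
# The original simulation (p_hats) is included so that all four panels are drawn.
hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18), col = "lavender")
hist(p_hats1, main = "p = 0.1, n = 400", xlim = c(0, 0.18), col = "lavender")
hist(p_hats2, main = "p = 0.02, n = 1040", xlim = c(0, 0.05), col = "lavender")
hist(p_hats3, main = "p = 0.02, n = 400", xlim = c(0, 0.05), col = "lavender")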
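The repeated loops can be condensed into one function, which also makes it easy to tabulate the center and spread of each simulation next to the theoretical standard error. This is a sketch rather than the lab's prescribed approach: simulate_p_hats is a hypothetical helper, it uses rbinom in place of the sample loop, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical helper: simulate `reps` sample proportions
# for samples of size n drawn with success probability p.
simulate_p_hats <- function(n, p, reps = 5000) {
  rbinom(reps, size = n, prob = p) / n
}

# The four combinations of n and p used in this exercise.
settings <- data.frame(n = c(1040, 400, 1040, 400),
                       p = c(0.1, 0.1, 0.02, 0.02))
settings$mean_p_hat <- NA_real_
settings$sd_p_hat <- NA_real_

for (i in seq_len(nrow(settings))) {
  sims <- simulate_p_hats(settings$n[i], settings$p[i])
  settings$mean_p_hat[i] <- mean(sims)
  settings$sd_p_hat[i] <- sd(sims)
}

# Theoretical standard error for comparison.
settings$theoretical_se <- sqrt(settings$p * (1 - settings$p) / settings$n)
settings
```

In the resulting table, the simulated standard deviations should track the theoretical standard errors, shrinking as n grows and as p moves from 0.1 to 0.02.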

11.) If you refer to Table 6, you’ll find that Australia has a sample proportion of 0.1 on a sample size of 1040, and that Ecuador has a sample proportion of 0.02 on 400 subjects. Let’s suppose for this exercise that these point estimates are actually the truth. Then given the shape of their respective sampling distributions, do you think it is sensible to proceed with inference and report margins of error, as the report does?

Australia is the only one I would feel confident about, because its sample is large enough: np = 1040 × 0.1 = 104 and n(1 - p) = 936, both well above 10. Ecuador falls short of the success-failure condition that Australia meets, since its np is only 400 × 0.02 = 8, so a normal-based margin of error is questionable there.
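The success-failure check behind this answer can be written out for both countries. The sketch below uses the sample sizes and proportions quoted in the question and treats them as the true values, as the exercise asks; the data frame and column names are illustrative.

```r
# Sample sizes and proportions quoted from Table 6 in the question.
countries <- data.frame(country = c("Australia", "Ecuador"),
                        n = c(1040, 400),
                        p = c(0.1, 0.02))

# Expected numbers of successes (np) and failures (n(1 - p)).
countries$expected_successes <- countries$n * countries$p
countries$expected_failures <- countries$n * (1 - countries$p)

# The normal approximation is usually considered adequate
# when both expected counts are at least 10.
countries$condition_met <- countries$expected_successes >= 10 &
  countries$expected_failures >= 10

countries
```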