Lab 6

Exercise 1

The way it is phrased it seems to be a population parameters. However, in reallity it is a sample statistics.

Exercise 2

The sample should be proportinally distributed (random) among world population with a big sample size for such a diverse population. I doubt it is trully independent sample of reasonable size. So, results might paint incorrect picture. Or in another words the article might end up describing a sample and not world population (overfitting).

download.file("http://www.openintro.org/stat/data/atheism.RData", destfile = "atheism.RData")

load("atheism.RData")

head(atheism)

##   nationality    response year
## 1 Afghanistan non-atheist 2012
## 2 Afghanistan non-atheist 2012
## 3 Afghanistan non-atheist 2012
## 4 Afghanistan non-atheist 2012
## 5 Afghanistan non-atheist 2012
## 6 Afghanistan non-atheist 2012

summary(atheism)

##              nationality           response          year     
##  Pakistan          : 5409   atheist    : 5498   Min.   :2005  
##  France            : 3359   non-atheist:82534   1st Qu.:2005  
##  Korea, Rep (South): 3047                       Median :2012  
##  Ghana             : 2995                       Mean   :2009  
##  Macedonia         : 2418                       3rd Qu.:2012  
##  Peru              : 2414                       Max.   :2012  
##  (Other)           :68390

Exercise 3

Each row in table 6 corresponds to a country sample and sample statistics.

Each row in the atheism file corresponds an individual response (with country residence and year of response provided).

us12 <- subset(atheism, nationality == "United States" & year == "2012")

prop.table(table(us12$response))

## 
##     atheist non-atheist 
##   0.0499002   0.9500998

Exercise 4

File ateism and the Table 6 agree, both show that 5% of USA sample are atheists.

Exercise 5

Sample should be random, independent (less than 10% of population), and approximately normal (both success and failure more than 10). I am confident about independence and normality, but I am not sure that sample is random (it could be).

inference(us12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0499 ;  n = 1002 
## Check conditions: number of successes = 50 ; number of failures = 952 
## Standard error = 0.0069 
## 95 % Confidence interval = ( 0.0364 , 0.0634 )

Exercise 6

Margin of error is 0.0499-0.0364 or 0.0135

0.0499-0.0364

## [1] 0.0135

Exercise 7

Confidence interval for France is (0.2657,0.3089).

Confidence interval for Argentina is (0.0547,0.0866)

fr12 <- subset(atheism, nationality == "France" & year == "2012")

summary(fr12)

##       nationality          response         year     
##  France     :1688   atheist    : 485   Min.   :2012  
##  Afghanistan:   0   non-atheist:1203   1st Qu.:2012  
##  Argentina  :   0                      Median :2012  
##  Armenia    :   0                      Mean   :2012  
##  Australia  :   0                      3rd Qu.:2012  
##  Austria    :   0                      Max.   :2012  
##  (Other)    :   0

arg12 <- subset(atheism, nationality == "Argentina" & year == "2012")

summary(arg12)

##       nationality         response        year     
##  Argentina  :991   atheist    : 70   Min.   :2012  
##  Afghanistan:  0   non-atheist:921   1st Qu.:2012  
##  Armenia    :  0                     Median :2012  
##  Australia  :  0                     Mean   :2012  
##  Austria    :  0                     3rd Qu.:2012  
##  Azerbaijan :  0                     Max.   :2012  
##  (Other)    :  0

inference(fr12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.2873 ;  n = 1688 
## Check conditions: number of successes = 485 ; number of failures = 1203 
## Standard error = 0.011 
## 95 % Confidence interval = ( 0.2657 , 0.3089 )

inference(arg12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0706 ;  n = 991 
## Check conditions: number of successes = 70 ; number of failures = 921 
## Standard error = 0.0081 
## 95 % Confidence interval = ( 0.0547 , 0.0866 )

n <- 1000

p <- seq(0, 1, 0.01)

me <- 2*sqrt(p*(1-p)/n)

plot(me~p, ylab="Margin of error", xlab = "Population proportion")

Exercise 8

Margin of error keeps increasing as population proprtion incrreases from 0 to 0.5 and then keeps decreasing as population proportion increases from 0.5 to 1.0.

p <- 0.1

n <- 1040

p_hats <- rep(0,5000)

for(i in 1:5000){
  samp<-sample(c("atheist", "non-atheist"), n, replace=TRUE, prob = c(p, 1-p))
  p_hats[i]<-sum(samp=="atheist")/n
}

hist(p_hats, main="p = 0.1, n = 1040", xlim=c(0, 0.18))

summary(p_hats)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.07019 0.09327 0.09904 0.09969 0.10577 0.12981

Exercise 9

The distribution appears near normal with mean of 0.1. It spreads from 0.067 to 0.137.

Exercise 10

All 3 distributions appear near normal with first distribution having mean of 0.1 and 2nd and 3rd having mean of 0.02. N appears to effect distribution. Lower N’s have bigger spreads.

p <- 0.1

n <- 400

p_hats1 <- rep(0,5000)

for(i in 1:5000){
  samp<-sample(c("atheist", "non-atheist"), n, replace=TRUE, prob = c(p, 1-p))
  p_hats1[i]<-sum(samp=="atheist")/n
}

p <- 0.02

n <- 1040

p_hats2 <- rep(0,5000)

for(i in 1:5000){
  samp<-sample(c("atheist", "non-atheist"), n, replace=TRUE, prob = c(p, 1-p))
  p_hats2[i]<-sum(samp=="atheist")/n
}

p <- 0.02

n <- 400

p_hats3 <- rep(0,5000)

for(i in 1:5000){
  samp<-sample(c("atheist", "non-atheist"), n, replace=TRUE, prob = c(p, 1-p))
  p_hats3[i]<-sum(samp=="atheist")/n
}

par(mfrow = c(2, 2))

hist(p_hats, main="p = 0.1, n = 1040", xlim=c(0, 0.18))

hist(p_hats1, main="p = 0.1, n = 400", xlim=c(0, 0.18))

hist(p_hats2, main="p = 0.02, n = 1040", xlim=c(0, 0.18))

hist(p_hats3, main="p = 0.02, n = 400", xlim=c(0, 0.18))

summary(p_hats1)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.05250 0.09000 0.10000 0.09976 0.11000 0.15500

summary(p_hats2)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.005769 0.017308 0.020192 0.019954 0.023077 0.039423

summary(p_hats3)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.01500 0.02000 0.01988 0.02500 0.04750

Exercise 11.

Reporting Margin of error would provide more accurate picture of true population religiosity status, but at the same time it will make report also more confusing for general public who often do not know how to interpret confidence intervals.

On my own.

1

H0: Spain proportions in 2005 and 2012 did not change.

H1: Spain proportions changed in 2012 in comparison to 2005 results.

Conclusion: confidence intervals overlap, so we do not have enough evidence to reject H0. We accept H0, as there is not enough eveidence to conclude that there was substantial change in proportion of atheist since 2005.

H0: United States proportions in 2005 and 2012 did not change.

H1: United Sates proportions changed in 2012 in comparison to 2005 results.

Conclusion: confidence intervals do not overlap, so we have enough evidence to reject H0. We reject H0, as there is enough eveidence to conclude that there was substantial change in proportion of atheist since 2005.

sp5 <- subset(atheism, nationality == "Spain" & year == "2005")

sp12 <- subset(atheism, nationality == "Spain" & year == "2012")

inference(sp5$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.1003 ;  n = 1146 
## Check conditions: number of successes = 115 ; number of failures = 1031 
## Standard error = 0.0089 
## 95 % Confidence interval = ( 0.083 , 0.1177 )

inference(sp12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.09 ;  n = 1145 
## Check conditions: number of successes = 103 ; number of failures = 1042 
## Standard error = 0.0085 
## 95 % Confidence interval = ( 0.0734 , 0.1065 )

us5 <- subset(atheism, nationality == "United States" & year == "2005")

us12 <- subset(atheism, nationality == "United States" & year == "2012")

inference(us5$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.01 ;  n = 1002 
## Check conditions: number of successes = 10 ; number of failures = 992 
## Standard error = 0.0031 
## 95 % Confidence interval = ( 0.0038 , 0.0161 )

inference(us12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0499 ;  n = 1002 
## Check conditions: number of successes = 50 ; number of failures = 952 
## Standard error = 0.0069 
## 95 % Confidence interval = ( 0.0364 , 0.0634 )

2. ~5% or 2 countries would show change just by chance.

3. From the plot we see that maximum Margin of error is ~0.03 for n=1000. To decrease a margin of error 3 times we need to increase sample at least 9 times or have n=9,000 (I would make it 10,000 to be on save side).

n <- 10000

p <- seq(0, 1, 0.01)

me <- 2*sqrt(p*(1-p)/n)

plot(me~p, ylab="Margin of error", xlab = "Population proportion")

Math and Statistics Lab 6

Mikhail Groysman

November 22, 2018