Project #6 - Inference on Categorical Data

Purpose

In this project, students will demonstrate their understanding of inference on categorical data. Some questions also involve inference on numerical data, and students are expected to identify the appropriate type of inference based on the variable type. Unless otherwise stated, assume a significance level of 0.05.

Preparation

The project uses two datasets: atheism (downloaded from the internet) and nscc_student_data. Load the atheism dataset into your environment using the following R chunk, then explore the data frame on your own. This exploration will not be graded.

# Download atheism dataset from web
download.file("http://s3.amazonaws.com/assets.datacamp.com/course/dasi/atheism.RData", destfile = "atheism.RData")

# Load dataset into environment
load("atheism.RData")

Load the “nscc_student_data.csv” file using the R chunk below and refamiliarize yourself with this dataset as well.

#Load the dataset into R and attempt to better understand the information 
nscc <- read.csv("nscc_student_data.csv")
str(nscc)

## 'data.frame':    40 obs. of  15 variables:
##  $ Gender      : chr  "Female" "Female" "Female" "Female" ...
##  $ PulseRate   : int  64 75 74 65 NA 72 72 60 66 60 ...
##  $ CoinFlip1   : int  5 4 6 4 NA 6 6 3 7 6 ...
##  $ CoinFlip2   : int  5 6 1 4 NA 5 6 5 8 5 ...
##  $ Height      : num  62 62 60 62 66 ...
##  $ ShoeLength  : num  11 11 10 10.8 NA ...
##  $ Age         : int  19 21 25 19 26 21 19 24 24 20 ...
##  $ Siblings    : int  4 3 2 1 6 1 2 2 3 1 ...
##  $ RandomNum   : int  797 749 13 613 53 836 423 16 12 543 ...
##  $ HoursWorking: int  35 25 30 18 24 15 20 0 40 30 ...
##  $ Credits     : int  13 12 6 9 15 9 15 15 13 16 ...
##  $ Birthday    : chr  "July 5" "December 27" "January 31" "6-13" ...
##  $ ProfsAge    : int  31 30 29 31 32 32 28 28 31 28 ...
##  $ Coffee      : chr  "No" "Yes" "Yes" "Yes" ...
##  $ VoterReg    : chr  "Yes" "Yes" "No" "Yes" ...

Question 1 - Single Proportion Hypothesis Test

In the 2010 playoffs, the National Football League (NFL) changed its overtime rules amid concerns that whichever team won the overtime coin toss (by luck) had a significant advantage in winning the game. Nicholas Gorgievski et al. published research in 2010 stating that, of 414 games won in overtime up to that point, 235 were won by the team that won the coin toss. Test the claim that the team that wins the coin toss wins games more often than its opponent.
(Hint: You must recognize what percent of games you’d expect a team to win if they do not win any more or less than their opponent.)

Write hypotheses and determine tails of the test
\[H_0: p = .5\] \[H_A: p > .5\]
Find p-value of sample data occurring by chance

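# Store games won by the team that won the coin toss and total games won in OT
x <- 235
n <- 414

# Sample proportion of OT games won by the coin-toss winner
p.hat <- x/n

# SE under the null hypothesis (p = .5)
se <- sqrt(.5*(1-.5)/n)

# Test statistic (keep the sign, since H_A is p > .5)
zp <- (p.hat - .5)/se

# One-sided p-value: only the upper tail counts for H_A: p > .5
pnorm(zp, lower.tail = FALSE)

This gives z ≈ 2.75 and a one-sided p-value of approximately 0.003.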

Decision
Since p < .05, we reject the null hypothesis at the .05 significance level.
State conclusion
The data provide convincing evidence that the team that wins the overtime coin toss wins the game more often than its opponent.
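As a cross-check, R's built-in `prop.test()` can run the same one-sided test. With `correct = FALSE` its chi-squared statistic equals z², and with `alternative = "greater"` it reports only the upper-tail probability, which is the p-value that matches \(H_A: p > .5\). The helper name `one_prop_z` is hypothetical and introduced only for this sketch.

```r
# Hypothetical helper: one-sample z-test for a proportion, upper tail only
one_prop_z <- function(x, n, p0) {
  p.hat <- x / n
  se <- sqrt(p0 * (1 - p0) / n)   # SE computed under H0, not from p.hat
  z <- (p.hat - p0) / se          # signed statistic, no abs()
  list(z = z, p.value = pnorm(z, lower.tail = FALSE))
}

manual <- one_prop_z(235, 414, 0.5)
builtin <- prop.test(x = 235, n = 414, p = 0.5,
                     alternative = "greater", correct = FALSE)

manual$z   # about 2.75
c(manual = manual$p.value, prop.test = builtin$p.value)
```

Doubling this value would give the two-sided p-value, which answers a different question (\(p \neq .5\)).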

Question 2 - Two Independent Proportions Hypothesis Test

For questions 2 and 3, consider the atheism dataset loaded at the beginning of the project. An atheism index is defined as the percent of a population that identifies as atheist. Is there convincing evidence that Spain has seen a change in its atheism index from 2005 to 2012?

# Create subsets for Spain 2005 and 2012
spain2005 <- subset(atheism, nationality == "Spain" & year == 2005)
spain2012 <- subset(atheism, nationality == "Spain" & year == 2012)

Write hypotheses and determine tails of the test (two-tailed; \(p_1\) and \(p_2\) are Spain's atheism index in 2005 and 2012)
\[H_0: p_1 = p_2\] \[H_A: p_1 \neq p_2\]
Find p-value of sample data occurring by chance

#Tabulate the number of atheist and non-atheist respondents by country and year
table(atheism)

## , , year = 2005
## 
##                                               response
## nationality                                    atheist non-atheist
##   Afghanistan                                        0           0
##   Argentina                                         20         982
##   Armenia                                            0           0
##   Australia                                          0           0
##   Austria                                          100         903
##   Azerbaijan                                         0           0
##   Belgium                                            0           0
##   Bosnia and Herzegovina                            90         910
##   Brazil                                             0           0
##   Bulgaria                                          50         941
##   Cameroon                                          25         479
##   Canada                                            60         943
##   China                                              0           0
##   Colombia                                          18         588
##   Czech Republic                                   200         800
##   Ecuador                                            4         396
##   Fiji                                               0           0
##   Finland                                           69         915
##   France                                           234        1437
##   Georgia                                            0           0
##   Germany                                           50         452
##   Ghana                                              0        1505
##   Hong Kong                                          0           0
##   Iceland                                           51         801
##   India                                             44        1047
##   Iraq                                               0           0
##   Ireland                                            0           0
##   Italy                                             59         928
##   Japan                                            276         924
##   Kenya                                              0        1000
##   Korea, Rep (South)                               168        1356
##   Lebanon                                            0           0
##   Lithuania                                         20        1005
##   Macedonia                                         36        1173
##   Malaysia                                          21         499
##   Moldova                                           22        1063
##   Netherlands                                       35         470
##   Nigeria                                           10        1039
##   Pakistan                                          27        2678
##   Palestinian territories (West Bank and Gaza)       0           0
##   Peru                                              24        1183
##   Poland                                            10         510
##   Romania                                           10        1040
##   Russian Federation                                40         960
##   Saudi Arabia                                       0           0
##   Serbia                                            41         996
##   South Africa                                       2         198
##   South Sudan                                        0           0
##   Spain                                            115        1031
##   Sweden                                             0           0
##   Switzerland                                       35         472
##   Tunisia                                            0           0
##   Turkey                                             0           0
##   Ukraine                                           41         972
##   United States                                     10         992
##   Uzbekistan                                         0           0
##   Vietnam                                            5         495
## 
## , , year = 2012
## 
##                                               response
## nationality                                    atheist non-atheist
##   Afghanistan                                        0        1031
##   Argentina                                         70         921
##   Armenia                                           10         485
##   Australia                                        104         935
##   Austria                                          100         902
##   Azerbaijan                                         0         509
##   Belgium                                           42         485
##   Bosnia and Herzegovina                            40         960
##   Brazil                                            20        1982
##   Bulgaria                                          19         987
##   Cameroon                                          15         489
##   Canada                                            90         912
##   China                                            235         265
##   Colombia                                          18         588
##   Czech Republic                                   300         700
##   Ecuador                                            8         396
##   Fiji                                              10        1008
##   Finland                                           59         926
##   France                                           485        1203
##   Georgia                                           10         990
##   Germany                                           75         427
##   Ghana                                              0        1490
##   Hong Kong                                         45         455
##   Iceland                                           85         767
##   India                                             33        1059
##   Iraq                                               0        1000
##   Ireland                                          100         910
##   Italy                                             79         908
##   Japan                                            372         840
##   Kenya                                             20         980
##   Korea, Rep (South)                               229        1294
##   Lebanon                                           10         495
##   Lithuania                                         10        1005
##   Macedonia                                         12        1197
##   Malaysia                                           0         520
##   Moldova                                           54        1031
##   Netherlands                                       71         438
##   Nigeria                                           10        1039
##   Pakistan                                          54        2650
##   Palestinian territories (West Bank and Gaza)      25         602
##   Peru                                              36        1171
##   Poland                                            26         499
##   Romania                                           10        1029
##   Russian Federation                                60         940
##   Saudi Arabia                                      25         475
##   Serbia                                            31        1005
##   South Africa                                       8         194
##   South Sudan                                       61         959
##   Spain                                            103        1042
##   Sweden                                            40         455
##   Switzerland                                       46         467
##   Tunisia                                            0         498
##   Turkey                                            21        1011
##   Ukraine                                           30         983
##   United States                                     50         952
##   Uzbekistan                                        10         490
##   Vietnam                                            0         500

#Count atheist responses and total responses from the Spain subsets (115 of 1146 in 2005, 103 of 1145 in 2012)
x1 <- sum(spain2005$response == "atheist")
x2 <- sum(spain2012$response == "atheist")
n1 <- nrow(spain2005)
n2 <- nrow(spain2012)

#Calculate p-value and make decision based on a significance level of .05
x.list <- c(x1, x2)
n.list <- c(n1, n2)

prop.test(x = x.list, n = n.list, alternative = "two.sided", correct = FALSE)

With 115 of 1146 respondents identifying as atheist in 2005 (about 10.0%) and 103 of 1145 in 2012 (about 9.0%), the test gives X-squared ≈ 0.72 on 1 degree of freedom and a p-value of about 0.40. The 95% confidence interval for \(p_1 - p_2\), roughly -0.014 to 0.034, contains 0.

Decision
Based on a p-value > .05, we fail to reject the null hypothesis.
State conclusion
There is not convincing evidence that Spain's atheism index changed from 2005 to 2012.
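Under \(H_0\) the two years share a common proportion, so a hand computation pools both samples to estimate the standard error. The sketch below (the helper name `two_prop_z` is hypothetical) should reproduce what `prop.test(..., correct = FALSE)` reports: its X-squared statistic is the square of this z. The counts are read from Spain's rows in the table above.

```r
# Hypothetical helper: two-sided two-proportion z-test with a pooled SE
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  p.pool <- (x1 + x2) / (n1 + n2)   # common proportion assumed under H0
  se <- sqrt(p.pool * (1 - p.pool) * (1 / n1 + 1 / n2))
  z <- (p1 - p2) / se
  list(z = z, p.value = 2 * pnorm(abs(z), lower.tail = FALSE))
}

# Spain: 115 atheists of 1146 responses (2005), 103 of 1145 (2012)
spain <- two_prop_z(x1 = 115, n1 = 1146, x2 = 103, n2 = 1145)
spain$p.value

# Same test through prop.test() for comparison
prop.test(x = c(115, 103), n = c(1146, 1145), correct = FALSE)$p.value
```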

Question 3 - Two Independent Proportions Hypothesis Test

Is there convincing evidence that the United States has seen a change in its atheism index from 2005 to 2012?

# Create subsets for USA 2005 and 2012
USA2005 <- subset(atheism, nationality == "United States" & year == 2005)
USA2012 <- subset(atheism, nationality == "United States" & year == 2012)

Write hypotheses and determine tails of the test (two-tailed; \(p_1\) and \(p_2\) are the United States' atheism index in 2005 and 2012)
\[H_0: p_1 = p_2\] \[H_A: p_1 \neq p_2\]
Find p-value of sample data occurring by chance

#Tabulate the number of atheist and non-atheist respondents for the United States in 2005 and 2012
table(USA2005)

## , , year = 2005
## 
##                                               response
## nationality                                    atheist non-atheist
##   Afghanistan                                        0           0
##   Argentina                                          0           0
##   Armenia                                            0           0
##   Australia                                          0           0
##   Austria                                            0           0
##   Azerbaijan                                         0           0
##   Belgium                                            0           0
##   Bosnia and Herzegovina                             0           0
##   Brazil                                             0           0
##   Bulgaria                                           0           0
##   Cameroon                                           0           0
##   Canada                                             0           0
##   China                                              0           0
##   Colombia                                           0           0
##   Czech Republic                                     0           0
##   Ecuador                                            0           0
##   Fiji                                               0           0
##   Finland                                            0           0
##   France                                             0           0
##   Georgia                                            0           0
##   Germany                                            0           0
##   Ghana                                              0           0
##   Hong Kong                                          0           0
##   Iceland                                            0           0
##   India                                              0           0
##   Iraq                                               0           0
##   Ireland                                            0           0
##   Italy                                              0           0
##   Japan                                              0           0
##   Kenya                                              0           0
##   Korea, Rep (South)                                 0           0
##   Lebanon                                            0           0
##   Lithuania                                          0           0
##   Macedonia                                          0           0
##   Malaysia                                           0           0
##   Moldova                                            0           0
##   Netherlands                                        0           0
##   Nigeria                                            0           0
##   Pakistan                                           0           0
##   Palestinian territories (West Bank and Gaza)       0           0
##   Peru                                               0           0
##   Poland                                             0           0
##   Romania                                            0           0
##   Russian Federation                                 0           0
##   Saudi Arabia                                       0           0
##   Serbia                                             0           0
##   South Africa                                       0           0
##   South Sudan                                        0           0
##   Spain                                              0           0
##   Sweden                                             0           0
##   Switzerland                                        0           0
##   Tunisia                                            0           0
##   Turkey                                             0           0
##   Ukraine                                            0           0
##   United States                                     10         992
##   Uzbekistan                                         0           0
##   Vietnam                                            0           0

table(USA2012)

## , , year = 2012
## 
##                                               response
## nationality                                    atheist non-atheist
##   Afghanistan                                        0           0
##   Argentina                                          0           0
##   Armenia                                            0           0
##   Australia                                          0           0
##   Austria                                            0           0
##   Azerbaijan                                         0           0
##   Belgium                                            0           0
##   Bosnia and Herzegovina                             0           0
##   Brazil                                             0           0
##   Bulgaria                                           0           0
##   Cameroon                                           0           0
##   Canada                                             0           0
##   China                                              0           0
##   Colombia                                           0           0
##   Czech Republic                                     0           0
##   Ecuador                                            0           0
##   Fiji                                               0           0
##   Finland                                            0           0
##   France                                             0           0
##   Georgia                                            0           0
##   Germany                                            0           0
##   Ghana                                              0           0
##   Hong Kong                                          0           0
##   Iceland                                            0           0
##   India                                              0           0
##   Iraq                                               0           0
##   Ireland                                            0           0
##   Italy                                              0           0
##   Japan                                              0           0
##   Kenya                                              0           0
##   Korea, Rep (South)                                 0           0
##   Lebanon                                            0           0
##   Lithuania                                          0           0
##   Macedonia                                          0           0
##   Malaysia                                           0           0
##   Moldova                                            0           0
##   Netherlands                                        0           0
##   Nigeria                                            0           0
##   Pakistan                                           0           0
##   Palestinian territories (West Bank and Gaza)       0           0
##   Peru                                               0           0
##   Poland                                             0           0
##   Romania                                            0           0
##   Russian Federation                                 0           0
##   Saudi Arabia                                       0           0
##   Serbia                                             0           0
##   South Africa                                       0           0
##   South Sudan                                        0           0
##   Spain                                              0           0
##   Sweden                                             0           0
##   Switzerland                                        0           0
##   Tunisia                                            0           0
##   Turkey                                             0           0
##   Ukraine                                            0           0
##   United States                                     50         952
##   Uzbekistan                                         0           0
##   Vietnam                                            0           0

x3 <- 10
x4 <- 50
n3 <- 1002
n4 <- 1002

#Calculate p-value and make decision based on a significance level of .05
x.list2 <- c(x3, x4)
n.list2 <- c(n3, n4)

prop.test(x = x.list2, n = n.list2, alternative = "two.sided", correct = FALSE)

## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  x.list2 out of n.list2
## X-squared = 27.49, df = 1, p-value = 1.579e-07
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.05474042 -0.02509990
## sample estimates:
##     prop 1     prop 2 
## 0.00998004 0.04990020

Decision
The p-value (1.579e-07) is far below .05 (and even below .0001), so we reject the null hypothesis.
State conclusion
The data provide convincing evidence that the United States' atheism index changed from 2005 to 2012; the sample proportion rose from about 1.0% to about 5.0%.
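A confidence interval is not computed under \(H_0\), so each sample uses its own proportion in the standard error instead of a pooled one. The sketch below (the helper name `diff_prop_ci` is hypothetical) should reproduce the 95% interval that `prop.test()` printed for the United States; because the whole interval lies below 0, it agrees with rejecting \(H_0\) and indicates the 2012 proportion is larger.

```r
# Hypothetical helper: Wald CI for p1 - p2 using the unpooled SE
diff_prop_ci <- function(x1, n1, x2, n2, conf.level = 0.95) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # no pooling for a CI
  z.star <- qnorm((1 + conf.level) / 2)
  (p1 - p2) + c(-1, 1) * z.star * se
}

# United States: 10 of 1002 atheists in 2005, 50 of 1002 in 2012
us.ci <- diff_prop_ci(10, 1002, 50, 1002)
us.ci   # both bounds negative, so the 2012 proportion is higher
```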

Question 4 - Minimum Sample Size

Suppose you’re hired by the local government to estimate the proportion of residents in your state that attend a religious service on a weekly basis. According to the guidelines, the government desires a 95% confidence interval with a margin of error no greater than 3%. You have no idea what to expect for \(\hat{p}\). How many people would you have to sample to ensure that you are within the specified margin of error and confidence level?

#Calculate minimum sample size for a 95% CI with a margin of error no greater than 3% (p-hat unknown, so use .5)
1.96^2*(.5)*(.5)/(.03^2)

## [1] 1067.111

Rounding up, we need a sample of at least n = 1068 residents to keep the margin of error at or below 3 percentage points.
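The calculation above is \(n = z^{*2}\,\hat{p}(1-\hat{p})/E^2\), with \(\hat{p} = .5\) used when nothing is known because \(\hat{p}(1-\hat{p})\) is largest there. A small sketch (the helper `min_sample_size` is hypothetical) uses `qnorm()` for the critical value and `ceiling()` so the result is always rounded up to a whole person.

```r
# Hypothetical helper: minimum n for a proportion CI with margin of error E
min_sample_size <- function(E, conf.level = 0.95, p.guess = 0.5) {
  z.star <- qnorm((1 + conf.level) / 2)              # about 1.96 for 95%
  ceiling(z.star^2 * p.guess * (1 - p.guess) / E^2)  # always round up
}

min_sample_size(E = 0.03)                 # 1068, same as with z* = 1.96
min_sample_size(E = 0.03, p.guess = 0.3)  # a guess away from .5 needs fewer people
```

Rounding down would leave the margin of error slightly above 3%, which is why `ceiling()` is used rather than `round()`.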

Question 5

Use the NSCC Student Dataset for Questions 5-7.
Construct a 95% confidence interval of the true proportion of all NSCC students that are registered voters.

#Find sample proportion of NSCC students that are registered voters
table(nscc$VoterReg == "Yes")

## 
## FALSE  TRUE 
##     9    31

x5 <- 31
n5 <- 40

p.hat2 <- 31/40

#Calculate SE
se2 <- sqrt(p.hat2*(1-p.hat2)/n5)

#Lower bound of CI
p.hat2 - 1.96*se2

## [1] 0.6455899

#Upper bound of CI
p.hat2 + 1.96*se2

## [1] 0.9044101

The 95% confidence interval for the true proportion of NSCC students that are registered voters is between 64.56% and 90.44%.
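The interval above uses the normal approximation, which relies on the success-failure condition; under the common rule of thumb of at least 10 successes and 10 failures, the 9 "No" responses fall just short. As a hedged comparison, base R's `prop.test()` without continuity correction returns the Wilson score interval and `binom.test()` returns the Clopper-Pearson exact interval, both often recommended for small counts.

```r
# Registered voters: 31 "Yes" out of 40 responses
x5 <- 31
n5 <- 40

# Wald interval (the normal-approximation method above), for comparison
p.hat <- x5 / n5
wald <- p.hat + c(-1, 1) * qnorm(0.975) * sqrt(p.hat * (1 - p.hat) / n5)

# Wilson score interval: prop.test() without continuity correction
wilson <- as.numeric(prop.test(x5, n5, correct = FALSE)$conf.int)

# Clopper-Pearson "exact" interval from the binomial distribution
exact <- as.numeric(binom.test(x5, n5)$conf.int)

rbind(wald = wald, wilson = wilson, exact = exact)
```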

Question 6

Construct a 95% confidence interval of the average height of all NSCC students.

#Find the sample mean and SD of height, and the sample size excluding NAs
mean.height <- mean(nscc$Height, na.rm = TRUE)
sd.height <- sd(nscc$Height, na.rm = TRUE)
n6 <- sum(!is.na(nscc$Height))

#Calculate t-critical value for 95% confidence (df = n - 1)
t.crit <- qt(p = .975, df = n6 - 1)
#Calculate margin of error
me <- t.crit * sd.height/sqrt(n6)

#Boundaries of confidence interval
mean.height - me

## [1] 61.08699

mean.height + me

## [1] 67.96173

We can be 95% confident that the average height of all NSCC students is between 61.09 and 67.96 inches.
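Base R's `t.test()` computes the same t-based interval and drops missing values in the one-sample case, so `t.test(nscc$Height)$conf.int` is a quick check on the hand calculation. The sketch below wraps the manual steps in a hypothetical helper `t_ci` and compares it with `t.test()` on a short vector of made-up heights (not the NSCC data).

```r
# Hypothetical helper: t-based CI for a mean, dropping NAs first
t_ci <- function(x, conf.level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  t.crit <- qt((1 + conf.level) / 2, df = n - 1)
  mean(x) + c(-1, 1) * t.crit * sd(x) / sqrt(n)
}

# Made-up heights for illustration only, including one NA
heights <- c(62, 62, 60, 62, 66, 70, 68, 64, NA, 71)

t_ci(heights)
as.numeric(t.test(heights)$conf.int)   # should match the manual interval
```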

Question 7

Starbucks is considering opening a coffee shop on the NSCC Danvers campus if it believes that a larger proportion of NSCC students drink coffee than the national proportion. A 2015 Gallup poll found that 64% of all Americans drink coffee. Conduct a hypothesis test to determine whether a larger proportion of NSCC students drink coffee than Americans overall.

a.) Write hypotheses and determine tails of the test
\[H_0: p = .64\] \[H_A: p > .64\]

b.) Calculate sample statistics

#Find sample proportion of NSCC students that drink coffee
table(nscc$Coffee == "Yes")

## 
## FALSE  TRUE 
##    10    30

x7 <- 30
n7 <- 40

c.) Determine probability of getting sample data by chance

#Calculate p-value and evaluate based on a significance level of .05
prop.test(x = x7, n = n7, p = .64, alternative = "greater", correct = FALSE)

## 
##  1-sample proportions test without continuity correction
## 
## data:  x7 out of n7, null probability 0.64
## X-squared = 2.1007, df = 1, p-value = 0.07362
## alternative hypothesis: true p is greater than 0.64
## 95 percent confidence interval:
##  0.6240271 1.0000000
## sample estimates:
##    p 
## 0.75

d.) Decision
Based on a p-value > .05, we fail to reject the null hypothesis.

e.) Conclusion
There is not enough evidence to conclude that a larger proportion of NSCC students drink coffee than the national proportion of 64%.
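Because the sample is small (n = 40), an exact binomial test is a reasonable cross-check on the normal approximation used by `prop.test()`. The sketch below uses base R's `binom.test()`; its p-value is the upper-tail binomial probability of seeing 30 or more coffee drinkers out of 40 if the true proportion were .64. It comes out near 0.1, so the decision at the .05 level is unchanged.

```r
# Coffee drinkers: 30 "Yes" out of 40, tested against the Gallup value of .64
exact <- binom.test(x = 30, n = 40, p = 0.64, alternative = "greater")
exact$p.value

# The same upper-tail probability, P(X >= 30) when X ~ Binomial(40, .64)
pbinom(29, size = 40, prob = 0.64, lower.tail = FALSE)
```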