library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.0.4
library(openintro)
library(dplyr)
library(plotrix)
library("MASS")
## Warning: package 'MASS' was built under R version 4.0.5

QUIZ QUESTION 1

  1. Suppose that as part of a program for counseling patients with many risk factors for diabetes, 100 obese patients are identified. From this group, 23 went on a highly restrictive diet. After one year, 9 out of those 23 patients are found to have gone off the diet and reverted to poor eating habits.
  1. Calculate p̂ (the sample proportion) and p~ (the Wilson-adjusted sample proportion), rounded to three decimal places, for the proportion of obese patients who reverted to poor eating habits.

  2. Using p~, calculate the 95% confidence interval to estimate the true proportion of obese patients who reverted to poor eating habits (recidivists). Be sure to compute the SE under p~.

  3. Interpret your answer in context of the problem.

Answer 1A: The sample proportion is p̂ = x/n = 9/23 = 0.3913043 ≈ 0.391, the proportion of obese patients on the highly restrictive diet who reverted to poor eating habits.

1A: The Wilson-adjusted sample proportion is p~ = (x + 2)/(n + 4) = 11/27 = 0.4074074 ≈ 0.407, the adjusted proportion of obese patients on the highly restrictive diet who reverted to poor eating habits.

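1B: The raw sample does not meet the success-failure condition for the usual large-sample interval: there are only 9 successes (fewer than 10) and 14 failures. The Wilson adjustment adds 2 successes and 2 failures (giving 11 and 16), which is why the interval is built around p~ with the SE computed under p~: SE = sqrt(p~(1 - p~)/(n + 4)) = sqrt(0.4074 × 0.5926 / 27) ≈ 0.095. The 95% confidence interval is p~ ± 1.96 × SE = 0.4074 ± 0.1853, or about (0.222, 0.593).

1C: We are 95% confident that the true proportion of obese patients on the highly restrictive diet who revert to poor eating habits within one year (recidivists) is between about 22.2% and 59.3%.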
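The plus-four (Wilson-adjusted) interval built around p~ can be sketched as a small R function. This is a minimal sketch assuming the textbook formulas p~ = (y + 2)/(n + 4) and SE = sqrt(p~(1 - p~)/(n + 4)); the function name `plus_four_ci` is hypothetical and is not part of any package loaded above.

```r
# Hypothetical helper (not from any loaded package): plus-four / Wilson-adjusted
# confidence interval for a single proportion, assuming
# p~ = (y + 2) / (n + 4) and SE = sqrt(p~ * (1 - p~) / (n + 4)).
plus_four_ci <- function(y, n, conf = 0.95) {
  p_tilde <- (y + 2) / (n + 4)                   # Wilson-adjusted proportion
  se <- sqrt(p_tilde * (1 - p_tilde) / (n + 4))  # SE computed under p~
  z <- qnorm(1 - (1 - conf) / 2)                 # about 1.96 for a 95% interval
  c(p_tilde = p_tilde, se = se,
    lower = p_tilde - z * se, upper = p_tilde + z * se)
}

# Success-failure counts before and after adding 2 successes and 2 failures
y <- 9   # patients who reverted to poor eating habits
n <- 23  # patients on the restrictive diet
c(successes = y, failures = n - y)          # raw counts: 9 and 14
c(successes = y + 2, failures = n - y + 2)  # adjusted counts: 11 and 16

round(plus_four_ci(y, n), 3)
```

With y = 9 and n = 23 this rounds to p~ = 0.407, SE = 0.095, and an interval of about (0.222, 0.593). The helper uses qnorm(0.975) ≈ 1.959964 instead of exactly 1.96, which does not change the third decimal place here.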

#  1A:   Calculate pˆ(sample prop) for the proportion of obese patients who reverted to poor eating habits.

id <- as.character(1:23)                           # one row per patient on the restrictive diet
restrictive_diet <- c(rep("1", 9), rep("0", 14))   # "1" = reverted to poor eating habits, "0" = stayed on the diet

results <- data.frame(id, restrictive_diet)
# 1A: calculate p_hat, the sample proportion of obese patients who reverted to poor eating habits
#get number of patients on restrictive diet who reverted to poor eating habits
results_poor <- nrow (results[results$restrictive_diet == "1",])
results_poor
## [1] 9
#  1A:  calculate p_hat, the sample proportion of obese patients who reverted to poor eating habits
#get number of patients on the restrictive diet (rows)

results_rows <- nrow(results)
results_rows
## [1] 23
#  1A:  calculate p_hat, the sample proportion of obese patients who reverted to poor eating habits
results_poor/results_rows
## [1] 0.3913043
# 1A, Part II: p~ (Wilson-adjusted sample proportion), reusing the results data frame defined above


results_poor                                                             #reverted to poor habits after 12 months
## [1] 9
results_rows                                                             #total number on restrictive diet
## [1] 23
results_continue <- nrow (results[results$restrictive_diet == "0",])     #continued diet after 12 months
results_continue
## [1] 14
obese <- 100                                                             #number of obese patients identified (context only; not used below)

p_w_adj = (results_poor +2)/(results_rows +4)                             #calculation for Wilson's-Adjusted Sample Proportion

p_w_adj
## [1] 0.4074074
# 1B: Using p~ (0.4074074, rounded to .407), calculate the 95% confidence interval to estimate the true proportion
# of obese patients who reverted to poor eating habits (recidivists), computing the SE under p~.

# A 95% confidence interval for a proportion is the point estimate plus or minus 1.96 standard errors.

# Under the Wilson adjustment, SE = sqrt(p~ * (1 - p~) / (n + 4)).

se <- sqrt(p_w_adj * (1 - p_w_adj) / (results_rows + 4))

lower <- p_w_adj - 1.96 * se
upper <- p_w_adj + 1.96 * se
lower
## [1] 0.2220684
upper
## [1] 0.5927464
se
## [1] 0.0945607
#inference(results$restrictive_diet, est = "proportion", type = "ci", method = "theoretical", 
        #  success = "1")

QUIZ QUESTION 2

| Type of bean  | Number of eggs |
|---------------|----------------|
| Pinto         | 167            |
| Cowpea        | 176            |
| Navy bean     | 174            |
| Northern bean | 194            |

A researcher investigated the cowpea weevil's preference for one type of bean over another as a place to lay eggs. Equal amounts of four types of seeds were placed into a jar, and cowpea weevils were added. After 4 days, the egg counts shown in the table above were collected. Do these data provide evidence of a preference for some types of beans over others; that is, are the data consistent with the claim that the eggs are distributed randomly among the four types of beans? Perform a χ² test.

  1. State the null and alternative hypotheses.
  2. Report the χ² statistic and the p-value.
  3. Interpret the results of the test in context of the problem.

Answers

  1. H0: The eggs are distributed evenly among the four types of beans (each type receives 1/4 of the eggs). HA: At least one type of bean receives a proportion of the eggs different from 1/4.

    Total eggs: 167 + 176 + 174 + 194 = 711, so under H0 the expected count is 711/4 = 177.75 eggs per bean type.

  2. χ² = 2.2321, df = 3, p-value = 0.5257.

  3. Because the p-value of 0.5257 > 0.05, we fail to reject the null hypothesis. The data are consistent with the eggs being distributed randomly among the four types of beans; there is no evidence that the cowpea weevil prefers some types of beans over others.
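The goodness-of-fit statistic reported by chisq.test() can be reproduced by hand, which makes the expected counts explicit. This is a minimal sketch assuming equal expected proportions of 1/4 for each bean type (the null hypothesis above); the variable names are illustrative.

```r
# Observed egg counts per bean type, taken from the table above
observed <- c(pinto = 167, cowpea = 176, navy = 174, northern = 194)

# Under H0 each bean type receives 1/4 of the eggs: 711 / 4 = 177.75 each
expected <- sum(observed) * rep(1 / 4, length(observed))

# Pearson chi-square statistic and upper-tail p-value with k - 1 = 3 df
x2 <- sum((observed - expected)^2 / expected)
p_value <- pchisq(x2, df = length(observed) - 1, lower.tail = FALSE)

round(c(x2 = x2, df = length(observed) - 1, p_value = p_value), 4)
```

Because chisq.test() with no `p` argument also tests equal proportions, this hand computation performs the same calculation as the chisq.test() call in the code that follows; every expected count (177.75) is well above 5, so the χ² approximation is reasonable.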

bean_eggs <- c(167,176,174,194)
chisq <- chisq.test(bean_eggs)
chisq
## 
##  Chi-squared test for given probabilities
## 
## data:  bean_eggs
## X-squared = 2.2321, df = 3, p-value = 0.5257

/////////////////////////////////////////////////////////////////////

QUIZ QUESTION 3

Use the table to answer the following questions. A random sample of 125 college students determined whether they had taken an HIV test or not.

HIV Study with Men and Women

|             | Female | Male | Total |
|-------------|--------|------|-------|
| HIV Test    | 11     | 8    | 19    |
| No HIV Test | 52     | 54   | 106   |
| Total       | 63     | 62   | 125   |

  1. Find the probability that a person had taken the HIV test, given that they are female and the probability that a person had taken the HIV test, given that they are male.

  2. What is the relative risk ratio of taking the HIV test, for females versus males?

  3. Interpret your answer regarding the ratio of HIV tests for females versus males. (2 pts)

  4. Perform a χ² (chi-square) test.

    1. State the null and alternative hypotheses.
    2. Report the χ² statistic and the p-value.
    3. Interpret the results of the test in context of the problem.

Answers

Findings: 11 of 63 women (17.5%) in the sample had taken the HIV test, compared with 8 of 62 men (12.9%), as shown in the 2x2 contingency table above.

  1. Pr{HIV TEST | Female} = 11/63 ≈ 0.175 = 17.5% (0.1746032)
     Pr{HIV TEST | Male} = 8/62 ≈ 0.129 = 12.9% (0.1290323)
     For comparison, Pr{No HIV TEST | Female} = 52/63 ≈ 0.825 = 82.5% (0.8253968) and Pr{No HIV TEST | Male} = 54/62 ≈ 0.871 = 87.1% (0.8709677).
  2. Relative risk (females vs. males) = Pr{HIV TEST | Female} / Pr{HIV TEST | Male} = (11/63)/(8/62) ≈ 1.35.
  3. Based on the probabilities in part 1, a female in this sample was more likely than a male to have taken an HIV test (17.5% vs. 12.9%). A relative risk of about 1.35 means that females were about 1.35 times as likely as males to have taken the test; equivalently, 82.5% of females versus 87.1% of males had not been tested.
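The conditional probabilities and the relative risk above can be computed directly from the table counts. This is a minimal sketch; the vector names are illustrative and independent of the variables used in the code further down.

```r
# Counts from the HIV study table
tested <- c(female = 11, male = 8)    # took the HIV test
totals <- c(female = 63, male = 62)   # column totals

# Conditional probabilities Pr{HIV TEST | sex}
p_test <- tested / totals
round(p_test, 3)

# Relative risk of having taken the test, females versus males
rr <- p_test[["female"]] / p_test[["male"]]
round(rr, 2)
```

A relative risk above 1 indicates a higher testing rate among females in this sample; whether that difference is statistically meaningful is the question the χ² test in 3d addresses.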

3d.

  1. H0: Pr{HIV TEST | Female} = Pr{HIV TEST | Male}; there is no association between sex and having taken an HIV test. HA: Pr{HIV TEST | Female} ≠ Pr{HIV TEST | Male}; there is an association between sex and having taken an HIV test.

  2. Running chisq.test() on the observed 2x2 table (counts 11, 8, 52, and 54, without the totals; R applies Yates' continuity correction to 2x2 tables by default) gives X-squared ≈ 0.212, df = 1, p-value ≈ 0.645.

  3. Because the p-value of about 0.645 is greater than 0.05, we fail to reject the null hypothesis. There is no compelling evidence of an association between sex and HIV testing; the observed difference (17.5% of females vs. 12.9% of males tested) is consistent with chance variation.
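To show where the chisq.test() result comes from, the expected counts and the Yates-corrected statistic can be computed by hand. This is a sketch under a stated assumption: it hard-codes the 0.5 correction, which matches R's default for 2x2 tables only because every |O - E| (1.424 here) exceeds 0.5. The names `obs` and `x2_yates` are illustrative.

```r
# Observed 2x2 table (rows: test status; columns: sex), totals excluded
obs <- matrix(c(11, 52, 8, 54), nrow = 2,
              dimnames = list(c("HIV Test", "No HIV Test"), c("Female", "Male")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Yates-corrected Pearson statistic; assumes every |O - E| > 0.5 (true here)
x2_yates <- sum((abs(obs - expected) - 0.5)^2 / expected)
p_value <- pchisq(x2_yates, df = 1, lower.tail = FALSE)

round(expected, 3)
round(c(x2 = x2_yates, p_value = p_value), 4)
```

The smallest expected count is 9.424, so every cell exceeds 5 and the χ² approximation is reasonable for this table.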
# 3a. Find the probability that a person had taken the HIV test, given that they are female,
# and the probability that a person had taken the HIV test, given that they are male

# Create data

female_hiv <- 11
female_noHiv <- 52
male_hiv <- 8
male_noHiv <-54
HIV_Total <- 19
No_HIV_Total <- 106
Male_Total <- 62
Female_Total <-63
#female hiv results

female_hiv_results=female_hiv/Female_Total
female_hiv_results
## [1] 0.1746032
#male hiv results
male_hiv_results= male_hiv/Male_Total
male_hiv_results
## [1] 0.1290323
#female no hiv results
female_noHiv_results=female_noHiv/Female_Total
female_noHiv_results
## [1] 0.8253968
#male no hiv results
male_noHiv_results=male_noHiv/Male_Total
male_noHiv_results
## [1] 0.8709677
# 3d: chi-square test of independence on the observed counts from the table
# (the totals row and column are left out of the matrix)
hiv <- matrix(c(female_hiv, female_noHiv, male_hiv, male_noHiv), nrow = 2,
              dimnames = list(c("HIV Test", "No HIV Test"), c("Female", "Male")))
chisq <- chisq.test(hiv)
chisq
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  hiv
## X-squared = 0.21197, df = 1, p-value = 0.6452


