Chapter6b

Author

Kementari Whitcher

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)

Two-sample test of proportions

Compute a two-sample test of proportions with prop.test(). Its general form is prop.test(x, n, p = NULL, alternative = "two.sided", conf.level = 0.95, correct = TRUE), where x is a vector of success counts, n is a vector of the corresponding total counts, alternative gives the alternative hypothesis ("two.sided", "less", or "greater"), and correct is a logical indicating whether Yates’ continuity correction should be applied where possible.

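NOTE: prop.test() reports a chi-squared statistic (X-squared) rather than a z statistic. With correct = FALSE, this statistic is the square of the pooled two-proportion z statistic, so the square root of X-squared equals |z|.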

test_result1<-prop.test(x=c(92,88),n=c(180,220), correct = FALSE)
test_result1


    2-sample test for equality of proportions without continuity correction

data:  c(92, 88) out of c(180, 220)
X-squared = 4.9383, df = 1, p-value = 0.02627
alternative hypothesis: two.sided
95 percent confidence interval:
 0.01352317 0.20869906
sample estimates:
   prop 1    prop 2 
0.5111111 0.4000000
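To see the relationship described in the note, the pooled z statistic can be computed by hand. This is a minimal sketch assuming the usual pooled-proportion standard error, sqrt(p(1 - p)(1/n1 + 1/n2)); the variable names are illustrative.

```r
# Counts from the example above
x <- c(92, 88)    # successes in each sample
n <- c(180, 220)  # sample sizes

p_hat  <- x / n               # sample proportions
p_pool <- sum(x) / sum(n)     # pooled proportion under H0: p1 = p2
se     <- sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))

z <- (p_hat[1] - p_hat[2]) / se
z                    # about 2.222
z^2                  # about 4.938, the X-squared from prop.test(correct = FALSE)
2 * pnorm(-abs(z))   # two-sided p-value, about 0.0263
```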

Problem 6.27

test_result6_27<-prop.test(x=c(35,35),n=c(203,292), correct = FALSE)
test_result6_27


    2-sample test for equality of proportions without continuity correction

data:  c(35, 35) out of c(203, 292)
X-squared = 2.7237, df = 1, p-value = 0.09887
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.01138673  0.11648828
sample estimates:
   prop 1    prop 2 
0.1724138 0.1198630
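The confidence interval that prop.test() reports with correct = FALSE should be the standard Wald interval for p1 - p2, which uses the unpooled standard error (unlike the test statistic, which pools the two samples). A minimal sketch reproducing it for Problem 6.27, assuming the default 95% level:

```r
# Counts for Problem 6.27
x <- c(35, 35)
n <- c(203, 292)

p_hat <- x / n
delta <- p_hat[1] - p_hat[2]   # estimated difference in proportions

# Unpooled standard error: each sample keeps its own proportion
se_unpooled <- sqrt(sum(p_hat * (1 - p_hat) / n))

z_crit <- qnorm(0.975)         # 95% two-sided critical value
ci <- delta + c(-1, 1) * z_crit * se_unpooled
ci                             # about -0.0114 to 0.1165
```

Because this interval contains 0, it agrees with the two-sided p-value of 0.09887: at the 0.05 level there is not enough evidence that the two proportions differ.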

One-way Chi-square

This first example assumes every category is equally likely. Step 1: put the observed counts for the two categories in a named vector, convert it to a table, and display it with the margin (total) added.

tab_meds<-c(support=15, against = 256) |>
  as.table()
tab_meds |>
  addmargins()

support against     Sum 
     15     256     271

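Step 2: perform the test. When no p argument is given, chisq.test() assumes each of the k categories has probability 1/k, so every expected count is n/k (here 271/2 = 135.5) and the test has k - 1 degrees of freedom.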

fit_chisq_meds <- tab_meds |>
  chisq.test()
fit_chisq_meds$observed

support against 
     15     256

fit_chisq_meds$expected

support against 
  135.5   135.5

fit_chisq_meds


    Chi-squared test for given probabilities

data:  tab_meds
X-squared = 214.32, df = 1, p-value < 2.2e-16
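The statistic is the sum of (observed - expected)^2 / expected over the k categories, compared against a chi-square distribution with k - 1 degrees of freedom. A minimal base-R sketch of that calculation for the medication counts (variable names are illustrative):

```r
observed <- c(support = 15, against = 256)
k <- length(observed)

# Equal-likelihood null: each category expects n / k
expected <- rep(sum(observed) / k, k)   # 135.5, 135.5

chi_sq  <- sum((observed - expected)^2 / expected)
df      <- k - 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

chi_sq    # about 214.32, matching chisq.test()
p_value   # a tiny value, which the printed test shows as < 2.2e-16
```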

What if the counts are not expected to fall equally across the categories, as in Problem 6.34? Here the hypothesized probabilities are 0.048, 0.147, 0.396, and 0.409 (they must sum to 1).

tab_deer<-c(woods=4, grass=16, forest = 61, other = 345) |>
  as.table()
tab_deer |>
  addmargins()

 woods  grass forest  other    Sum 
     4     16     61    345    426

Run the test, passing the hypothesized probabilities to the p argument. The chi-square value may differ slightly from a hand calculation because of rounding in the intermediate steps.

fit_chisq_deer <- tab_deer |>
  chisq.test(p=c(.048, .147, .396, .409))
fit_chisq_deer$observed

 woods  grass forest  other 
     4     16     61    345

fit_chisq_deer$expected

  woods   grass  forest   other 
 20.448  62.622 168.696 174.234

fit_chisq_deer


    Chi-squared test for given probabilities

data:  tab_deer
X-squared = 284.06, df = 3, p-value < 2.2e-16
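With unequal probabilities, each expected count is n times its hypothesized probability. Splitting the statistic into per-category contributions shows which categories drive the result; chisq.test() stores the Pearson residuals (O - E)/sqrt(E) in its residuals component, so their squares are those contributions. A short sketch:

```r
observed <- c(woods = 4, grass = 16, forest = 61, other = 345)
p0 <- c(0.048, 0.147, 0.396, 0.409)   # hypothesized probabilities (sum to 1)

expected <- sum(observed) * p0        # 20.448, 62.622, 168.696, 174.234
contrib  <- (observed - expected)^2 / expected
round(contrib, 2)                     # per-category contributions
sum(contrib)                          # about 284.06

# The same contributions from the fitted test object
fit <- chisq.test(observed, p = p0)
round(fit$residuals^2, 2)
```

Here the "other" category contributes more than half of the statistic, because 345 observations fell there against about 174 expected.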

2-Way Chi-square Test of Independence

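First, if you have a 2x2 table, you can run the test with prop.test(): give the count of one outcome in each group as x and the group totals as n. Without the continuity correction, its X-squared equals the Pearson chi-square statistic for that 2x2 table.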

test_result2<-prop.test(x=c(13,2),n=c(150,121), correct = FALSE)
test_result2


    2-sample test for equality of proportions without continuity correction

data:  c(13, 2) out of c(150, 121)
X-squared = 6.3011, df = 1, p-value = 0.01207
alternative hypothesis: two.sided
95 percent confidence interval:
 0.01970727 0.12056821
sample estimates:
    prop 1     prop 2 
0.08666667 0.01652893
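Behind the scenes, the x and n vectors describe a 2x2 table: one row per group, with successes in one column and failures (n - x) in the other. A sketch that builds that table with cbind() and confirms the uncorrected chi-square statistic is the same:

```r
x <- c(13, 2)      # "Yes Meds" counts in each age group
n <- c(150, 121)   # age-group totals

# Rows = age groups, columns = successes and failures
tab <- cbind(yes = x, no = n - x)
rownames(tab) <- c("Under 50", "50+")
tab

prop_stat  <- unname(prop.test(x, n, correct = FALSE)$statistic)
chisq_stat <- unname(chisq.test(tab, correct = FALSE)$statistic)
c(prop_stat, chisq_stat)   # both about 6.3011
```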

But you won’t always have a 2x2 table, so here is the more general approach with chisq.test(). Step 1: create your data table.

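# Enter the counts in long form, one row per cell
tab_meds2<-tibble::tribble(~meds,~age,~count,
                           "Yes Meds", "Under 50", 13,
                           "Yes Meds","50+", 2,
                           "No Meds", "Under 50", 137,
                           "No Meds", "50+", 119) |>
  # spread the age groups into columns
  tidyr::pivot_wider(names_from = age, values_from=count) |>
  # move the meds labels into row names so only counts remain
  tibble::column_to_rownames(var="meds") |>
  as.matrix() |>
  as.table()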
tab_meds2

         Under 50 50+
Yes Meds       13   2
No Meds       137 119

Step 2: perform the test. By default (correct = TRUE), chisq.test() applies Yates’ continuity correction to 2x2 tables:

tab_meds2 |>
  chisq.test()


    Pearson's Chi-squared test with Yates' continuity correction

data:  tab_meds2
X-squared = 5.0311, df = 1, p-value = 0.0249

Without the correction, the statistic matches the uncorrected prop.test() result above (X-squared = 6.3011):

tab_meds2_res<-tab_meds2 |>
  chisq.test(correct=FALSE)
tab_meds2_res


    Pearson's Chi-squared test

data:  tab_meds2
X-squared = 6.3011, df = 1, p-value = 0.01207
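For a test of independence, each expected count is (row total × column total) / grand total, and df = (rows - 1)(columns - 1). A minimal sketch that rebuilds the medication table with base R's matrix() and reproduces both statistics reported above; the Yates version subtracts 0.5 from each |O - E| before squaring:

```r
# Same counts as tab_meds2; matrix() fills column by column
obs <- matrix(c(13, 137, 2, 119), nrow = 2,
              dimnames = list(meds = c("Yes Meds", "No Meds"),
                              age  = c("Under 50", "50+")))

# Expected counts under independence: row total * column total / N
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
expected   # smallest cell is about 6.7, so every expected count exceeds 5

df <- (nrow(obs) - 1) * (ncol(obs) - 1)

chi_sq       <- sum((obs - expected)^2 / expected)             # about 6.3011
chi_sq_yates <- sum((abs(obs - expected) - 0.5)^2 / expected)  # about 5.0311

pchisq(c(chi_sq, chi_sq_yates), df = df, lower.tail = FALSE)   # about 0.0121 and 0.0249
```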

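We can also get Cramér’s V, which measures the strength of the association on a scale from 0 (no association) to 1. For a 2x2 table it equals the absolute value of the phi coefficient, the correlation between the two binary variables, so its square plays a role similar to R-squared. First, install the vcd package (install.packages("vcd")) if you have not already done so, and load it with library().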

library(vcd)

Loading required package: grid

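Then run assocstats() on the table to get Cramér’s V. Note that it also reports the Pearson chi-squared test (without continuity correction), a likelihood-ratio chi-squared test, and two other measures of association: the phi coefficient and the contingency coefficient.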

assocstats(tab_meds2)

                    X^2 df  P(> X^2)
Likelihood Ratio 7.1716  1 0.0074068
Pearson          6.3011  1 0.0120661

Phi-Coefficient   : 0.152 
Contingency Coeff.: 0.151 
Cramer's V        : 0.152
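Cramér’s V is sqrt(X^2 / (N × (min(r, c) - 1))), where X^2 is the uncorrected Pearson statistic, N is the total count, and r and c are the numbers of rows and columns. A sketch that reproduces the assocstats() values without vcd, using the standard definitions of the contingency coefficient and of the (signed) 2x2 phi coefficient:

```r
obs <- matrix(c(13, 137, 2, 119), nrow = 2)   # same counts as tab_meds2

chi_sq <- unname(chisq.test(obs, correct = FALSE)$statistic)
N <- sum(obs)
k <- min(nrow(obs), ncol(obs))

cramers_v     <- sqrt(chi_sq / (N * (k - 1)))
contingency_c <- sqrt(chi_sq / (chi_sq + N))

# For a 2x2 table, phi = (ad - bc) / sqrt(product of row and column totals)
phi <- (obs[1, 1] * obs[2, 2] - obs[1, 2] * obs[2, 1]) /
  sqrt(prod(rowSums(obs)) * prod(colSums(obs)))

round(c(phi = phi, cramers_v = cramers_v, contingency_c = contingency_c), 3)
# 0.152 0.152 0.151
```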