library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
chds3 <- read_csv("data/ch19ds3.csv")
## Rows: 100 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (1): gender
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
chds4 <- read_csv("data/ch19ds4.csv")
## Rows: 100 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): preference
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
chds5 <- read_csv("data/ch19ds5.csv")
## Rows: 138 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): strength, age
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Exercise 1

Q3. Using the following data, test the question of whether an equal number of boys (code = 1) and girls (code = 2) participate in soccer at the elementary level at the .01 level of significance. Use chds3 to compute the exact probability of the chi-square value. What’s your conclusion?

soccer_gender <- table(chds3$gender)
chisq.test(soccer_gender)
## 
##  Chi-squared test for given probabilities
## 
## data:  soccer_gender
## X-squared = 1, df = 1, p-value = 0.3173

A p value of .3173 is well above the .01 significance level, so we fail to reject the null hypothesis that boys and girls participate in equal numbers. The p value means that if the null hypothesis were true, there would be a 31.73% chance of observing a split at least as uneven as the one in the data.
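
The exact probability the question asks for can also be recovered directly from the statistic. This is a minimal sketch that uses only the X-squared and df values printed above; the variable names (`chi_sq`, `deg_free`, `alpha`, `p_exact`) are illustrative and not part of the original analysis.

```r
# Values copied from the chisq.test() output above
chi_sq <- 1
deg_free <- 1
alpha <- 0.01  # significance level stated in the question

# Upper-tail probability of the chi-square distribution
p_exact <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)
round(p_exact, 4)

# TRUE would mean "reject the null hypothesis" at the chosen alpha
reject_null <- p_exact < alpha
reject_null
```

Comparing `p_exact` to `alpha` explicitly makes the decision rule visible: the null hypothesis is rejected only when the p value falls below the significance level.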

Exercise 2

Q5. Half the marketing staff of a candy company argue that all these candy bars taste the same and are barely different from one another. The other half disagrees. Who is right? Use chds4.

# NOTE: chisq.test() expects observed counts; passing proportions triggers the warning below
candy <- prop.table(table(chds4$preference))
chisq.test(candy)
## Warning in chisq.test(candy): Chi-squared approximation may be incorrect
## 
##  Chi-squared test for given probabilities
## 
## data:  candy
## X-squared = 0.158, df = 4, p-value = 0.997

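Because candy holds proportions rather than counts, this output is not a valid test: chisq.test() expects observed counts, which is why it raised the warning. Taken at face value, a p value of .997 would mean no evidence at all that preferences differ. However, the chi-square statistic scales with sample size, so with 100 responses (assuming no missing values) the counts-based statistic would be 100 × 0.158 = 15.8 with df = 4, which corresponds to a p value of about .003. That is well below the conventional .05 level, so we would reject the null hypothesis that all the bars are preferred equally; the staff who say the bars differ are supported.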
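
The warning in the output above is the usual sign that chisq.test() received something other than counts. The sketch below illustrates the difference using hypothetical counts (these are not the values in chds4; the names `counts`, `res_counts`, and `res_props` are made up for illustration).

```r
# Hypothetical counts for five candy bars (NOT the values in chds4)
counts <- c(bar1 = 30, bar2 = 10, bar3 = 25, bar4 = 15, bar5 = 20)
n <- sum(counts)

# Goodness-of-fit test on the raw counts (equal expected frequencies by default)
res_counts <- chisq.test(counts)

# The same test on proportions runs, but warns and shrinks the statistic by a factor of n
res_props <- suppressWarnings(chisq.test(counts / n))

unname(res_counts$statistic)     # statistic from counts
unname(res_props$statistic) * n  # proportion-based statistic rescaled by n
```

For the real data, passing `table(chds4$preference)` directly to chisq.test(), without prop.table(), would avoid the problem.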

Exercise 3

Q7. In Chapter 19 Data Set 5 (chds5), you will find entries for two variables: age category (young, middle-aged, and old) and strength following weight training (weak, moderate, strong). Are these two factors independent of one another?

strong <- table(chds5)
chisq.test(strong)
## 
##  Pearson's Chi-squared test
## 
## data:  strong
## X-squared = 3.969, df = 4, p-value = 0.4102
toi <- table(chds5$age, chds5$strength)
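
The df = 4 in the output above follows from the table's shape: (rows - 1) × (columns - 1) for a 3 × 3 cross-tabulation. The sketch below shows how the degrees of freedom and expected counts could be inspected. It uses a hypothetical table, since the actual cell counts of chds5 are not shown here; the name `demo_tab` and every value in it are invented.

```r
# Hypothetical 3 x 3 table of counts (illustrative values only, not chds5)
demo_tab <- matrix(
  c(12, 15, 18,
    16, 14, 15,
    20, 15, 13),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    age = c("young", "middle-aged", "old"),
    strength = c("weak", "moderate", "strong")
  )
)

res <- chisq.test(demo_tab)

# Degrees of freedom: (3 - 1) * (3 - 1) = 4
unname(res$parameter)

# Expected counts under independence; the approximation is usually
# considered adequate when these are all at least 5
res$expected
min(res$expected) >= 5
```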

The p value of .4102 is much higher than the significance level of .05, so we fail to reject the null hypothesis of independence. This indicates that t