R Markdown

  1. The Harrisburg Senators are our city’s Minor League Baseball team. During the 2022 season they ranked lowest in the Eastern League. Let’s find out whether the team’s pitchers show any detectable difference in their wins and losses. Data extracted from https://www.baseball-reference.com/register/team.cgi?id=f4f65b83
  a. The HarrisburgSenators.csv file is attached to the ‘Test of Association Assignment’ in Canvas. Import this file into R. Copy and paste your code below that shows where you imported the data. Name the data frame that you create ‘Senators.’
library(readr)
# The CSV must be in the current working directory (check with getwd()),
# or read_csv() must be given the full path to the file.
Senators <- read_csv("HarrisburgSenators.csv")
View(Senators)
  b. When you examine the data frame you imported into R, you will notice that it is not in the appropriate format for performing a chi-square test. Convert the data frame into a table in which the columns show the Win-Lose (W and L) variable and the rows show the Age variable. To create the Age variable, split the players at the median age into two groups, giving a binary variable. Use the R categorical functions that you have learned in the course. Name the table you create ‘Senators_t’. Copy and paste your code below that created the table.
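One possible way to build such a table is sketched below. It is only an illustration under stated assumptions: the small data frame is invented stand-in data, and the column names `Age`, `W`, and `L` are assumed rather than taken from HarrisburgSenators.csv, so they may need adjusting to the real file.

```r
# Hypothetical stand-in for the imported data (invented values; the real
# column names in HarrisburgSenators.csv may differ).
Senators <- data.frame(
  Age = c(22, 23, 24, 25, 26, 27, 28, 30),
  W   = c(3, 1, 4, 2, 5, 0, 2, 1),
  L   = c(2, 4, 1, 3, 2, 5, 3, 1)
)

# Binary age group: at or below the median age vs. above it.
Senators$AgeGroup <- ifelse(Senators$Age <= median(Senators$Age),
                            "Younger", "Older")

# xtabs() accepts a matrix of counts on the left-hand side; its columns
# (W and L) become the columns of the table, and AgeGroup gives the rows.
Senators_t <- xtabs(cbind(W, L) ~ AgeGroup, data = Senators)
Senators_t
```

Whether players exactly at the median belong in the younger or older group is a judgment call; this sketch places them in the younger group.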

  c. Develop the appropriate null and alternative hypothesis for testing an association among these variables (Winning-Losing, Age group). State the null and alternative hypothesis in the context of this problem.

H0:

H1:

  d. Run the chi-square test in R using the assocstats() function. Copy and paste your code and the output below. What is the value of the chi-square test statistic? What is the p-value related to the chi-square test statistic?
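As a rough illustration of the call, the sketch below runs assocstats() from the vcd package on a hypothetical 2 x 2 table. The counts are invented, not the Senators data, and the sketch assumes that vcd is installed.

```r
# Hypothetical 2 x 2 table of counts (invented numbers, not the Senators data).
Senators_t <- as.table(matrix(
  c(10, 8, 10, 11), nrow = 2,
  dimnames = list(Age = c("Younger", "Older"), Result = c("W", "L"))
))

library(vcd)  # assocstats() is provided by the vcd package

# Prints the likelihood-ratio and Pearson chi-square tests,
# followed by several measures of association.
res <- assocstats(Senators_t)
res

# The Pearson statistic and its p-value can also be extracted directly.
x2 <- res$chisq_tests["Pearson", "X^2"]
p  <- res$chisq_tests["Pearson", "P(> X^2)"]
```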

  e. Compute by hand the χ2 test statistic that you obtained from the R output in part d. You should show your computations, including the computations of the expected counts. Provide a final table that includes the observed counts with the expected counts in parentheses for each cell. The final table should have row and column margin totals and a grand total.
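The hand computation can be checked with a few lines of R. The sketch below uses invented counts, not the Senators data. Each expected count is the row total times the column total divided by the grand total, and the statistic sums (observed - expected)^2 / expected over the cells.

```r
# Hypothetical observed counts (invented numbers, not the Senators data).
observed <- matrix(
  c(10, 8, 10, 11), nrow = 2,
  dimnames = list(Age = c("Younger", "Older"), Result = c("W", "L"))
)

row_tot <- rowSums(observed)
col_tot <- colSums(observed)
n       <- sum(observed)

# Expected count for each cell: (row total * column total) / grand total.
expected <- outer(row_tot, col_tot) / n

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E.
x2_hand <- sum((observed - expected)^2 / expected)

addmargins(as.table(observed))  # observed counts with margins and grand total
round(expected, 2)              # expected counts to place in parentheses
x2_hand
```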

  f. Does the test statistic you found by hand in part e match the χ2 test statistic from the assocstats() output in part d? State Yes or No.

  g. Use the pchisq() function in R to find the p-value associated with the χ2 test statistic. Copy and paste your code below. Does the p-value you found with the pchisq() function match the p-value from the assocstats() output in part d? State Yes or No.
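A minimal sketch of the pchisq() step follows, again on invented counts rather than the Senators data. The p-value of a chi-square test is the upper-tail probability, so `lower.tail = FALSE` is used, with degrees of freedom (rows - 1) x (columns - 1).

```r
# Hypothetical observed counts (invented numbers, not the Senators data).
observed <- matrix(c(10, 8, 10, 11), nrow = 2)

# Pearson chi-square statistic without the continuity correction.
x2 <- unname(chisq.test(observed, correct = FALSE)$statistic)

# Degrees of freedom for an r x c table.
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-square distribution at x2.
p_value <- pchisq(x2, df = df, lower.tail = FALSE)
p_value
```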

  h. Using the p-value for this test, do you reject or not reject the null hypothesis?

  i. State a conclusion back in terms of the context of the problem.

  2. In a previous assignment, you worked with the following data from Exercise 4.1 on p. 158 of our text. The scenario in this problem examines a sample of individuals in which we have information regarding disease status based on a type of diet. Run the code below in R to remind yourself what the contingency table looks like. Notice the numbers in the contingency table have been modified from the previous assignment.
#code from text

fat <- matrix(c(6, 4, 2, 11), 2, 2) 
dimnames(fat) <- list(diet = c("LoChol", "HiChol"), 
                       disease = c("No", "Yes"))

fat
##         disease
## diet     No Yes
##   LoChol  6   2
##   HiChol  4  11

You want to perform another test of association between the variables diet and disease. You decide not to use an odds ratio test.

  a. Given the data in the contingency table, what is the appropriate statistical test to perform if you want to determine whether an association between diet and disease exists?

State only the name of the appropriate statistical test; you do not have to perform the test.
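When choosing a test for a small 2 x 2 table, it can help to look at the expected cell counts under independence. The sketch below is not code from the text; it only reuses the `fat` matrix defined above and counts how many expected values fall below 5, a common rule-of-thumb threshold.

```r
# Same contingency table as above.
fat <- matrix(c(6, 4, 2, 11), 2, 2)
dimnames(fat) <- list(diet = c("LoChol", "HiChol"),
                      disease = c("No", "Yes"))

# Expected counts under independence: (row total * column total) / grand total.
expected <- outer(rowSums(fat), colSums(fat)) / sum(fat)
round(expected, 2)

# Number of cells whose expected count is below 5.
sum(expected < 5)
```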

  b. What is your reason for stating the test in part a?

  3. Read the first two paragraphs of Example 4.3 on p. 118 to familiarize yourself with the scenario and the data. Run the first two lines of R code in the example to create the mental table in R.

You are going to use the Cochran-Mantel-Haenszel (CMH) test to test some associations among these variables.

  a. Why is a CMH test appropriate for this type of data?

  b. Run the CMH test in R. Copy and paste your code and output below.
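A sketch of what the call might look like is given below. It assumes the vcdExtra package is installed, that the example uses the `Mental` data shipped with vcdExtra, and that the table is named `mental.tab`; the two set-up lines are a guess at the text's code, not a copy of it.

```r
library(vcdExtra)  # provides the Mental data set and CMHtest()

# Assumed equivalent of the first two lines of the text's example:
# load the data and cross-tabulate frequency by ses and mental status.
data("Mental", package = "vcdExtra")
mental.tab <- xtabs(Freq ~ ses + mental, data = Mental)

# Runs the CMH tests; the printed output has one line each for
# cor, rmeans, cmeans, and general.
cmh <- CMHtest(mental.tab)
cmh

# The same results are stored as a matrix in the returned object.
cmh$table
```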

Letters c-e are asking you questions specific to the R output line entitled ‘rmeans.’

  c. State the null and alternative hypothesis for the line in the R output entitled ‘rmeans.’ State the null and alternative hypothesis in the context of the problem; do not use the generic statements from the course notes.

H0:

H1:

  d. What is the decision of this statistical test, that is, can you reject or not reject the null hypothesis? Use α = level of significance = .05.

  e. Using the line ‘rmeans’ in the R output, state a conclusion back in the context of the problem.

Letters f-h are asking you questions specific to the R output line entitled ‘cmeans.’

  f. State the null and alternative hypothesis for the line in the R output entitled ‘cmeans.’ State the null and alternative hypothesis in the context of the problem; do not use the generic statements from the course notes.

H0:

H1:

  g. What is the decision of this statistical test, that is, can you reject or not reject the null hypothesis? Use α = level of significance = .05.

  h. Using the line ‘cmeans’ in the R output, state a conclusion back in the context of the problem.
