Exercise 6

2024-02-20

Exercise 6.1.

How do you quantify the monotonic relationship between Handspan and Height in the qanda dataset?
Can you guess the R function to test this relationship formally?
Write down explicitly your null and alternative hypotheses.

Hypotheses:

\(H_0\): there is no monotonic relationship between Handspan and Height (\(\rho = 0\)).

\(H_1\): there is a monotonic relationship between Handspan and Height (\(\rho \neq 0\)).

Correlation test

cor.test (qanda$Handspan, qanda$Height)

## 
##  Pearson's product-moment correlation
## 
## data:  qanda$Handspan and qanda$Height
## t = 3.5256, df = 228, p-value = 0.0005106
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1009848 0.3465390
## sample estimates:
##       cor 
## 0.2273731

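The p-value does not quantify the relationship; it only indicates whether the correlation differs significantly from zero. The strength of the relationship is measured by the correlation coefficient itself. Pearson's coefficient (0.227 above) measures linear association, so a monotonic relationship is better quantified by a rank correlation such as Spearman's rho.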

R function to test it in a formal way

I use the Spearman rank correlation test:

cor.test(qanda$Handspan, qanda$Height, method = "spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  qanda$Handspan and qanda$Height
## S = 1340216, p-value = 1.354e-07
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.3390771
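
Spearman's rho is the Pearson correlation computed on the ranks of the data, which is why it picks up monotonic rather than strictly linear association. A minimal sketch of that equivalence follows; `x` and `y` are simulated stand-ins, not the qanda columns.

```r
# Simulated stand-in data (not the qanda dataset):
# y tends to increase with x, but not linearly
set.seed(1)
x <- rnorm(50)
y <- exp(x) + rnorm(50, sd = 0.1)

# Spearman's rho as computed by cor()
rho_builtin <- cor(x, y, method = "spearman")

# The same quantity by hand: Pearson correlation of the ranks
rho_ranks <- cor(rank(x), rank(y))

print(c(builtin = rho_builtin, from_ranks = rho_ranks))
```

The two printed values should agree, since ranking is the only step that separates the two coefficients.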

Exercise 6.2.

Begin a new program. Download Exam_results.csv.
What type of data do we have?
Create a scatter plot for the data.
Does there appear to be a connection between the orders of the two sets of exam results? Save the plot as a file.

Type of data:

We have bivariate data: two sets of discrete exam results, paired row by row.

Scatter plot:

Yes, there appears to be a connection.
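
A possible way to draw the scatter plot and save it as a file is sketched below. The data frame `er_demo` holds made-up placeholder marks standing in for Exam_results.csv, and the output file name is an assumption; only the column names Exam1 and Exam2 come from the exercise code.

```r
# Placeholder marks standing in for Exam_results.csv (hypothetical values)
set.seed(62)
er_demo <- data.frame(Exam1 = sample(40:90, 30, replace = TRUE))
er_demo$Exam2 <- pmin(100, pmax(0, er_demo$Exam1 +
                                   sample(-10:10, 30, replace = TRUE)))

# Open a file device, draw the plot, then close the device to write the file
out_file <- file.path(tempdir(), "exam_scatter.pdf")
pdf(out_file)
plot(er_demo$Exam1, er_demo$Exam2,
     xlab = "Exam 1", ylab = "Exam 2",
     main = "Exam 2 against Exam 1")
invisible(dev.off())

print(file.exists(out_file))
```

With the real data, `er_demo` would be replaced by the data frame read from the CSV, and `png()` could be used instead of `pdf()` if a bitmap is preferred.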

Exercise 6.3 - 4

Exercise 6.3.

Update your program from Exercise 6.2 so that it carries out the Wilcoxon test and,
when you source() it (see Section 3.2), it outputs the test statistic, p-value and method used, to the screen with appropriate text descriptions.

Exercise 6.4.

Extend your program from Exercise 6.3 so that it
outputs a conclusion (using if()), which depends upon the p-value.

Exercises 6.3-4

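# Assumes Exam_results.csv (from Exercise 6.2) is in the working directory
er <- read.csv("Exam_results.csv")

result <- wilcox.test(er$Exam1, er$Exam2, paired = TRUE)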

cat ("The test statistic is:", result$statistic)
cat ("\n The p-value is:", result$p.value)
cat ("\n The method I use is:", result$method, "\n\n")

p <- result$p.value

if (p < 0.05) {
  print("There is a difference between the orders of the two test results")
} else {
  print("There is no evidence of a difference between the orders of the two test results")
}

Exercises 6.3-4

## The test statistic is: 1143

## 
##  The p-value is: 0.01482317

## 
##  The method I use is: Wilcoxon signed rank test with continuity correction

## [1] "There is a difference between the orders of the two test results"
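
Since the same reporting steps recur in Exercise 6.5, they could be wrapped in a small helper. This is only a sketch: `report_test()` is a hypothetical name, the 0.05 threshold is the one used above, and the `before`/`after` vectors are simulated rather than taken from the exam data.

```r
# Hypothetical helper: print the method, statistic and p-value of an
# "htest" object (as returned by wilcox.test, cor.test, ...) and
# return a decision string based on the p-value
report_test <- function(res, alpha = 0.05) {
  cat("Method:", res$method, "\n")
  cat("Test statistic:", unname(res$statistic), "\n")
  cat("p-value:", signif(res$p.value, 4), "\n")
  decision <- if (res$p.value < alpha) "reject H0" else "fail to reject H0"
  cat("Decision at the", alpha, "level:", decision, "\n")
  invisible(decision)
}

# Simulated paired data, for illustration only
set.seed(42)
before <- rnorm(30, mean = 50, sd = 10)
after <- before + rnorm(30, mean = 5, sd = 3)

decision <- report_test(wilcox.test(before, after, paired = TRUE))
```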

Exercise 6.5. Start a new program.

Generate two random samples (simulations):
one of length 24 from an exponential distribution with mean 1 and
one of length 35 from an exponential distribution with mean 2
Test the hypothesis that they come from the same population,
outputting the p-value and a written conclusion to your test, similar to that you produced in Exercise 6.4.

Exercise 6.5.

set.seed (100)

sa <- rexp (n = 24, rate = 1)

sb <- rexp (n = 35, rate = 1/2)

result_ab <- wilcox.test(sa, sb)

cat ("The test statistic is:", result_ab$statistic)
cat ("\n The p-value is:", result_ab$p.value)
cat ("\n The method I use is:", result_ab$method, "\n\n")

p <- result_ab$p.value

if (p < 0.05) {
  print("The two samples do not appear to come from the same population")
} else {
  print("There is no evidence that the two samples come from different populations")
}

Exercise 6.5.

## The test statistic is: 201

## 
##  The p-value is: 0.0005453868

## 
##  The method I use is: Wilcoxon rank sum exact test

## [1] "The two samples do not appear to come from the same population"
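
The calls to rexp() above rely on R parameterising the exponential distribution by its rate, which is the reciprocal of the mean, so mean 2 corresponds to rate = 1/2. A quick sanity check on large simulated samples (the sample size of 1e5 is an arbitrary choice for illustration):

```r
# Large samples so that the sample means sit close to the true means
set.seed(100)
big_a <- rexp(n = 1e5, rate = 1)      # true mean 1 / 1 = 1
big_b <- rexp(n = 1e5, rate = 1 / 2)  # true mean 1 / (1/2) = 2

sample_means <- c(rate_1 = mean(big_a), rate_half = mean(big_b))
print(round(sample_means, 2))
```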

Exercise 6.6.

Have another look at the histogram for the log of population you created in Exercise 2.6 and update it so that
it has breaks from 11.5 to 21.5 with intervals of 1.

On your plot,

add points representing the values of a Poisson distribution with mean 16 (roughly the value of the mean of the log data) for the range of values.
Does that distribution appear to fit the data well?
Carry out a goodness of fit test comparing the data and distribution for four values: less than 16, 16 - 17, 18 - 19, greater than 19. What is the conclusion?

Take a look at the log of population histogram:

Add breaks from 11.5 to 21.5 with intervals of 1

Add points representing the values of a Poisson distribution with mean 16

Does that distribution appear to fit the data well?

It does not seem to.
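
The histogram and overlay steps can be sketched as follows. The vector `log_pop_demo` is simulated placeholder data, not the population data from Exercise 2.6, and the plot is written to a temporary PDF only so that the sketch runs non-interactively.

```r
# Simulated stand-in for the log of population (hypothetical values),
# restricted to the range covered by the breaks
set.seed(2024)
log_pop_demo <- rnorm(200, mean = 16, sd = 1.8)
log_pop_demo <- log_pop_demo[log_pop_demo > 11.5 & log_pop_demo < 21.5]

# Breaks from 11.5 to 21.5 in steps of 1, so bins are centred on 12, ..., 21
breaks <- seq(11.5, 21.5, by = 1)
k <- 12:21

# Expected count in each bin under a Poisson distribution with mean 16
expected_counts <- dpois(k, lambda = 16) * length(log_pop_demo)

pdf(file.path(tempdir(), "log_pop_hist.pdf"))
h <- hist(log_pop_demo, breaks = breaks,
          xlab = "log(population)", main = "Log of population")
points(k, expected_counts, pch = 19)  # overlay the Poisson(16) values
invisible(dev.off())
```

Scaling the Poisson probabilities by the number of observations puts the points on the same count scale as the histogram bars.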

Carry out a goodness of fit test

dfpoints <- data.frame (pop = pop,
                        prob_pop = poispoints,
                        expected_pop = poispoints * sum(pop))

Less than 16

observed_less16 <- pop[pop < 16]
row_less16 <- which(dfpoints$pop < 16)
expected_less16 <- dfpoints$expected_pop[row_less16]

result_less16 <- chisq.test(observed_less16,
                            p = expected_less16 / sum(expected_less16))
print(result_less16)

## 
##  Chi-squared test for given probabilities
## 
## data:  observed_less16
## X-squared = 31.328, df = 14, p-value = 0.004986

16 to 17

observed_1617 <- pop[pop > 16 & pop < 17]
row_1617 <- which(dfpoints$pop > 16 & dfpoints$pop < 17)
expected_1617 <- dfpoints$expected_pop[row_1617]

result_1617 <- chisq.test(observed_1617,
                          p = expected_1617 / sum(expected_1617))
print(result_1617)

## 
##  Chi-squared test for given probabilities
## 
## data:  observed_1617
## X-squared = 3.4023, df = 7, p-value = 0.8455

18 to 19

observed_1819 <- pop[pop > 18 & pop < 19]
row_1819 <- which(dfpoints$pop > 18 & dfpoints$pop < 19)
expected_1819 <- dfpoints$expected_pop[row_1819]

result_1819 <- chisq.test(observed_1819,
                          p = expected_1819 / sum(expected_1819))
print(result_1819)

## 
##  Chi-squared test for given probabilities
## 
## data:  observed_1819
## X-squared = 8.5813, df = 2, p-value = 0.0137

19+

observed_more19 <- pop[pop > 19]
row_more19 <- which(dfpoints$pop > 19)
expected_more19 <- dfpoints$expected_pop[row_more19]

result_more19 <- chisq.test(observed_more19,
                            p = expected_more19 / sum(expected_more19))
print(result_more19)

## 
##  Chi-squared test for given probabilities
## 
## data:  observed_more19
## X-squared = 3.0322, df = 3, p-value = 0.3867

Table of these results

pdf <- data.frame (group = c ("less than 16", "16 - 17",
                              "18 - 19", "greater than 19"),
                   "p-values" = c (result_less16$p.value,
                                   result_1617$p.value,
                                   result_1819$p.value,
                                   result_more19$p.value))
print (pdf)

##             group    p.values
## 1    less than 16 0.004985531
## 2         16 - 17 0.845465442
## 3         18 - 19 0.013695929
## 4 greater than 19 0.386686989

Overall?

observed <- pop

expected <- dfpoints$expected_pop

result <- chisq.test(observed,
                     p = expected / sum(expected))

print(result)

## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 65.058, df = 35, p-value = 0.001505

Updated table

pdf <- data.frame (group = c ("less than 16", "16 - 17",
                              "18 - 19", "greater than 19",
                              "overall"),
                   "p-values" = c (result_less16$p.value,
                                   result_1617$p.value,
                                   result_1819$p.value,
                                   result_more19$p.value,
                                   result$p.value))
print (pdf)

##             group    p.values
## 1    less than 16 0.004985531
## 2         16 - 17 0.845465442
## 3         18 - 19 0.013695929
## 4 greater than 19 0.386686989
## 5         overall 0.001505146

Is Poisson(16) a good model?

\(H_0\): The Poisson model with mean 16 is a good fit.

\(H_1\): The Poisson model with mean 16 is not a good fit.
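
One conventional way to test these hypotheses on the four categories named in the exercise is a single chi-squared test on category counts, sketched below. This is an alternative formulation, not the approach used above. The data in `log_pop_demo` are simulated placeholders, and rounding each value to the nearest integer (to match bins centred on whole numbers) is an assumption.

```r
# Simulated stand-in for the log of population (hypothetical values)
set.seed(6)
log_pop_demo <- rnorm(200, mean = 16, sd = 1.8)

# Assumption: each value is assigned to its nearest integer
values <- round(log_pop_demo)

# Observed counts in the four categories
observed <- c("less than 16"    = sum(values < 16),
              "16 - 17"         = sum(values %in% 16:17),
              "18 - 19"         = sum(values %in% 18:19),
              "greater than 19" = sum(values > 19))

# Probabilities of the same categories under Poisson(16)
probs <- c(ppois(15, lambda = 16),
           ppois(17, lambda = 16) - ppois(15, lambda = 16),
           ppois(19, lambda = 16) - ppois(17, lambda = 16),
           ppois(19, lambda = 16, lower.tail = FALSE))

# Goodness-of-fit test: 4 categories, so 3 degrees of freedom
gof <- chisq.test(x = observed, p = probs)
print(gof)
```

A small p-value from this test would count as evidence against \(H_0\), that is, against the Poisson model with mean 16.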