How do you quantify the monotonic relationship between Handspan and Height in the qanda dataset?
Can you guess the R function to test this relationship formally?
Write down explicitly your null and alternative hypotheses.
2024-02-20
\(H_0\): there is no monotonic relationship between Handspan and Height.
\(H_1\): there is a monotonic relationship between Handspan and Height.
cor.test (qanda$Handspan, qanda$Height)
##
##  Pearson's product-moment correlation
##
## data:  qanda$Handspan and qanda$Height
## t = 3.5256, df = 228, p-value = 0.0005106
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1009848 0.3465390
## sample estimates:
##       cor
## 0.2273731
I use the correlation coefficient to quantify the relationship (here r = 0.227); the p-value shows whether that correlation is statistically significant.
Because the question asks about a monotonic relationship, I use the Spearman rank correlation test:
cor.test(qanda$Handspan, qanda$Height, method = "spearman")
##
##  Spearman's rank correlation rho
##
## data:  qanda$Handspan and qanda$Height
## S = 1340216, p-value = 1.354e-07
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho
## 0.3390771
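Spearman's rho is Pearson's correlation applied to the ranks of the data, which is why it measures monotonic rather than linear association. The sketch below checks this on simulated data; `handspan` and `height` are hypothetical stand-ins, not the qanda values.

```r
# Hypothetical stand-in data; the real qanda dataset is not reproduced here.
set.seed(1)
handspan <- rnorm(50, mean = 20, sd = 2)
height   <- 100 + 3 * handspan + rnorm(50, sd = 10)

# Spearman's rho computed directly by cor()
rho_direct <- cor(handspan, height, method = "spearman")

# The same quantity obtained as Pearson's correlation of the ranks
rho_ranks <- cor(rank(handspan), rank(height))

cat("rho (direct):", rho_direct, "\n")
cat("rho (ranks): ", rho_ranks, "\n")
```

Both lines should print the same value, since the two calculations are equivalent.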
Begin a new program. Download Exam_results.csv.
What type of data do we have?
Create a scatter plot for the data.
Does there appear to be a connection between the orders of the two sets of exam results? Save the plot as a file.
We have bivariate discrete data in this case.
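A minimal sketch of the scatter plot and the save-to-file step follows. The data frame here is simulated as a stand-in for Exam_results.csv, so the values are hypothetical; only the column names `Exam1` and `Exam2` are taken from the later exercises. A PDF device is used, and `png()` would work the same way.

```r
# Hypothetical stand-in for Exam_results.csv; with the real file one would use
# er <- read.csv("Exam_results.csv") instead.
set.seed(2)
er <- data.frame(Exam1 = sample(30:100, 40, replace = TRUE))
er$Exam2 <- pmin(100, pmax(0, er$Exam1 + sample(-15:15, 40, replace = TRUE)))

# Open a file device, draw the scatter plot, then close the device
# so that the file is written to disk.
plot_file <- file.path(tempdir(), "exam_scatter.pdf")
pdf(plot_file)
plot(er$Exam1, er$Exam2,
     xlab = "Exam 1", ylab = "Exam 2", main = "Exam results")
invisible(dev.off())

cat("Plot saved to:", plot_file, "\n")
```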
Update your program from Exercise 6.2 so that it carries out the Wilcoxon test and,
when you source() it (see Section 3.2), it outputs the test statistic, p-value and method used, to the screen with appropriate text descriptions.
Extend your program from Exercise 6.3 so that it
outputs a conclusion (using if()), which depends upon the p-value.
# Read the exam results downloaded in Exercise 6.2
er <- read.csv ("Exam_results.csv")
result <- wilcox.test (er$Exam1, er$Exam2, paired = TRUE)
cat ("The test statistic is:", result$statistic)
cat ("\n The p-value is:", result$p.value)
cat ("\n The method I use is:", result$method, "\n\n")
p <- result$p.value
if (p < 0.05) {
print ("There is a difference between the orders of the two test results")
} else{
print ("There is no difference between the orders of the two test results")
}
## The test statistic is: 1143
## The p-value is: 0.01482317
## The method I use is: Wilcoxon signed rank test with continuity correction
## [1] "There is a difference between the orders of the two test results"
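To see what the reported test statistic measures, the signed rank statistic V can be reproduced by hand: rank the absolute paired differences and sum the ranks belonging to positive differences. The sketch below uses simulated scores (`x` and `y` are hypothetical, not the exam data) and assumes there are no ties or zero differences.

```r
# Hypothetical paired scores; continuous values, so no ties or zero differences
set.seed(3)
x <- rnorm(20, mean = 60, sd = 10)
y <- x + rnorm(20, mean = 3, sd = 5)

d <- x - y                 # paired differences
r <- rank(abs(d))          # ranks of the absolute differences
V_manual <- sum(r[d > 0])  # sum of ranks for the positive differences

V_test <- wilcox.test(x, y, paired = TRUE)$statistic

cat("V by hand:         ", V_manual, "\n")
cat("V from wilcox.test:", V_test, "\n")
```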
Generate two random samples (simulations):
one of length 24 from an exponential distribution with mean 1 and
one of length 35 from an exponential distribution with mean 2
Test the hypothesis that they come from the same population,
outputting the p-value and a written conclusion to your test, similar to that you produced in Exercise 6.4.
set.seed (100)
sa <- rexp (n = 24, rate = 1)
sb <- rexp (n = 35, rate = 1/2)
result_ab <- wilcox.test(sa, sb)
cat ("The test statistic is:", result_ab$statistic)
cat ("\n The p-value is:", result_ab$p.value)
cat ("\n The method I use is:", result_ab$method, "\n\n")
p <- result_ab$p.value
if (p < 0.05) {
print ("The two samples do not appear to come from the same population")
} else{
print ("There is no evidence that the two samples come from different populations")
}
## The test statistic is: 201
## The p-value is: 0.0005453868
## The method I use is: Wilcoxon rank sum exact test
## [1] "The two samples do not appear to come from the same population"
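The rank sum statistic W can also be checked by hand. For samples without ties, it is the sum of the first sample's ranks in the pooled data minus n(n + 1)/2, which equals the number of pairs in which a value from the first sample exceeds a value from the second. This sketch reuses the seed and sample sizes from the exercise; the helper names `W_manual` and `W_pairs` are introduced here for illustration.

```r
set.seed(100)
sa <- rexp(n = 24, rate = 1)      # mean 1
sb <- rexp(n = 35, rate = 1 / 2)  # mean 2

# Rank the pooled sample, then sum the ranks that belong to sa
r <- rank(c(sa, sb))
n_a <- length(sa)
W_manual <- sum(r[seq_along(sa)]) - n_a * (n_a + 1) / 2

# Equivalent count: pairs (a, b) with a > b
W_pairs <- sum(outer(sa, sb, ">"))

W_test <- wilcox.test(sa, sb)$statistic

cat("W by hand:         ", W_manual, "\n")
cat("W as pair count:   ", W_pairs, "\n")
cat("W from wilcox.test:", W_test, "\n")
```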
Have another look at the histogram for the log of population you created in Exercise 2.6 and update it so that
it has breaks from 11.5 to 21.5 with intervals of 1.
On your plot,
add points representing the values of a Poisson distribution with mean 16 (roughly the value of the mean of the log data) for the range of values.
Does that distribution appear to fit the data well?
Carry out a goodness of fit test comparing the data and distribution for four values: less than 16, 16 - 17, 18 - 19, greater than 19. What is the conclusion?
It does not seem to fit well.
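One way the histogram and the Poisson overlay could be drawn is sketched below. The vector `log_pop` is simulated here as a hypothetical stand-in for the log population values from Exercise 2.6. The Poisson probabilities are scaled by the sample size so that they sit on the same count scale as the histogram bars.

```r
if (!interactive()) pdf(NULL)  # discard graphics when run as a script

# Hypothetical stand-in for the log-population values from Exercise 2.6,
# clamped so that every value falls inside the requested breaks
set.seed(4)
log_pop <- pmin(pmax(rnorm(200, mean = 16, sd = 1.8), 11.6), 21.4)

# Unit-width bins from 11.5 to 21.5, i.e. bins centred on the integers 12..21
breaks <- seq(11.5, 21.5, by = 1)
h <- hist(log_pop, breaks = breaks,
          xlab = "log(population)",
          main = "Log population with Poisson(16) points")

# Expected count in each bin under a Poisson distribution with mean 16
k <- 12:21
expected <- length(log_pop) * dpois(k, lambda = 16)
points(k, expected, pch = 19, col = "red")
```

If the points sit far from the tops of the bars, that suggests the Poisson distribution is a poor description of the data.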
dfpoints <- data.frame (pop = pop,
prob_pop = poispoints,
expected_pop = poispoints * sum(pop))
observed_less16 <- pop [pop < 16]
row_less16 <- which (dfpoints$pop < 16)
expected_less16 <- dfpoints[row_less16, ] $expected_pop
result_less16 <- chisq.test(observed_less16,
p = expected_less16 / sum(expected_less16))
print (result_less16)
##
##  Chi-squared test for given probabilities
##
## data:  observed_less16
## X-squared = 31.328, df = 14, p-value = 0.004986
observed_1617 <- pop [pop > 16 & pop < 17]
row_1617 <- which (dfpoints$pop > 16 & dfpoints$pop < 17)
expected_1617 <- dfpoints[row_1617, ] $expected_pop
result_1617 <- chisq.test(observed_1617,
p = expected_1617 / sum(expected_1617))
print (result_1617)
##
##  Chi-squared test for given probabilities
##
## data:  observed_1617
## X-squared = 3.4023, df = 7, p-value = 0.8455
observed_1819 <- pop [pop > 18 & pop < 19]
row_1819 <- which (dfpoints$pop > 18 & dfpoints$pop < 19)
expected_1819 <- dfpoints[row_1819, ] $expected_pop
result_1819 <- chisq.test(observed_1819,
p = expected_1819 / sum(expected_1819))
print (result_1819)
##
##  Chi-squared test for given probabilities
##
## data:  observed_1819
## X-squared = 8.5813, df = 2, p-value = 0.0137
observed_more19 <- pop [pop > 19]
row_more19 <- which (dfpoints$pop > 19)
expected_more19 <- dfpoints[row_more19, ] $expected_pop
result_more19 <- chisq.test(observed_more19,
p = expected_more19 / sum(expected_more19))
print (result_more19)
##
##  Chi-squared test for given probabilities
##
## data:  observed_more19
## X-squared = 3.0322, df = 3, p-value = 0.3867
pdf <- data.frame (group = c ("less than 16", "16 - 17",
"18 - 19", "greater than 19"),
"p-values" = c (result_less16$p.value,
result_1617$p.value,
result_1819$p.value,
result_more19$p.value))
print (pdf)
##             group    p.values
## 1    less than 16 0.004985531
## 2         16 - 17 0.845465442
## 3         18 - 19 0.013695929
## 4 greater than 19 0.386686989
observed <- pop
expected <- dfpoints$expected_pop
result <- chisq.test(observed,
p = expected / sum(expected))
print(result)
##
##  Chi-squared test for given probabilities
##
## data:  observed
## X-squared = 65.058, df = 35, p-value = 0.001505
pdf <- data.frame (group = c ("less than 16", "16 - 17",
"18 - 19", "greater than 19",
"overall"),
"p-values" = c (result_less16$p.value,
result_1617$p.value,
result_1819$p.value,
result_more19$p.value,
result$p.value))
print (pdf)
##             group    p.values
## 1    less than 16 0.004985531
## 2         16 - 17 0.845465442
## 3         18 - 19 0.013695929
## 4 greater than 19 0.386686989
## 5         overall 0.001505146
\(H_0\): The Poisson (mean) model is a good fit.
\(H_1\): The Poisson (mean) model is not a good fit.
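The exercise asks for a goodness of fit test over four categories, which can also be set up as a single chi-squared test with one observed count and one Poisson probability per category. The sketch below shows that layout on simulated data. `log_pop` is a hypothetical stand-in for the real values, and rounding to the nearest integer is an assumption made to match the unit-width histogram bins.

```r
# Hypothetical stand-in for the log-population values
set.seed(5)
log_pop <- rnorm(200, mean = 16, sd = 1.8)

# Round to the nearest integer and sort into the four categories
k <- round(log_pop)
bins <- cut(k, breaks = c(-Inf, 15, 17, 19, Inf),
            labels = c("less than 16", "16 - 17",
                       "18 - 19", "greater than 19"))
observed <- table(bins)

# Probability of each category under a Poisson distribution with mean 16
p <- c(ppois(15, 16),                  # P(X <= 15)
       ppois(17, 16) - ppois(15, 16),  # P(16 <= X <= 17)
       ppois(19, 16) - ppois(17, 16),  # P(18 <= X <= 19)
       1 - ppois(19, 16))              # P(X >= 20)

gof <- chisq.test(observed, p = p)
print(observed)
print(gof)

if (gof$p.value < 0.05) {
  print("Reject H0: the Poisson model does not appear to fit")
} else {
  print("Do not reject H0: no evidence against the Poisson model")
}
```

With four categories and the mean fixed at 16 rather than estimated from the data, the test has 3 degrees of freedom.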