Importing Data **************
These lines of code import your data. Please make sure that the csv files are stored in the same folder as this R Markdown file.
body_data <- read.csv("bodytemp_heartrate.csv")
stress_data <- read.csv("stress+and+talk.csv")
Section 1 - Detailed Hypothesis Testing ***************************************
Instructions: Please answer all of the subpoints of each of the questions. For critical values, if there are two, please report both. Any data that you calculate should be reported in a data frame that you print at the bottom of your block of code. For written answers, please write them below the subpoint in the R code.
Suppose that you are a zoologist studying the heights of giraffes. You know that the population mean for giraffe height is 550 cm with a standard deviation of 30 cm. On your trip to Tanzania, you use advanced technology to collect the height for as many Masai giraffes as you can find. After sampling 25 Masai giraffes, you find that their average height is 537 cm. You think that this is a low number, but you are interested in whether the Masai giraffes are significantly shorter (\(\alpha = 0.05\)) than the population average.
#Insert your R code here (use as many R lines as you need)
## 1b).
mu <- 550
sigma <- 30
alpha <- 0.05
n <- 25
xbar <- 537
zcrit <- qnorm(alpha)
zobt <- (xbar-mu)/(sigma/sqrt(n))
p <- pnorm(zobt, lower.tail = TRUE)
dofz <- data.frame(
Variables = c("Z-crit", "Z-obt", "P-Value"),
Values = c(zcrit, zobt, p)
)
print(dofz)
## Variables Values
## 1 Z-crit -1.64485363
## 2 Z-obt -2.16666667
## 3 P-Value 0.01513014
1a) Write out the null and alternative hypotheses
H0: mu = 550
Ha: mu < 550
1b) Within a data frame in R, report the following values:
Report the appropriate critical value(s)
Report the calculated statistic
What is the probability of finding a Masai giraffe of 537 cm within the population?
1c) What is your decision in regards to the null and alternative hypotheses?
1d) Interpret your decision in terms of statistical significance
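A minimal sketch of how the decision in 1c and 1d could be checked, using only the numbers given in the giraffe problem. The variable names `reject_by_crit`, `reject_by_p`, and `decision` are illustrative and not part of the original assignment.

```r
# Values taken from the giraffe problem statement
mu    <- 550   # population mean height (cm)
sigma <- 30    # population standard deviation (cm)
n     <- 25    # sample size
xbar  <- 537   # sample mean height (cm)
alpha <- 0.05

z_obt  <- (xbar - mu) / (sigma / sqrt(n))  # obtained z statistic
z_crit <- qnorm(alpha)                     # lower-tail critical value
p_val  <- pnorm(z_obt)                     # P(Z <= z_obt)

# For a lower-tailed test, both rules should lead to the same decision
reject_by_crit <- z_obt < z_crit
reject_by_p    <- p_val < alpha

decision <- if (reject_by_crit) "reject H0" else "fail to reject H0"
print(decision)
```

Comparing the statistic with the critical value and comparing the p-value with alpha are two views of the same rule, so they should always agree.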
A professor is skeptical of whether the lighting in the classroom helps or hurts students’ grades. He wants to compare the exam scores of his current class that has dim lights to his past classes that had bright lights. Since he has kept track of every single mean exam score over his 10 years of teaching the course (where the exam stayed exactly the same), he knows that the mean score on the exam is a 76.8 out of 100. However, he only kept track of the mean. A group of 30 students received the following scores on the exam. The professor is wondering, is there any difference between these scores compared to the population mean at an alpha of 0.01?
(64.0, 79.5, 74.6, 76.8, 72.0, 65.9, 95.9, 68.9, 98.6, 65.6, 62.6, 75.9, 81.8, 95.2, 68.9, 96.8, 70.1, 93.2, 84.3, 77.3, 66.1, 83.4, 61.4, 73.2, 70.1, 71.9, 79.1, 70.3, 92.4, 78.8)
#Insert your R code here (use as many R lines as you need)
examscore <- c(64.0, 79.5, 74.6, 76.8, 72.0, 65.9, 95.9, 68.9, 98.6, 65.6, 62.6, 75.9, 81.8, 95.2, 68.9, 96.8, 70.1, 93.2, 84.3, 77.3, 66.1, 83.4, 61.4, 73.2, 70.1, 71.9, 79.1, 70.3, 92.4, 78.8)
mu <- 76.8
n <- length(examscore)
alpha <- 0.01
dof <- n-1
xbar <- mean(examscore)
s <- sd(examscore)
se <- s/sqrt(n)
tobt <- (xbar-mu)/se
t_test <- t.test(examscore, alternative = "two.sided", mu = mu, paired = FALSE, conf.level = 0.99)
tcrit_lower <- qt(alpha/2,dof)
tcrit_upper <- qt(1-alpha/2,dof)
# Two-tailed p-value: double the tail area beyond |t| (works for either sign of tobt)
p_value <- 2 * pt(abs(tobt), dof, lower.tail = FALSE)
doft <- data.frame(
Variables = c("T_crit_lower", "T_crit_upper", "T_obt", "P_Value"),
Values = c(tcrit_lower, tcrit_upper, tobt, p_value)
)
print(doft)
## Variables Values
## 1 T_crit_lower -2.7563859
## 2 T_crit_upper 2.7563859
## 3 T_obt 0.1758421
## 4 P_Value 0.8616402
2a) Write out the null and alternative hypotheses
H0: mu = 76.8
Ha: mu ≠ 76.8
2b) Within a data frame in R, report the following values:
Report the appropriate critical value(s) (hint, how many tails are in this test?)
Report the calculated statistic.
Report the p value.
| Variable | Value |
| --- | --- |
| T_crit_lower | -2.7563859 |
| T_crit_upper | 2.7563859 |
| T_obt | 0.1758421 |
| P_Value | 0.8616402 |
2c) What is your decision in regards to the null and alternative hypotheses?
2d) Interpret your decision in terms of statistical significance.
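As a cross-check on the hand calculation, the sketch below recomputes the two-tailed p-value from the exam scores and compares it with `t.test()`. The names `mu0`, `p_two`, `check`, and `reject` are illustrative.

```r
# Exam scores from the problem statement
examscore <- c(64.0, 79.5, 74.6, 76.8, 72.0, 65.9, 95.9, 68.9, 98.6, 65.6,
               62.6, 75.9, 81.8, 95.2, 68.9, 96.8, 70.1, 93.2, 84.3, 77.3,
               66.1, 83.4, 61.4, 73.2, 70.1, 71.9, 79.1, 70.3, 92.4, 78.8)
mu0   <- 76.8   # hypothesized population mean
alpha <- 0.01
n     <- length(examscore)
dof   <- n - 1

# Obtained t statistic computed by hand
t_obt <- (mean(examscore) - mu0) / (sd(examscore) / sqrt(n))

# Two-tailed p-value: double the tail area beyond |t|
p_two <- 2 * pt(abs(t_obt), dof, lower.tail = FALSE)

# Built-in test for comparison
check <- t.test(examscore, mu = mu0, alternative = "two.sided")

# Two-tailed decision rule: reject only if |t| exceeds the upper critical value
t_crit <- qt(1 - alpha / 2, dof)
reject <- abs(t_obt) > t_crit

print(c(t_obt = t_obt, p_manual = p_two, p_ttest = check$p.value))
```

Using `abs(t_obt)` keeps the p-value correct whether the statistic is positive or negative.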
Please use “stress+and+talk.csv” for both questions 3 and 4, as they are two different questions about the same data.
A random sample of 15 people participated in a study, which tried to investigate the relationship between stress level and how fast a person can read an article. The study was divided into two parts.
In the first part (1), every subject was asked to read aloud an article from Readers’ Digest, and they were told that they will be asked a question about the article.
In the second part (2) of the study, everything else remained the same, but those same subjects then stood on a balance ball while reading the article, which invoked stress in the participant.
The researchers wondered whether the words read per minute differed (alpha = 0.05) when participants were standing on a bouncing ball (stressed) compared to when they were not on a bouncing ball (not stressed).
#Insert your R code here (use as many R lines as you need)
stress_data <- read.csv("stress+and+talk.csv")
# paired t-test (Lab 11)
# condition for stress <- [1 = ctrl; 2 = bouncyball]
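# Difference scores (ball minus control); assumes both conditions list subjects in the same row order
Di <- stress_data$talk[stress_data$stress == 2] - stress_data$talk[stress_data$stress == 1]
Dibar <- mean(Di)
s_Di <- sd(Di) # renamed so the sd() function is not masked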
n <- length(Di)
mu <- 0
alpha <- 0.05
t_test <- t.test(Di, alternative = "two.sided", mu = mu, paired = FALSE, conf.level = 0.95)
dof <- t_test$parameter
tcrit_lower <- qt(alpha/2,dof)
tcrit_upper <- qt(1-alpha/2,dof)
t_crits <- c(tcrit_lower, tcrit_upper)
print(t_test)
##
## One Sample t-test
##
## data: Di
## t = -1.9902, df = 14, p-value = 0.06647
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -48.063932 1.797265
## sample estimates:
## mean of x
## -23.13333
print(t_crits)
## [1] -2.144787 2.144787
3a) State the null and alternative hypotheses.
\(H_0: \mu_\text{Di} = 0\)
\(H_a: \mu_\text{Di} \neq 0\)
3b) What type of test should you use to analyze this data?
3c) Either report (1) the critical value(s), (2) the obtained value, and (3) the p value in a data frame if you did the test “by hand”, or print the t.test() results and write the critical value(s) below.
| Variable | Value |
| --- | --- |
| T_crit_lower | -2.144787 |
| T_crit_upper | 2.144787 |
| T_obt | -1.9902 |
| P_Value | 0.06647 |
3d) What is your decision in regards to the null and alternative hypotheses?
3e) Interpret your decision in terms of statistical significance
3f) Are you 100% sure about your conclusion? Why? Can you be 95% sure about your conclusion?
No. You can never be 100% sure, because a statistical test is based on a SAMPLE, not the whole population, so it is never absolutely certain about the true population distribution.
We also cannot claim a difference at the 95% confidence level: the p-value (0.066) is greater than 0.05, so the result does not meet the alpha = 0.05 threshold. Failing to reject H0 does not prove that H0 is true either.
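The code for question 3 runs a one-sample t-test on the difference scores, which is equivalent to a paired t-test on the two columns. The sketch below illustrates that equivalence. Since the lab's CSV file is not reproduced here, it uses R's built-in `sleep` dataset purely as stand-in paired data, and `drug1`, `drug2`, `paired_test`, and `diff_test` are illustrative names.

```r
# 'sleep' is a built-in R dataset (10 subjects measured under two drugs);
# it is used here only as stand-in paired data, not the lab's data
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]

# Paired t-test on the two sets of measurements
paired_test <- t.test(drug2, drug1, paired = TRUE)

# One-sample t-test on the difference scores against mu = 0
diff_test <- t.test(drug2 - drug1, mu = 0)

print(c(paired     = unname(paired_test$statistic),
        one_sample = unname(diff_test$statistic)))
```

Both calls should report the same t statistic, degrees of freedom (n - 1), and p-value, provided the two vectors list subjects in the same order.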
Two random samples, each of 15 subjects, participated in the same study as in question 3. This time, however, one sample read the Reader’s Digest article, while the other sample read the Reader’s Digest article on top of a bouncing ball. The researchers wondered whether the words read per minute differed (\(\alpha = 0.05\)) between participants who were standing on a bouncing ball (stressed) compared to participants who were not standing on a bouncing ball (not stressed).
#Insert your R code here (use as many R lines as you need)
# Note: despite its name, 'stress' holds condition 1 (no ball, control); 'ball' holds condition 2
stress <- stress_data$talk[stress_data$stress == 1]
ball <- stress_data$talk[stress_data$stress == 2]
sd_stress <- sd(stress)
sd_ball <- sd(ball)
sd_values <- data.frame(
Values = c("Sd Stress", "Sd Ball"),
sd = c(sd_stress, sd_ball)
)
print(sd_values)
## Values sd
## 1 Sd Stress 24.96912
## 2 Sd Ball 27.13581
fraction <- sd_ball/sd_stress
print(fraction)
## [1] 1.086775
# POOLED T-test
alpha <- 0.05
tt_talk <- t.test(stress,ball,
alternative = "two.sided",
var.equal = TRUE,
paired = FALSE,
conf.level = 0.95)
tobt <- tt_talk$statistic
dof <- tt_talk$parameter
pt <- tt_talk$p.value
ci <- tt_talk$conf.int
tcrit_lower <- qt(alpha/2,dof)
tcrit_upper <- qt(1-alpha/2,dof)
t_crits <- c(tcrit_lower, tcrit_upper)
doft <- data.frame(
Variables = c("T_crit_lower", "T_crit_upper", "T_obt", "P_Value"),
Values = c(tcrit_lower, tcrit_upper, tobt, pt)
)
print(tt_talk)
##
## Two Sample t-test
##
## data: stress and ball
## t = 2.4297, df = 28, p-value = 0.02178
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.629962 42.636705
## sample estimates:
## mean of x mean of y
## 45.20000 22.06667
print(doft)
## Variables Values
## 1 T_crit_lower -2.04840714
## 2 T_crit_upper 2.04840714
## 3 T_obt 2.42965610
## 4 P_Value 0.02177969
4a) State the null and alternative hypotheses.
\(H_0: \mu_\text{noball} = \mu_\text{ball}\)
\(H_a: \mu_\text{noball} ≠ \mu_\text{ball}\)
4b) What type of test should you use to analyze this data?
4c) Either report (1) the critical value(s), (2) the obtained value, and (3) the p value in a data frame if you did the test “by hand”, or print both the t.test() results and the critical value(s).
| Variable | Value |
| --- | --- |
| T_crit_lower | -2.04840714 |
| T_crit_upper | 2.04840714 |
| T_obt | 2.42965610 |
| P_Value | 0.02177969 |
4d) What is your decision in regards to the null and alternative hypotheses?
4e) Interpret your decision in terms of statistical significance
4f) Report your confidence intervals and interpret them.
95% CI: (3.629962, 42.636705)
We are 95% confident that the TRUE difference in mean reading speed between the two conditions (no stress minus stress) lies between about 3.6 and 42.6 WPM. Because the interval does not contain 0, it agrees with rejecting H0 at alpha = 0.05.
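To show where the pooled t statistic and confidence interval come from, here is a sketch that rebuilds them from the summary statistics printed above (group means, standard deviations, and n = 15 per group). The names `sp2`, `se`, and `ci_manual` are illustrative, and small rounding differences from the `t.test()` output are expected because the inputs are rounded.

```r
# Summary statistics as printed in the output above
n1 <- 15;        n2 <- 15         # group sizes
m1 <- 45.20000;  m2 <- 22.06667   # group means (no ball, ball)
s1 <- 24.96912;  s2 <- 27.13581   # group standard deviations
alpha <- 0.05

dof <- n1 + n2 - 2                                   # pooled degrees of freedom
sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / dof     # pooled variance
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))                 # SE of the mean difference

t_obt  <- (m1 - m2) / se
t_crit <- qt(1 - alpha / 2, dof)
p_val  <- 2 * pt(abs(t_obt), dof, lower.tail = FALSE)

# 95% confidence interval for the difference in means
ci_manual <- (m1 - m2) + c(-1, 1) * t_crit * se

print(c(t_obt = t_obt, p_val = p_val))
print(ci_manual)
```

The interval is the observed difference plus or minus the critical value times the standard error, which is why an interval that excludes 0 goes with a rejection of H0.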
Section 2 - Conclusion-Based Hypothesis Testing ***********************************************
Instructions: For each of the following questions, use a single block of code and report your answers either in a data frame (if you are calculating values “by hand”) or by printing the output of the test (if you use an R function to perform the test). Please use “bodytemp_heartrate.csv” for the following questions, and assume \(\alpha = 0.05\).
Do males have significantly lower body temperature compared to females? Report the obtained statistic, the critical value(s), the p-value, and the confidence interval, and interpret your findings.
#Insert your R code here (use as many R lines as you need)
body_data <- read.csv("bodytemp_heartrate.csv")
male <- body_data$temp[body_data$gender == 1]
female <- body_data$temp[body_data$gender == 2]
sd_1 <- sd(male)
sd_2 <- sd(female)
sd_values <- data.frame(
Values = c("Sd Male", "Sd Female"),
sd = c(sd_1, sd_2)
)
print(sd_values)
## Values sd
## 1 Sd Male 0.6987558
## 2 Sd Female 0.7434878
fraction <- sd_2/sd_1
print(fraction)
## [1] 1.064017
# Use POOLED t-test
tt_temp <- t.test(male, female,
alternative = "less",
var.equal = TRUE,
paired = FALSE,
conf.level = 0.95)
dof <- tt_temp$parameter
alpha <- 0.05
# One-tailed ("less") test, so there is a single lower critical value
tcrit <- qt(alpha, dof)
ci <- tt_temp$conf.int
print(tt_temp)
##
## Two Sample t-test
##
## data: male and female
## t = -2.2854, df = 128, p-value = 0.01197
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.07955046
## sample estimates:
## mean of x mean of y
## 98.10462 98.39385
print(tcrit)
## [1] -1.656845
Since the p-value is SMALLER than the alpha value (0.012 < 0.05), and the obtained t (-2.2854) falls below the one-tailed critical value (-1.656845), we reject the null hypothesis. Based on this sample, males have a significantly LOWER mean body temperature than females. The one-sided 95% confidence interval for the difference (male minus female) lies entirely below 0.
| Variable | Value |
| --- | --- |
| T_crit (one-tailed, lower) | -1.656845 |
| T_obt | -2.2854 |
| P_Value | 0.01197 |
| Confidence Interval (95%, one-sided) | (-Inf, -0.07955046) |
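Because this question is directional ("lower"), the whole rejection region sits in the lower tail, so the critical value differs from the two-tailed one. The sketch below contrasts the two, using the degrees of freedom and obtained t reported by `t.test()` above. The variable names are illustrative.

```r
alpha <- 0.05
dof   <- 128        # degrees of freedom reported by t.test() above
t_obt <- -2.2854    # obtained t reported by t.test() above

# One-tailed (lower) test: all of alpha goes in the lower tail
t_crit_one_tailed <- qt(alpha, dof)

# Two-tailed test: alpha is split evenly between both tails
t_crit_two_tailed <- c(qt(alpha / 2, dof), qt(1 - alpha / 2, dof))

# Decision for the directional hypothesis "males < females"
reject_one_tailed <- t_obt < t_crit_one_tailed

print(t_crit_one_tailed)
print(t_crit_two_tailed)
print(reject_one_tailed)
```

The one-tailed cutoff is closer to 0 than the two-tailed lower cutoff, which is why a directional test has more power in the predicted direction.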
Do males and females differ significantly in heart rate? Report the obtained statistic, the critical value(s), the p-value, and the confidence interval, and interpret your findings.
#Insert your R code here (use as many R lines as you need)
body_data <- read.csv("bodytemp_heartrate.csv")
male <- body_data$hrtrate[body_data$gender == 1]
female <- body_data$hrtrate[body_data$gender == 2]
sd_1 <- sd(male)
sd_2 <- sd(female)
sd_values <- data.frame(
Values = c("Sd Male", "Sd Female"),
sd = c(sd_1, sd_2)
)
print(sd_values)
## Values sd
## 1 Sd Male 5.875184
## 2 Sd Female 8.105227
fraction <- sd_2/sd_1
print(fraction)
## [1] 1.37957
# Use POOLED t-test
tt_hr <- t.test(male, female,
alternative = "two.sided",
var.equal = TRUE,
paired = FALSE,
conf.level = 0.95)
dof <- tt_hr$parameter
ci <- tt_hr$conf.int
alpha <- 0.05
tcrit_lower <- qt(alpha/2,dof)
tcrit_upper <- qt(1-alpha/2,dof)
t_crits <- c(tcrit_lower, tcrit_upper)
print(tt_hr)
##
## Two Sample t-test
##
## data: male and female
## t = -0.63191, df = 128, p-value = 0.5286
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.241461 1.672230
## sample estimates:
## mean of x mean of y
## 73.36923 74.15385
print(t_crits)
## [1] -1.978671 1.978671
Since the p-value is GREATER than the alpha value (0.53 > 0.05), we fail to reject the null hypothesis (H0). Based on this sample, there is NO statistically significant difference in the mean heart rate between males and females. Consistent with this, the 95% confidence interval (-3.24, 1.67) contains 0.
| Variable | Value |
| --- | --- |
| T_crit_lower | -1.978671 |
| T_crit_upper | 1.978671 |
| T_obt | -0.63191 |
| P_Value | 0.5286 |
| Confidence Interval (95%) | (-3.241461, 1.672230) |
Perform a simple linear regression to explain the effect of heart rate on body temperature.
#Insert your R code here (use as many R lines as you need)
temp <- body_data$temp
hr <- body_data$hrtrate
linear_regression <- lm(temp~hr, data = body_data)
lm_summary <- summary(linear_regression) # renamed so the summary() function is not masked
print(lm_summary)
##
## Call:
## lm(formula = temp ~ hr, data = body_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.85017 -0.39999 0.01033 0.43915 2.46549
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 96.306754 0.657703 146.429 < 2e-16 ***
## hr 0.026335 0.008876 2.967 0.00359 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.712 on 128 degrees of freedom
## Multiple R-squared: 0.06434, Adjusted R-squared: 0.05703
## F-statistic: 8.802 on 1 and 128 DF, p-value: 0.003591
b1 <- 0.026335
se_b1 <- 0.008876
dof <- 128
alpha <- 0.05
tcrit <- qt(1-alpha/2,dof)
moe <- tcrit*se_b1
lowci <- b1 - moe
highci <- b1 + moe
ci <- c(lowci, highci)
print(ci)
## [1] 0.008772318 0.043897682
7a) Report the p value using the summary() function.
7b) Explain the relationship using the values of the intercept and the independent variable.
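For every 1 BPM increase in heart rate (HR), predicted body temperature increases by an average of 0.0263 °F. The intercept, 96.31 °F, is the predicted body temperature at a heart rate of 0 BPM, which lies outside the observed data and so is not meaningful on its own.
\(\text{Body Temp} = 96.31 + 0.026335 \times \text{Heart Rate}\)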
7c) How much variation in body temperature can be explained by heart rate? What statistical variable tells us this?
6.434% of the variation in body temperature can be explained by heart rate.
“Multiple R-squared” (which equals 0.06434) tells us this.
7d) Assume this is a two independent samples t test. Calculate and report the 95 percent confidence interval.
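The slope interval in the code for this question is built by hand as estimate ± critical t × standard error. One way to cross-check that recipe is against R's `confint()`. Since the lab's CSV file is not reproduced here, the sketch below uses the built-in `cars` dataset purely as a stand-in, and `fit`, `ci_manual`, and `ci_builtin` are illustrative names.

```r
# 'cars' is a built-in R dataset (speed vs. stopping distance),
# used here only as stand-in data for a simple linear regression
fit   <- lm(dist ~ speed, data = cars)
coefs <- coef(summary(fit))

# Pull the slope estimate and its standard error from the fit
# rather than retyping rounded values from the printout
b1    <- coefs["speed", "Estimate"]
se_b1 <- coefs["speed", "Std. Error"]
dof   <- df.residual(fit)

# Manual 95% confidence interval for the slope
alpha     <- 0.05
t_crit    <- qt(1 - alpha / 2, dof)
ci_manual <- b1 + c(-1, 1) * t_crit * se_b1

# Built-in equivalent
ci_builtin <- confint(fit, "speed", level = 0.95)

print(ci_manual)
print(ci_builtin)
```

Extracting the estimate and standard error from the fitted model avoids the small rounding error that comes from copying values out of the printed summary.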