Importing Data **************

These lines of code import your data. Please make sure that the csv files are stored in the same folder as this R Markdown file.

body_data <- read.csv("bodytemp_heartrate.csv")
stress_data <- read.csv("stress+and+talk.csv")

Section 1 - Detailed Hypothesis Testing ***************************************

Instructions: Please answer all of the subpoints of each of the questions. For critical values, if there are two, please report both. Any data that you calculate should be reported in a data frame that you print at the bottom of your block of code. For written answers, please write them below the subpoint in the R code.

Question 1: *********(10 points)

Suppose that you are a zoologist studying the heights of giraffes. You know that the population mean for giraffe height is 550 cm with a standard deviation of 30 cm. On your trip to Tanzania, you use advanced technology to collect the height for as many Masai giraffes as you can find. After sampling 25 Masai giraffes, you find that their average height is 537 cm. You think that this is a low number, but you are interested in whether the Masai giraffes are significantly shorter (\(\alpha = 0.05\)) than the population average.

#Insert your R code here (use as many R lines as you need)


## 1b).

mu <- 550
sigma <- 30
alpha <- 0.05
n <- 25

xbar <- 537
zcrit <- qnorm(alpha)
zobt <- (xbar-mu)/(sigma/sqrt(n))

p <- pnorm(zobt, lower.tail = TRUE)

dofz <- data.frame(
  Variables = c("Z-crit", "Z-obt", "P-Value"),
  Values = c(zcrit, zobt, p)
)

print(dofz)

##   Variables      Values
## 1    Z-crit -1.64485363
## 2     Z-obt -2.16666667
## 3   P-Value  0.01513014

1a) Write out the null and alternative hypotheses

H0: mu = 550
Ha: mu < 550

1b) Within a data frame in R, report the following values: Report the appropriate critical value(s) Report the calculated statistic What is the probability of finding a Masai giraffe of 537 cm within the population?

If the null hypothesis is true (mu = 550 cm), the probability of obtaining a sample mean of 537 cm or lower from 25 giraffes is 0.01513.

1c) What is your decision in regards to the null and alternative hypotheses?

Since p-value < alpha (.0151 < 0.05), we REJECT the null hypothesis (H0).

1d) Interpret your decision in terms of statistical significance

Since the p-value (0.0151) is less than the alpha value (0.05), and the obtained z (-2.167) falls below the critical z (-1.645), the result is statistically significant. Essentially, there is enough evidence to conclude that the mean height of Masai giraffes is significantly lower than the known population mean of 550 cm.
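The by-hand z-test for Question 1 can be wrapped in a small helper so the same steps are reusable. This is only a sketch: the function name `z_test_lower` and its argument names are made up for illustration and are not part of the assignment.

```r
# Hypothetical helper: lower-tailed one-sample z-test with a known sigma
z_test_lower <- function(xbar, mu, sigma, n, alpha = 0.05) {
  se     <- sigma / sqrt(n)     # standard error of the mean
  z_obt  <- (xbar - mu) / se    # obtained z statistic
  z_crit <- qnorm(alpha)        # lower-tail critical value
  p      <- pnorm(z_obt)        # P(Z <= z_obt) under H0
  data.frame(
    Variables = c("Z-crit", "Z-obt", "P-Value"),
    Values    = c(z_crit, z_obt, p)
  )
}

# Values taken from Question 1
giraffe <- z_test_lower(xbar = 537, mu = 550, sigma = 30, n = 25)
print(giraffe)
```

Called with the Question 1 values, the helper should return the same three numbers reported in the data frame for 1b.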

Question 2: *********** (10 points)

A professor is skeptical of whether the lighting in the classroom helps or hurts students’ grades. He wants to compare the exam scores of his current class that has dim lights to his past classes that had bright lights. Since he has kept track of every single mean exam score over his 10 years of teaching the course (where the exam stayed exactly the same), he knows that the mean score on the exam is a 76.8 out of 100. However, he only kept track of the mean. A group of 30 students received the following scores on the exam. The professor is wondering, is there any difference between these scores compared to the population mean at an alpha of 0.01?

(64.0, 79.5, 74.6, 76.8, 72.0, 65.9, 95.9, 68.9, 98.6, 65.6, 62.6, 75.9, 81.8, 95.2, 68.9, 96.8, 70.1, 93.2, 84.3, 77.3, 66.1, 83.4, 61.4, 73.2, 70.1, 71.9, 79.1, 70.3, 92.4, 78.8)

#Insert your R code here (use as many R lines as you need)

examscore <- c(64.0, 79.5, 74.6, 76.8, 72.0, 65.9, 95.9, 68.9, 98.6, 65.6, 62.6, 75.9, 81.8, 95.2, 68.9, 96.8, 70.1, 93.2, 84.3, 77.3, 66.1, 83.4, 61.4, 73.2, 70.1, 71.9, 79.1, 70.3, 92.4, 78.8)


mu <- 76.8
n <- length(examscore)
alpha <- 0.01



dof <- n-1
xbar <- mean(examscore)
s <- sd(examscore)
se <- s/sqrt(n)
tobt <- (xbar-mu)/se

t_test <- t.test(examscore, alternative = "two.sided", mu = mu, conf.level = 0.99)

tcrit_lower <- qt(alpha/2,dof)
tcrit_upper <- qt(1-alpha/2,dof)
# pt() returns a one-tailed probability, so for a two-tailed test
# double the upper-tail area beyond |t| (this also works when t is negative)
p_value <- 2*pt(abs(tobt), dof, lower.tail = FALSE)


doft <- data.frame(
  Variables = c("T_crit_lower", "T_crit_upper", "T_obt", "P_Value"),
  Values = c(tcrit_lower, tcrit_upper, tobt, p_value)
)

print(doft)

##      Variables     Values
## 1 T_crit_lower -2.7563859
## 2 T_crit_upper  2.7563859
## 3        T_obt  0.1758421
## 4      P_Value  0.8616402

2a) Write out the null and alternative hypotheses

H0: mu = 76.8
Ha: mu ≠ 76.8

2b) Within a data frame in R, report the following values:

Report the appropriate critical value(s) (hint, how many tails are in this test?)

Report the calculated statistic.

Report the p value.

T_crit_lower	-2.7563859
T_crit_upper	2.7563859
T_obt	0.1758421
P_Value	0.8616402

2c) What is your decision in regards to the null and alternative hypotheses?

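Since the p-value (0.8616) is greater than alpha (0.01), and the obtained t (0.176) lies between the critical values (-2.756 and 2.756), we fail to reject the null hypothesis (H0).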

2d) Interpret your decision in terms of statistical significance.

Since the p-value (0.86) is larger than the alpha-value (0.01), there is NOT a statistically significant difference in the exam scores of this sample of scores when compared to the population mean score. Essentially, based on the sample, dim classroom lighting does NOT seem to have a statistically detectable effect on exam scores.
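As a cross-check, the by-hand statistic and p-value for Question 2 can be compared against what `t.test()` returns. The sketch below uses the exam scores given in the question; the names `mu0`, `t_by_hand`, `p_by_hand`, and `check` are illustrative only.

```r
examscore <- c(64.0, 79.5, 74.6, 76.8, 72.0, 65.9, 95.9, 68.9, 98.6, 65.6,
               62.6, 75.9, 81.8, 95.2, 68.9, 96.8, 70.1, 93.2, 84.3, 77.3,
               66.1, 83.4, 61.4, 73.2, 70.1, 71.9, 79.1, 70.3, 92.4, 78.8)
mu0 <- 76.8

# By-hand t statistic and two-tailed p-value
n         <- length(examscore)
t_by_hand <- (mean(examscore) - mu0) / (sd(examscore) / sqrt(n))
p_by_hand <- 2 * pt(abs(t_by_hand), df = n - 1, lower.tail = FALSE)

# The same one-sample test via t.test()
check <- t.test(examscore, mu = mu0, alternative = "two.sided")

# Both comparisons should print TRUE if the by-hand work is right
print(isTRUE(all.equal(t_by_hand, unname(check$statistic))))
print(isTRUE(all.equal(p_by_hand, check$p.value)))
```

Using `abs()` inside `pt()` keeps the two-tailed p-value at or below 1 whether the obtained t is positive or negative.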

Please use “stress+and+talk.csv” for both questions 3 and 4, as they are two different questions about the same data.

Question 3 ********** (15 points)

A random sample of 15 people participated in a study, which tried to investigate the relationship between stress level and how fast a person can read an article. The study was divided into two parts.

In the first part (1), every subject was asked to read aloud an article from Reader’s Digest, and they were told that they would be asked a question about the article.

In the second part (2) of the study, everything else remained the same, but those same subjects then stood on a balance ball while reading the article, which invoked stress in the participant.

The researchers wondered whether the words read per minute differed (alpha = 0.05) when participants were standing on a bouncing ball (stressed) compared to when they were not on a bouncing ball (not stressed).

#Insert your R code here (use as many R lines as you need)

stress_data <- read.csv("stress+and+talk.csv")

# paired t-test (Lab 11)

# condition for stress <- [1 = ctrl; 2 = bouncyball]
 
Di <- stress_data$talk[stress_data$stress == 2] - stress_data$talk[stress_data$stress == 1]

Dibar <- mean(Di)
sd_Di <- sd(Di)   # named sd_Di so the built-in sd() function is not masked
n <- length(Di)
mu <- 0

alpha <- 0.05

t_test <- t.test(Di, alternative = "two.sided", mu = mu, paired = FALSE, conf.level = 0.95)
dof <- t_test$parameter

tcrit_lower <- qt(alpha/2,dof)
tcrit_upper <- qt(1-alpha/2,dof)
t_crits <- c(tcrit_lower, tcrit_upper)

print(t_test)

## 
##  One Sample t-test
## 
## data:  Di
## t = -1.9902, df = 14, p-value = 0.06647
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -48.063932   1.797265
## sample estimates:
## mean of x 
## -23.13333

print(t_crits)

## [1] -2.144787  2.144787

3a) State the null and alternative hypotheses.

\(H_0: \mu_\text{Di} = 0\)
\(H_a: \mu_\text{Di} ≠ 0\)

3b) What type of test should you use to analyze this data?

A paired-samples t-test, which is equivalent to a one-sample t-test on the within-subject differences (tested against a mean difference of 0).

3c) Either report (1) the critical value(s), (2) the obtained value, and (3) the p value in a data frame if you did the test “by hand”, or print the t.test() results and write the critical value(s) below.

T_crit_lower	-2.144787
T_crit_upper	2.144787
T_obt	-1.9902
P_Value	0.06647

3d) What is your decision in regards to the null and alternative hypotheses?

Since the p-value (0.06647) is GREATER than the alpha value of 0.05, we fail to reject the null hypothesis (H0).

3e) Interpret your decision in terms of statistical significance

This result is NOT statistically significant at alpha = 0.05 since p > alpha. We do NOT have enough evidence to conclude that reading speed when under stress differs from reading speed without stress.

3f) Are you 100% sure about your conclusion? Why? Can you be 95% sure about your conclusion?

No. You can never be 100% sure because statistical tests work with SAMPLES, not populations, so they are never absolutely certain about the true population distribution.
You also cannot be 95% confident that a difference exists, because the p-value (0.066) is greater than 0.05 and the 95% confidence interval (-48.06, 1.80) includes 0, so the result does not meet the 95% confidence threshold.
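The claim in 3b that a paired-samples t-test is the same as a one-sample t-test on the differences can be checked directly. The sketch below uses made-up words-per-minute values for five subjects (NOT the study data), and the vector names `calm` and `stressed` are assumptions for illustration.

```r
# Made-up data for 5 subjects, for illustration only (not from the CSV)
calm     <- c(52, 40, 61, 35, 48)
stressed <- c(30, 38, 45, 20, 41)

# Paired test on the two measurements per subject
paired_res <- t.test(stressed, calm, paired = TRUE)

# One-sample test on the within-subject differences
diff_res <- t.test(stressed - calm, mu = 0)

# Both lines should print TRUE: the two approaches are the same test
print(isTRUE(all.equal(unname(paired_res$statistic), unname(diff_res$statistic))))
print(isTRUE(all.equal(paired_res$p.value, diff_res$p.value)))
```

Either form is valid as long as the two vectors are in the same subject order, since the differences are taken element by element.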

Question 4 ********** (15 points)

Two random samples, each of 15 subjects, participated in the same study as in question 3. This time, however, one sample read the Reader’s Digest article, while the other sample read the Reader’s Digest article on top of a bouncing ball. The researchers wondered whether the words read per minute differed (\(\alpha = 0.05\)) between participants who were standing on a bouncing ball (stressed) compared to participants who were not standing on a bouncing ball (not stressed).

#Insert your R code here (use as many R lines as you need)

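# NOTE: despite its name, `stress` holds condition 1 (control, no ball, not stressed);
# `ball` holds condition 2 (on the bouncing ball, stressed)
stress <- stress_data$talk[stress_data$stress == 1]
ball <- stress_data$talk[stress_data$stress == 2]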

sd_stress <- sd(stress)
sd_ball <- sd(ball)

sd_values <- data.frame(
  Values = c("Sd Stress", "Sd Ball"),
  sd = c(sd_stress, sd_ball)
)
print(sd_values)

##      Values       sd
## 1 Sd Stress 24.96912
## 2   Sd Ball 27.13581

fraction <- sd_ball/sd_stress

print(fraction)

## [1] 1.086775

# POOLED T-test

alpha <- 0.05
tt_talk <- t.test(stress,ball,
                  alternative = "two.sided",
                  var.equal = TRUE,
                  paired = FALSE,
                  conf.level = 0.95)

tobt <- tt_talk$statistic
dof <- tt_talk$parameter
p_value <- tt_talk$p.value   # named p_value so the built-in pt() function is not masked
ci <- tt_talk$conf.int

tcrit_lower <- qt(alpha/2,dof)
tcrit_upper <- qt(1-alpha/2,dof)
t_crits <- c(tcrit_lower, tcrit_upper)


doft <- data.frame(
  Variables = c("T_crit_lower", "T_crit_upper", "T_obt", "P_Value"),
  Values = c(tcrit_lower, tcrit_upper, tobt, p_value)
)

print(tt_talk)

## 
##  Two Sample t-test
## 
## data:  stress and ball
## t = 2.4297, df = 28, p-value = 0.02178
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   3.629962 42.636705
## sample estimates:
## mean of x mean of y 
##  45.20000  22.06667

print(doft)

##      Variables      Values
## 1 T_crit_lower -2.04840714
## 2 T_crit_upper  2.04840714
## 3        T_obt  2.42965610
## 4      P_Value  0.02177969

4a) State the null and alternative hypotheses.

\(H_0: \mu_\text{noball} = \mu_\text{ball}\)
\(H_a: \mu_\text{noball} ≠ \mu_\text{ball}\)

4b) What type of test should you use to analyze this data?

Pooled t-test (equal variance)

4c) Either report (1) the critical value(s), (2) the obtained value, and (3) the p value in a data frame if you did the test “by hand”, or print both the t.test() results and the critical value(s).

T_crit_lower	-2.04840714
T_crit_upper	2.04840714
T_obt	2.42965610
P_Value	0.02177969

4d) What is your decision in regards to the null and alternative hypotheses?

Since p-value is LESS than alpha (0.0218 < 0.05), we REJECT the null hypothesis (H0).

4e) Interpret your decision in terms of statistical significance

At 5% significance, there IS a significant difference in reading speed between subjects on the bouncy ball (stressed) and subjects NOT on the bouncy ball (non-stressed). Subjects who were not on the ball read more words per minute on average (45.20 vs. 22.07).

4f) Report your confidence intervals and interpret them.

95% CI: (3.629962, 42.636705)
We’re 95% confident that the TRUE difference in mean reading speed between the two conditions (not stressed minus stressed) lies between ~3.6 and 42.6 WPM. Because the interval does not include 0, it agrees with the decision to reject H0.
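The pooled t-test for Question 4 can also be reproduced by hand from the summary statistics printed in its output (group means, group standard deviations, and n = 15 per group). This is a sketch for checking the arithmetic; the variable names are illustrative, and the inputs are the rounded values shown in the output, so the results agree only to a few decimals.

```r
# Summary statistics copied from the Question 4 output
n1 <- 15;        n2 <- 15
m1 <- 45.20000;  m2 <- 22.06667    # means: no ball, ball
s1 <- 24.96912;  s2 <- 27.13581    # standard deviations: no ball, ball

# Pooled variance and standard error of the difference in means
sp2     <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
se_diff <- sqrt(sp2 * (1 / n1 + 1 / n2))
dof_p   <- n1 + n2 - 2

# Obtained t, two-tailed p-value, and 95% confidence interval
t_pooled <- (m1 - m2) / se_diff
p_pooled <- 2 * pt(abs(t_pooled), dof_p, lower.tail = FALSE)
ci_pooled <- (m1 - m2) + c(-1, 1) * qt(0.975, dof_p) * se_diff

print(c(t = t_pooled, p = p_pooled))
print(ci_pooled)
```

The values should line up with the `t.test()` output for Question 4 (t of about 2.43 on 28 degrees of freedom, and an interval of roughly 3.6 to 42.6).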

Section 2 - Conclusion-Based Hypothesis Testing ***********************************************

Instructions: For each of the following questions, use a single block of code and report your answers either in a data frame (if you are calculating values “by hand”) or by printing the output of the test (if you use an R function to perform the test). Please use “bodytemp_heartrate.csv” for the following questions, and assume \(\alpha = 0.05\).

Question 5 ********** (5 points)

Do males have significantly lower body temperature compared to females? Report the obtained statistic, the critical value(s), the p-value, and the confidence interval, and interpret your findings.

#Insert your R code here (use as many R lines as you need)

body_data <- read.csv("bodytemp_heartrate.csv")

male <- body_data$temp[body_data$gender == 1]
female <- body_data$temp[body_data$gender == 2]

sd_1 <- sd(male)
sd_2 <- sd(female)

sd_values <- data.frame(
  Values = c("Sd Male", "Sd Female"),
  sd = c(sd_1, sd_2)
)
print(sd_values)

##      Values        sd
## 1   Sd Male 0.6987558
## 2 Sd Female 0.7434878

fraction <- sd_2/sd_1
print(fraction)

## [1] 1.064017

# Use POOLED t-test

tt_temp <- t.test(male, female,
                  alternative = "less",
                  var.equal = TRUE,
                  paired = FALSE,
                  conf.level = 0.95)
dof <- tt_temp$parameter
# One-tailed (lower) test: all of alpha sits in the lower tail,
# so there is a single critical value
tcrit <- qt(0.05,dof)
ci <- tt_temp$conf.int

print(tt_temp)

## 
##  Two Sample t-test
## 
## data:  male and female
## t = -2.2854, df = 128, p-value = 0.01197
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##         -Inf -0.07955046
## sample estimates:
## mean of x mean of y 
##  98.10462  98.39385

print(tcrit)

## [1] -1.656845

Since the obtained t (-2.2854) falls below the one-tailed critical value (-1.6568) and the p-value is SMALLER than the alpha-value (0.012 < 0.05), we reject the null hypothesis. The upper bound of the one-sided 95% confidence interval for the difference in means (male minus female) is -0.0796, which is below 0. We can conclude, based on this sample, that males have a significantly LOWER mean body temp than females.

T_crit (one-tailed, lower)	-1.656845
T_obt	-2.2854
P_Value	0.01197
Confidence Interval (95%)	(-Inf, -0.07955046)
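Because Question 5 asks a directional question ("lower"), the rejection region sits entirely in the lower tail, and the critical value differs from the two-tailed one. A short sketch comparing the two at df = 128 and alpha = 0.05 (the names `one_tailed` and `two_tailed` are illustrative):

```r
dof_q5   <- 128
alpha_q5 <- 0.05

# One-tailed (lower) test: all of alpha in the lower tail
one_tailed <- qt(alpha_q5, dof_q5)

# Two-tailed test: alpha split evenly between both tails
two_tailed <- qt(c(alpha_q5 / 2, 1 - alpha_q5 / 2), dof_q5)

print(one_tailed)
print(two_tailed)
```

The one-tailed cutoff is closer to 0 than the two-tailed cutoffs, so a directional test rejects H0 more easily when the effect is in the predicted direction.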

Question 6 ********** (5 points)

Do males and females differ significantly in heart rate? Report the obtained statistic, the critical value(s), the p-value, and the confidence interval, and interpret your findings.

#Insert your R code here (use as many R lines as you need)

body_data <- read.csv("bodytemp_heartrate.csv")

male <- body_data$hrtrate[body_data$gender == 1]
female <- body_data$hrtrate[body_data$gender == 2]

sd_1 <- sd(male)
sd_2 <- sd(female)

sd_values <- data.frame(
  Values = c("Sd Male", "Sd Female"),
  sd = c(sd_1, sd_2)
)
print(sd_values)

##      Values       sd
## 1   Sd Male 5.875184
## 2 Sd Female 8.105227

fraction <- sd_2/sd_1
print(fraction)

## [1] 1.37957

# Use POOLED t-test

tt_hr <- t.test(male, female,
                  alternative = "two.sided",
                  var.equal = TRUE,
                  paired = FALSE,
                  conf.level = 0.95)
dof <- tt_hr$parameter
ci <- tt_hr$conf.int
alpha <- 0.05

tcrit_lower <- qt(alpha/2,dof)
tcrit_upper <- qt(1-alpha/2,dof)
t_crits <- c(tcrit_lower, tcrit_upper)

print(tt_hr)

## 
##  Two Sample t-test
## 
## data:  male and female
## t = -0.63191, df = 128, p-value = 0.5286
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.241461  1.672230
## sample estimates:
## mean of x mean of y 
##  73.36923  74.15385

print(t_crits)

## [1] -1.978671  1.978671

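Since the p-value is GREATER than the alpha value (0.53 > 0.05), and the obtained t (-0.632) lies between the critical values (-1.979 and 1.979), we fail to reject the null hypothesis (H0). The 95% confidence interval for the difference in means (-3.24, 1.67) includes 0. Based on this sample, there is NO statistically significant difference in the mean heart rate between males and females.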

T_crit_lower	-1.978671
T_crit_upper	1.978671
T_obt	-0.63191
P_Value	0.5286
Confidence Interval (95%)	(-3.241461, 1.672230)
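The standard deviations in Question 6 differ more than in the other questions (ratio of about 1.38), so it may be worth checking whether the conclusion depends on pooling. The sketch below computes a Welch (unequal-variance) t-test from the printed summary statistics. It ASSUMES 65 subjects per group, which is consistent with df = 128 and the reported t but is not stated in the output; all variable names are illustrative.

```r
# Summary statistics copied from the Question 6 output
n1 <- 65;        n2 <- 65          # ASSUMPTION: equal group sizes (65 + 65 - 2 = 128)
m1 <- 73.36923;  m2 <- 74.15385    # means: male, female
s1 <- 5.875184;  s2 <- 8.105227    # standard deviations: male, female

# Welch standard error and Welch-Satterthwaite degrees of freedom
v1   <- s1^2 / n1
v2   <- s2^2 / n2
se_w <- sqrt(v1 + v2)
df_w <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Obtained t and two-tailed p-value
t_w <- (m1 - m2) / se_w
p_w <- 2 * pt(abs(t_w), df_w, lower.tail = FALSE)

print(c(t = t_w, df = df_w, p = p_w))
```

With equal group sizes the Welch and pooled statistics coincide and only the degrees of freedom shrink, so under this assumption the decision (fail to reject H0) stays the same.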

Question 7 ********** (15 points)

Perform a simple linear regression to explain the effect of heart rate on body temperature.

#Insert your R code here (use as many R lines as you need)

temp <- body_data$temp
hr <- body_data$hrtrate

linear_regression <- lm(temp~hr, data = body_data)
# stored as reg_summary so the built-in summary() function is not masked
reg_summary <- summary(linear_regression)


print(reg_summary)

## 
## Call:
## lm(formula = temp ~ hr, data = body_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.85017 -0.39999  0.01033  0.43915  2.46549 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 96.306754   0.657703 146.429  < 2e-16 ***
## hr           0.026335   0.008876   2.967  0.00359 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.712 on 128 degrees of freedom
## Multiple R-squared:  0.06434,    Adjusted R-squared:  0.05703 
## F-statistic: 8.802 on 1 and 128 DF,  p-value: 0.003591

b1 <- 0.026335 
se_b1 <- 0.008876
dof <- 128

alpha <- 0.05
tcrit <- qt(1-alpha/2,dof)


moe <- tcrit*se_b1

lowci <- b1 - moe
highci <- b1 + moe

ci <- c(lowci, highci)
print(ci)

## [1] 0.008772318 0.043897682

7a) Report the p value using the summary() function.

p-value (slope, b1) = 0.00359

7b) Explain the relationship using the values of the intercept and the independent variable.

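For every 1 BPM increase in heart rate (HR), body temperature increases by an average of 0.0263 °F. The intercept (96.31 °F) is the predicted body temperature at a heart rate of 0 BPM, which is an extrapolation outside the data and has no practical meaning on its own.
\(\text{Body Temp} = 96.31 + 0.026335 \times \text{Heart Rate}\)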

7c) How much variation in body temperature can be explained by heart rate? What statistical variable tells us this?

6.434% of the variation in body temp can be explained by HR.
“Multiple R-squared” (which equals 0.06434) tells us this.

7d) Assume this is a two independent samples t test. Calculate and report the 95 percent confidence interval.

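95% CI for the slope (b1): (0.008772318, 0.043897682). Because the interval does not include 0, the slope is significantly different from 0 at alpha = 0.05.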
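The by-hand interval for the slope (estimate ± t-crit × standard error) is what R's `confint()` computes for a fitted `lm` object. The sketch below checks this on simulated data, since the CSV is not reproduced here; `hr_sim`, `temp_sim`, and `fit` are made-up names and the numbers are NOT the assignment data.

```r
# Simulated stand-in data, for illustration only
set.seed(1)
hr_sim   <- rnorm(50, mean = 74, sd = 7)
temp_sim <- 96.3 + 0.026 * hr_sim + rnorm(50, sd = 0.7)

fit   <- lm(temp_sim ~ hr_sim)
coefs <- coef(summary(fit))

# By-hand 95% CI for the slope, pulling values from the model
# instead of retyping them from the printed summary
b1_sim    <- coefs["hr_sim", "Estimate"]
se_b1_sim <- coefs["hr_sim", "Std. Error"]
tcrit_sim <- qt(0.975, df.residual(fit))
manual_ci <- c(b1_sim - tcrit_sim * se_b1_sim, b1_sim + tcrit_sim * se_b1_sim)

# Built-in equivalent
builtin_ci <- confint(fit, "hr_sim", level = 0.95)

# Should print TRUE: both approaches give the same interval
print(isTRUE(all.equal(manual_ci, as.numeric(builtin_ci))))
```

Pulling the estimate and standard error from `coef(summary(fit))` avoids the small rounding error that comes from copying the printed values.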

Computer Project 2

Arya Park