Question 1: (30 points) A medical researcher conjectures that the likelihood of having wrinkled skin around the eyes increases when a person smokes. The smoking habits as well as the presence of prominent wrinkles around the eyes were recorded for 500 randomly selected people from the population of interest. The following frequency table is obtained:

prom_wrinkles <- c(95,75,66)
not_prom_wrinkles <- c(55,75,134)
df <- data.frame(prom_wrinkles, not_prom_wrinkles)
rownames(df) <- c("heavy_smoker","light_smoker","non_smoker")
df
  1. Conduct a test to find out if someone’s smoking habits are associated with the presence of skin wrinkles. Use alpha = 0.05. You must follow these steps to conduct this test:

a1) State the hypotheses (Ho and Ha).

Ho: Smoking habits and the presence of prominent wrinkles are independent of one another.

Ha: Smoking habits and the presence of prominent wrinkles are dependent on one another (associated).
chisq.test(df)

The test on the full 3 x 2 table gives X-squared ≈ 32.32 with df = (3 - 1)(2 - 1) = 2 and a p-value of about 9.6e-08.

a2) Whether you reject or fail to reject Ho and why. Reject Ho, because the p-value (about 9.6e-08) is far below alpha = 0.05.

a3) Your conclusion (i.e., whether smoking is associated with having wrinkles). At the 5% significance level there is strong evidence that smoking habits and the presence of prominent wrinkles around the eyes are associated. In the sample, 95 of 150 heavy smokers (63%) had prominent wrinkles, compared with 75 of 150 light smokers (50%) and 66 of 200 non-smokers (33%).
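
As a cross-check, the chi-squared test of independence for the table in this question can be sketched by hand. The names counts, expected, x2, dof, p_value and fit are illustrative and not part of the original answer; the counts are copied from the frequency table above.

```r
# Counts copied from the frequency table in the question.
# matrix() fills column by column, so the first three values form the
# "prom_wrinkles" column and the last three the "not_prom_wrinkles" column.
counts <- matrix(c(95, 75, 66,
                   55, 75, 134),
                 ncol = 2,
                 dimnames = list(c("heavy_smoker", "light_smoker", "non_smoker"),
                                 c("prom_wrinkles", "not_prom_wrinkles")))

# Expected count in each cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson chi-squared statistic and its degrees of freedom
x2 <- sum((counts - expected)^2 / expected)
dof <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Upper-tail p-value from the chi-squared distribution
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)

# The built-in test on the whole table should agree with the hand computation
fit <- chisq.test(counts)
```

Every expected count here is well above 5 (the smallest is 70.8), so the chi-squared approximation is reasonable for this table.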


Question 2: (40 points) A researcher wants to compare the average anxiety levels of people living in Alaska and Hawaii. The researcher does not have any specific hypothesis in mind in terms of which state could have higher mean anxiety levels. She collected data on anxiety scores for two samples of randomly selected residents, one from each state. Each resident was given a score between 0 and 100 (higher scores mean more anxiety).

The anxiety scores she collected for each state are shown next. Create two vectors in R with these data (one vector for Alaska scores and another one for Hawaii scores).

alaska = c(69, 76, 64, 65, 67, 77, 56, 67, 62, 82, 56, 77, 71, 68, 76, 69, 64, 66, 83, 77, 75, 79, 71, 75, 86, 67, 70, 73, 77, 71, 78, 64, 62, 58, 67)
alaska
##  [1] 69 76 64 65 67 77 56 67 62 82 56 77 71 68 76 69 64 66 83 77 75 79 71 75 86
## [26] 67 70 73 77 71 78 64 62 58 67
hawaii = c(64, 76, 74, 74, 73, 71, 75, 63, 67, 77, 74, 67, 69, 70, 64, 72, 72, 72, 74, 76, 67, 69, 80, 73, 68, 77, 71, 73, 69, 68, 71, 71, 73, 75, 71)
hawaii
##  [1] 64 76 74 74 73 71 75 63 67 77 74 67 69 70 64 72 72 72 74 76 67 69 80 73 68
## [26] 77 71 73 69 68 71 71 73 75 71

Assume that the variables involved in this problem follow a normal distribution. Also assume their variances can be safely considered to be the same (in other words, you do NOT need to do the test to compare two variances here. Assume that the variances are equal).

  1. State the hypotheses (Ho and Ha) that this researcher should set up to conduct the test that will allow her to make the desired comparison.

Ho: Mean anxiety score in Alaska = Mean anxiety score in Hawaii

Ha: Mean anxiety score in Alaska ≠ Mean anxiety score in Hawaii (two-sided, because the researcher has no hypothesis about which state has the higher mean)

  2. Make a decision using a significance level of 0.05. Justify your decision.

t.test(hawaii, alaska, var.equal = TRUE)

The two samples are independent and the variances are assumed equal, so the appropriate test is the pooled two-sample t-test (a correlation test would instead treat the scores as paired observations). The test gives t = 0.70165 on df = 68 with p-value = 0.4853. Because 0.4853 > 0.05, we fail to reject Ho: there is not enough evidence to conclude that the mean anxiety scores of Alaska and Hawaii differ.
  3. Obtain a 95% confidence interval for the difference between the mean anxiety level in Alaska and the mean anxiety level in Hawaii. Does the interval lead you to the same conclusion you reached from the hypothesis test? Justify.
t.test(hawaii, alaska, var.equal = TRUE, conf.level = 0.95)
## 
##  Two Sample t-test
## 
## data:  hawaii and alaska
## t = 0.70165, df = 68, p-value = 0.4853
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.843955  3.843955
## sample estimates:
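## mean of x mean of y 
##  71.42857  70.42857

The 95% confidence interval for the difference in means (Hawaii minus Alaska) is (-1.84, 3.84). Because the interval contains 0, a difference of zero between the two state means is plausible, which agrees with the hypothesis test: we fail to reject Ho at the 0.05 level.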
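
The pooled two-sample t-test and its confidence interval can also be sketched by hand, which shows where t, df and the interval come from. This assumes equal variances, as the question allows; the names n1, n2, sp2, se, diff_means, t_stat, dof, p_value and ci are illustrative.

```r
alaska <- c(69, 76, 64, 65, 67, 77, 56, 67, 62, 82, 56, 77, 71, 68, 76, 69, 64, 66,
            83, 77, 75, 79, 71, 75, 86, 67, 70, 73, 77, 71, 78, 64, 62, 58, 67)
hawaii <- c(64, 76, 74, 74, 73, 71, 75, 63, 67, 77, 74, 67, 69, 70, 64, 72, 72, 72,
            74, 76, 67, 69, 80, 73, 68, 77, 71, 73, 69, 68, 71, 71, 73, 75, 71)

n1 <- length(hawaii)
n2 <- length(alaska)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(hawaii) + (n2 - 1) * var(alaska)) / (n1 + n2 - 2)

# Standard error of the difference in sample means
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

# Difference in means (Hawaii minus Alaska, matching the order used in t.test)
diff_means <- mean(hawaii) - mean(alaska)

# t statistic, degrees of freedom and two-sided p-value
t_stat <- diff_means / se
dof <- n1 + n2 - 2
p_value <- 2 * pt(abs(t_stat), df = dof, lower.tail = FALSE)

# 95% confidence interval for the difference in means
ci <- diff_means + c(-1, 1) * qt(0.975, df = dof) * se
```

Swapping the order of the samples only flips the sign of t_stat and of the interval endpoints; the p-value and the conclusion stay the same.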

Question 3: (15 points) Consider a research study where the goal is to test whether nightly melatonin supplementation improves sleep (i.e., if it increases the amount of sleep time). The authors report the following result:

“In comparison with placebo, a 3 weeks of melatonin supplementation significantly increased the average sleep time (amount of increase= 36 min; P value = 0.046).”

Answer the following questions:

  1. What kind of hypothesis test was conducted by these authors? A test to compare two means? A test to compare two variances? A chi-square test to test for independence? Choose the correct option among these three and justify.

A test to compare two means was conducted by the authors. The outcome, sleep time, is a numerical variable. The authors compared the average sleep time of a group taking melatonin with that of a group taking a placebo and then assessed the difference between the two means statistically (“amount of increase” and “P value”).

  2. State the hypotheses (Ho and Ha) that the authors were testing in this case. You need to state both Ho and Ha.

Ho: The mean amount of sleep in the melatonin supplementation population is EQUAL TO the mean amount of sleep in the placebo population.

Ha: The mean amount of sleep in the melatonin supplementation population is GREATER THAN the mean amount of sleep in the placebo population.

  3. Did the authors find evidence to support the alternative hypothesis? Justify.

Yes. The P-value was 0.046, which is lower than the conventional significance level of 0.05, so the result is statistically significant. We therefore reject the null hypothesis in favor of the alternative hypothesis: the data support the claim that melatonin supplementation increases average sleep time (by about 36 minutes in this study).

Question 4: (15 points) The results after rolling a die 300 times are shown in the next table:

Is there sufficient evidence to conclude that a loaded die was used in this experiment? Use a significance level of 0.05. Note: A normal (not loaded) die is one with equal probability for all the faces of the die.

df2 <- data.frame(p_1 = c(45),
                  p_2 = c(52),
                  p_3 = c(50),
                  p_4 = c(58),
                  p_5 = c(55),
                  p_6 = c(40))
rownames(df2) <- c("frequency")
df2

Ho: p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = 1/6, so each face is expected 300/6 = 50 times (the die is normal)

Ha: At least one of the probabilities is different from 1/6 (the die is loaded)

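observed = c(45,52,50,58,55,40)
expected = rep(300/6, 6)
x2 <- ((observed - expected)^2)/expected
x2
## [1] 0.50 0.08 0.00 1.28 0.50 2.00
sum(x2)
## [1] 4.36

The test statistic is 4.36 with df = 6 - 1 = 5. The critical value of the chi-squared distribution with 5 degrees of freedom at alpha = 0.05 is about 11.07, and 4.36 < 11.07 (the p-value is roughly 0.5, well above 0.05), so we fail to reject Ho. There is not sufficient evidence to conclude that a loaded die was used.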
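
The same goodness-of-fit test can be run with the built-in chisq.test, passing the observed counts and the probabilities stated in Ho. This is a minimal sketch; the names fair_probs, fit and critical are illustrative.

```r
# Observed counts for faces 1 to 6, copied from the table in the question
observed <- c(45, 52, 50, 58, 55, 40)

# Probabilities under Ho: a fair die gives each face probability 1/6
fair_probs <- rep(1 / 6, 6)

# Chi-squared goodness-of-fit test against the fair-die probabilities
fit <- chisq.test(observed, p = fair_probs)

fit$expected    # expected count for each face under Ho (50 each)
fit$statistic   # chi-squared statistic
fit$parameter   # degrees of freedom: number of faces - 1
fit$p.value     # compare with alpha = 0.05

# Critical value at alpha = 0.05, for comparison with the statistic
critical <- qchisq(0.95, df = length(observed) - 1)
```

Note that the p argument takes the hypothesized probabilities, not the per-face contributions to the statistic, and that the first argument must be the observed counts.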