5.5 Working backwards, Part I.

A 95% confidence interval for a population mean, μ, is given as (18.985, 21.015). This confidence interval is based on a simple random sample of 36 observations. Calculate the sample mean and standard deviation. Assume that all conditions necessary for inference are satisfied. Use the t-distribution in any calculations.

Ans: We know that

confidence interval = sample_mean ± t* × SE, where SE = SD/√n

The sample mean is 20, the margin of error is 1.015, and the standard deviation is approximately 3.0.

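n <- 36
df <- n - 1
# A 95% two-sided interval leaves 2.5% in each tail, so use the 0.975 quantile
t_star <- qt(0.975, df)

confidence_lower <- 18.985
confidence_upper <- 21.015
# The sample mean is the midpoint of the interval
sample_mean <- (confidence_lower + confidence_upper)/2
sample_mean
## [1] 20
# The margin of error is half the interval width
margin_of_err <- confidence_upper - sample_mean
margin_of_err
## [1] 1.015
# margin_of_err = t_star * SD/sqrt(n), so solve for SD
SE <- margin_of_err/t_star
SD <- SE * sqrt(n)

round(SD, 2)
## [1] 3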
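As a check on the reasoning above, the calculation can be run in both directions: recover the mean and standard deviation from the interval, then rebuild the interval from them. This is a minimal sketch; the helper name `ci_to_mean_sd` is hypothetical and not part of the original solution.

```r
# Hypothetical helper: recover the sample mean and SD from a t-based interval.
ci_to_mean_sd <- function(lower, upper, n, conf_level = 0.95) {
  # Two-sided critical value, e.g. the 0.975 quantile for a 95% interval
  t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)
  x_bar  <- (lower + upper) / 2   # midpoint of the interval
  me     <- (upper - lower) / 2   # margin of error = half the width
  s      <- me / t_star * sqrt(n) # invert me = t_star * s / sqrt(n)
  list(mean = x_bar, sd = s, t_star = t_star)
}

est <- ci_to_mean_sd(18.985, 21.015, n = 36)

# Work forwards again: the rebuilt interval should match the one given.
rebuilt <- est$mean + c(-1, 1) * est$t_star * est$sd / sqrt(36)
round(rebuilt, 3)
round(est$sd, 2)
```

If the 0.95 quantile were used instead of the 0.975 quantile, the recovered SD would be too large, because the critical value would correspond to a 90% interval.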

5.13 Car insurance savings.

A market researcher wants to evaluate car insurance savings at a competing company. Based on past studies he is assuming that the standard deviation of savings is $100. He wants to collect data such that he can get a margin of error of no more than $10 at a 95% confidence level. How large of a sample should he collect?

ME = z* × SE = z* × SD/sqrt(n), so n = (z* × SD/ME)^2

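SD <- 100
ME <- 10
zstar <- qnorm(0.025, lower.tail = FALSE)
zstar
## [1] 1.959964
n <- (SD/ME * zstar)^2
n
## [1] 384.1459
# Round up: a smaller sample would give a margin of error above $10
ceiling(n)
## [1] 385

Ans: He should collect a sample of at least 385 observations.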
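The same formula can be wrapped in a small function to see how the required sample size reacts to the confidence level and the margin of error. This is a sketch only; `required_n` is a hypothetical name introduced for illustration.

```r
# Hypothetical helper: smallest n with margin of error <= me
# for a z-based interval when the population SD is assumed known.
required_n <- function(sd, me, conf_level = 0.95) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  ceiling((z_star * sd / me)^2)  # always round up
}

required_n(100, 10)        # the exercise: 95% confidence, $10 margin
required_n(100, 10, 0.99)  # a higher confidence level needs a larger sample
required_n(100, 5)         # halving the margin roughly quadruples n
```

Because n grows with the square of SD/ME, tightening the margin of error is much more expensive than it may first appear.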

5.19 Global warming, Part I.

Is there strong evidence of global warming? Let’s consider a small scale example, comparing how temperatures have changed in the US from 1968 to 2008. The daily high temperature reading on January 1 was collected in 1968 and 2008 for 51 randomly selected locations in the continental US. Then the difference between the two readings (temperature in 2008 - temperature in 1968) was calculated for each of the 51 different locations. The average of these 51 values was 1.1 degrees with a standard deviation of 4.9 degrees. We are interested in determining whether these data provide strong evidence of temperature warming in the continental US.

  1. Is there a relationship between the observations collected in 1968 and 2008? Or are the observations in the two groups independent? Explain.

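Ans: The observations in the two groups are not independent. Both readings were taken at the same 51 locations, so each 1968 reading is paired with the 2008 reading from the same location. This is paired data, and the analysis should be based on the 51 differences.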

  1. Write hypotheses for this research in symbols and in words.

Ans:

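H0 (null hypothesis): mean_diff = 0. The average daily high temperature on January 1 was the same in 1968 and 2008 in the continental US.

HA (alternative hypothesis): mean_diff > 0. The average daily high temperature on January 1 was higher in 2008 than in 1968 in the continental US.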

  1. Check the conditions required to complete this test.

Ans:

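  1. The differences are independent of each other: the locations were randomly selected, and 51 locations are less than 10% of all locations in the continental US.
  2. The sample size (51) is greater than 30, which is enough if the distribution of differences is at most moderately skewed.
  3. We have no plot of the differences, so we cannot check for strong skew. We assume there is none; if there were, a larger sample would be preferable.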
  1. Calculate the test statistic and find the p-value.
SD <- 4.9
SE <- SD/sqrt(51)
t_stat  <- (1.1 - 0)/SE
t_stat
## [1] 1.603178
pvalue <- 1 - pt(t_stat, df=50)
pvalue
## [1] 0.05759731
  1. What do you conclude? Interpret your conclusion in context.

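Since the p-value (about 0.058) is greater than 0.05, we fail to reject the null hypothesis. The data do not provide strong evidence that the average daily high temperature on January 1 increased between 1968 and 2008 in the continental US.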

  1. What type of error might we have made? Explain in context what the error means.

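Type 2 error. Since we failed to reject the null hypothesis, the only error we could have made is missing a real effect: concluding that there is no strong evidence of warming when the average temperature actually did increase between 1968 and 2008.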

  1. Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the temperature measurements from 1968 and 2008 to include 0? Explain your reasoning.

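Yes. Since we failed to reject the null hypothesis that the mean difference is 0, we would expect a confidence interval at the corresponding level to include 0. If 0 were outside the interval, we would have rejected the null hypothesis.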
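The expectation can be checked directly from the summary statistics in the question (mean difference 1.1, standard deviation 4.9, n = 51). The sketch below computes both a 90% interval, which corresponds to a one-sided test at the 5% level, and the more common 95% interval. The variable names are chosen here for illustration.

```r
x_bar <- 1.1   # mean of the 51 differences
s     <- 4.9   # SD of the differences
n     <- 51
se    <- s / sqrt(n)

conf_levels <- c(0.90, 0.95)
t_star <- qt(1 - (1 - conf_levels) / 2, df = n - 1)

ci <- data.frame(
  conf_level = conf_levels,
  lower = x_bar - t_star * se,
  upper = x_bar + t_star * se
)
ci$includes_zero <- ci$lower < 0 & ci$upper > 0
ci
```

Both intervals contain 0, which agrees with the decision not to reject the null hypothesis.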

5.31 Chicken diet and weight, Part I.

Chicken farming is a multi-billion dollar industry, and any methods that increase the growth rate of young chicks can reduce consumer costs while increasing company profits, possibly by millions of dollars. An experiment was conducted to measure and compare the effectiveness of various feed supplements on the growth rate of chickens. Newly hatched chicks were randomly allocated into six groups, and each group was given a different feed supplement. Below are some summary statistics from this data set along with box plots showing the distribution of weights by feed type.

  1. Describe the distributions of weights of chickens that were fed linseed and horsebean.

Ans: Based on the box plots, the linseed weights are symmetrically distributed, while the horsebean weights show a slight right skew.

  1. Do these data provide strong evidence that the average weights of chickens that were fed linseed and horsebean are different? Use a 5% significance level.

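Ans: The two groups are independent (different chicks received each feed), so this is a two-sample t-test for a difference in means, not a paired analysis.

H0: mean_linseed = mean_horsebean. HA: mean_linseed ≠ mean_horsebean.

SE = sqrt(sd1^2/n1 + sd2^2/n2)

mean_diff = mean1 - mean2. Because the samples are small, we convert it into a t-score instead of a z-score.

T-score = (mean_diff - 0)/SE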

horsebean_mean <- 160.20
horsebean_sd <- 38.63
horsebean_samples <- 10

linseed_mean <- 218.75
linseed_sd <- 52.24
linseed_samples <- 12

mean_diff <- linseed_mean - horsebean_mean
# Conservative degrees of freedom: the smaller sample size minus 1
df <- min(linseed_samples, horsebean_samples) - 1
# Unpooled standard error, SE = sqrt(sd1^2/n1 + sd2^2/n2)
se_diff <- sqrt(linseed_sd^2/linseed_samples + horsebean_sd^2/horsebean_samples)
t_val <- mean_diff / se_diff
round(t_val, 2)
## [1] 3.02
# Two-sided p-value, since the alternative is "different"
pval <- 2 * pt(abs(t_val), df = df, lower.tail = FALSE)

# Compare the p-value with alpha = 0.05 and alpha = 0.01
c(pval < 0.05, pval < 0.01)
## [1]  TRUE FALSE

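Since the p-value is less than 0.05, we reject the null hypothesis. The data provide strong evidence that the average weights of chickens fed linseed and horsebean are different.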

  1. What type of error might we have committed? Explain.

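Type 1 error. Since we rejected the null hypothesis, the only error we could have made is rejecting a true null hypothesis: concluding that the average weights differ when they are actually the same.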

  1. Would your conclusion change if we used alpha = 0.01?

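Yes. With the unpooled standard error and df = min(n1, n2) - 1 = 9, the test statistic is about 3.02 and the two-sided p-value is about 0.015. At alpha = 0.01 the critical t is 3.25, which is greater than the test statistic, and the p-value is greater than 0.01, so we would fail to reject the null hypothesis.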
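The effect of the significance level can be seen by comparing the test statistic with the two-sided critical values at both levels. This sketch recomputes the statistic from the summary statistics above and assumes the conservative df = min(n1, n2) - 1 convention; software that uses the Welch approximation would report a somewhat larger df.

```r
# Summary statistics from the exercise
se    <- sqrt(52.24^2 / 12 + 38.63^2 / 10)  # unpooled standard error
t_obs <- (218.75 - 160.20) / se
df    <- min(12, 10) - 1                    # conservative df (an assumption)

alphas <- c(0.05, 0.01)
t_crit <- qt(1 - alphas / 2, df)            # two-sided critical values

data.frame(
  alpha  = alphas,
  t_crit = round(t_crit, 2),
  reject = abs(t_obs) > t_crit
)
```

The statistic clears the 5% threshold but not the 1% threshold, so the conclusion depends on the chosen alpha.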

5.45 Coffee, depression, and physical activity.

Caffeine is the world’s most widely used stimulant, with approximately 80% consumed in the form of coffee. Participants in a study investigating the relationship between coffee consumption and exercise were asked to report the number of hours they spent per week on moderate (e.g., brisk walking) and vigorous (e.g., strenuous sports and jogging) exercise. Based on these data the researchers estimated the total hours of metabolic equivalent tasks (MET) per week, a value always greater than 0. The table below gives summary statistics of MET for women in this study based on the amount of coffee consumed.

  1. Write the hypotheses for evaluating if the average physical activity level varies among the different levels of coffee consumption.

H0 (null hypothesis): the average physical activity level is the same for all levels of coffee consumption.

HA (alternative hypothesis): the average physical activity level differs for at least one level of coffee consumption.

  1. Check conditions and describe any assumptions you must make to proceed with the test.

This is an ANOVA test, so we have 3 conditions to check.

  1. Independence (within each group and across the groups): we assume this holds.
  2. Normality (check for skewness): since the number of samples in each group is large, we can treat this condition as met.
  3. Constant variance: the variance in the groups should be about equal from one group to the next. Since the variance is more or less the same for all groups, we consider this condition met.
  1. Below is part of the output associated with this test. Fill in the empty cells.
[Images omitted: the partial ANOVA output from the question and reference material from the book]

Ans:

Df_coffee : 4 Df_res : 50734

groups <- 5
# Group df = number of groups - 1
Df_G <- groups-1
Df_G
## [1] 4
totalSamples <- 50739
# Residual df = total samples - number of groups
Df_R <- totalSamples - groups
Df_R
## [1] 50734

SSG (sum of squares for coffee) : 10508

SST <- 25575327
SSE <- 25564819

SSG <- SST - SSE
SSG
## [1] 10508

MSG = 2627, MSE ≈ 503.9, F value ≈ 5.21, p-value ≈ 0.0003

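Since the p-value is less than 0.05, we reject the null hypothesis.

Let's also consider a modified (Bonferroni) significance level, which is normally used for the pairwise comparisons that follow a significant ANOVA result.

Modified significance level (alpha*) = alpha/K, where K = k(k-1)/2 is the number of pairwise comparisons among k groups.

Here also, we can reject the null hypothesis: the modified alpha is 0.005, which is greater than the p-value.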

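# Mean squares = sum of squares / degrees of freedom
MSG <- SSG/Df_G
MSG
## [1] 2627
MSE <- SSE/Df_R
round(MSE, 1)
## [1] 503.9
fvalue <- MSG/MSE
round(fvalue, 2)
## [1] 5.21
pvalue <- pf(fvalue, Df_G, Df_R, lower.tail = FALSE)
pvalue < 0.05
## [1] TRUE
# Bonferroni: number of pairwise comparisons among the 5 groups
K <- 5 * 4/2
alpha_modified <- 0.05/K
alpha_modified
## [1] 0.005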
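The pieces computed above can be assembled into a single table with the same layout as standard ANOVA output. This is a sketch that takes the two sums of squares and the sample size used in the code above as given; the object name `anova_tab` is introduced here for illustration.

```r
k        <- 5          # coffee consumption groups
n_total  <- 50739      # total number of women
ss_total <- 25575327   # total sum of squares
ss_resid <- 25564819   # residual sum of squares

anova_tab <- data.frame(
  source = c("coffee", "Residuals", "Total"),
  Df     = c(k - 1, n_total - k, n_total - 1),
  SumSq  = c(ss_total - ss_resid, ss_resid, ss_total)
)
# Mean squares are defined only for the group and residual rows
anova_tab$MeanSq <- c(anova_tab$SumSq[1:2] / anova_tab$Df[1:2], NA)

f_value <- anova_tab$MeanSq[1] / anova_tab$MeanSq[2]
p_value <- pf(f_value, k - 1, n_total - k, lower.tail = FALSE)

anova_tab
c(F = f_value, p = p_value)
```

Laying the table out this way makes it easy to confirm that the group and residual degrees of freedom add up to the total (4 + 50734 = 50738).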