A 95% confidence interval for a population mean, μ, is given as (18.985, 21.015). This confidence interval is based on a simple random sample of 36 observations. Calculate the sample mean and standard deviation. Assume that all conditions necessary for inference are satisfied. Use the t-distribution in any calculations.
Ans: We know that
confidence interval = sample_mean ± t* × SE, where SE = SD/√n
The sample mean = 20, the margin of error = 1.015, and the standard deviation ≈ 3.
n <- 36
df <- n-1
# A 95% interval leaves 2.5% in each tail, so t* is the 0.975 quantile
t_stat <- qt(0.975, df)
confidence_lower <- 18.985
confidence_upper <- 21.015
sample_mean <- (confidence_lower + confidence_upper)/2
sample_mean
## [1] 20
margin_of_err <- confidence_upper - sample_mean
margin_of_err
## [1] 1.015
SE <- margin_of_err/t_stat
SD <- SE * sqrt(n)
round(SD, 2)
## [1] 3
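As a sanity check, the calculation can be run in reverse: plug a candidate mean and standard deviation back into the interval formula and see whether the reported bounds come out. The sketch below is illustrative only. The helper name `ci_from_summary` and the candidate SD of 3 are assumptions, not part of the exercise.

```r
# Hypothetical helper: rebuild a t-based confidence interval from summary statistics
ci_from_summary <- function(mean, sd, n, level = 0.95) {
  tstar <- qt(1 - (1 - level) / 2, df = n - 1)  # two-sided critical value
  me <- tstar * sd / sqrt(n)                    # margin of error
  c(lower = mean - me, upper = mean + me)
}

# Candidate values (assumed): mean 20, SD 3, n 36
ci_check <- ci_from_summary(mean = 20, sd = 3, n = 36)
round(ci_check, 3)
```

If the rounded bounds match the interval given in the question, the candidate mean and SD are consistent with it.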
A market researcher wants to evaluate car insurance savings at a competing company. Based on past studies he is assuming that the standard deviation of savings is $100. He wants to collect data such that he can get a margin of error of no more than $10 at a 95% confidence level. How large of a sample should he collect?
ME = z* × SE, where SE = SD/sqrt(n), so ME = z* × SD/sqrt(n). Solving for n gives n = (z* × SD/ME)^2.
SD <- 100
ME <- 10
zstar <- qnorm(0.025, lower.tail=F)
zstar
## [1] 1.959964
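n <- (SD/ME * zstar)^2
n
## [1] 384.1459
Since the sample size must be a whole number and the margin of error must not exceed $10, round up: he should collect at least 385 observations.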
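The same calculation can be wrapped in a small function that rounds up and then confirms the resulting margin of error. This is a minimal sketch. The name `required_n` and its arguments are hypothetical, and it assumes the population SD is known so that the normal critical value applies.

```r
# Hypothetical helper: smallest whole n whose margin of error is at most `me`
required_n <- function(sd, me, level = 0.95) {
  zstar <- qnorm(1 - (1 - level) / 2)  # two-sided normal critical value
  ceiling((zstar * sd / me)^2)         # always round up, never down
}

n_needed <- required_n(sd = 100, me = 10)
n_needed

# Margin of error actually achieved with n_needed and with one fewer observation
me_achieved <- qnorm(0.975) * 100 / sqrt(n_needed)
me_one_fewer <- qnorm(0.975) * 100 / sqrt(n_needed - 1)
c(me_achieved, me_one_fewer)
```

Rounding down would leave the margin of error slightly above the $10 target, which is why the result is rounded up.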
Is there strong evidence of global warming? Let’s consider a small scale example, comparing how temperatures have changed in the US from 1968 to 2008. The daily high temperature reading on January 1 was collected in 1968 and 2008 for 51 randomly selected locations in the continental US. Then the difference between the two readings (temperature in 2008 - temperature in 1968) was calculated for each of the 51 different locations. The average of these 51 values was 1.1 degrees with a standard deviation of 4.9 degrees. We are interested in determining whether these data provide strong evidence of temperature warming in the continental US.
Ans: Assuming that the locations were sampled randomly and that 51 locations are fewer than 10% of all locations in the continental US, the observations are independent.
Ans:
H0 (null hypothesis): mean difference = 0
H1 (alternative hypothesis): mean difference > 0 (temperatures were higher in 2008 than in 1968)
Ans:
SD <- 4.9
SE <- SD/sqrt(51)
t_stat <- (1.1 - 0)/SE
t_stat
## [1] 1.603178
pvalue <- 1 - pt(t_stat, df=50)
pvalue
## [1] 0.05759731
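Since the p-value (0.058) is greater than 0.05, we fail to reject the null hypothesis. The data do not provide strong evidence of temperature warming in the continental US.
A Type 2 error is possible: since we failed to reject the null hypothesis, we may have missed a real increase in temperature.
Yes, we would expect a confidence interval for the average difference to include 0. If 0 were outside the interval, we would have rejected the null hypothesis.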
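The link between the test and the interval can be checked directly from the summary statistics in the question. The sketch below is illustrative. The variable names are hypothetical, and the choice to show both a 90% interval (which corresponds to a one-sided test at the 5% level) and a 95% interval is an assumption about which interval the question has in mind.

```r
# Summary statistics from the question
diff_mean <- 1.1
diff_sd <- 4.9
n_locations <- 51
se_diff_mean <- diff_sd / sqrt(n_locations)

# t-based intervals for the mean difference, df = n - 1
ci90 <- diff_mean + c(-1, 1) * qt(0.95, df = n_locations - 1) * se_diff_mean
ci95 <- diff_mean + c(-1, 1) * qt(0.975, df = n_locations - 1) * se_diff_mean

# Does each interval contain 0?
contains_zero <- c(ci90 = ci90[1] < 0 && ci90[2] > 0,
                   ci95 = ci95[1] < 0 && ci95[2] > 0)
contains_zero
```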
Chicken farming is a multi-billion dollar industry, and any methods that increase the growth rate of young chicks can reduce consumer costs while increasing company profits, possibly by millions of dollars. An experiment was conducted to measure and compare the effectiveness of various feed supplements on the growth rate of chickens. Newly hatched chicks were randomly allocated into six groups, and each group was given a different feed supplement. Below are some summary statistics from this data set along with box plots showing the distribution of weights by feed type.
Ans: Based on the box plots, the linseed weights are roughly symmetrically distributed, while the horsebean weights show a slight right skew.
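Ans: The chicks were randomly allocated to separate groups, so the two samples are independent (not paired). We use the standard error of the difference in means and a T-score, as below.
SE = sqrt(sd1^2/n1 + sd2^2/n2)
mean_diff = mean1 - mean2, converted into a T-score instead of a Z-score.
T-score = (mean_diff - 0)/SE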
horsebean_mean <- 160.20
horsebean_sd <- 38.63
horsebean_samples <- 10
linseed_mean <- 218.75
linseed_sd <- 52.24
linseed_samples <- 12
mean_diff <- linseed_mean - horsebean_mean
# Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1
df <- min(linseed_samples, horsebean_samples) - 1
# Unpooled standard error of the difference in means
se_diff <- sqrt(linseed_sd^2/linseed_samples + horsebean_sd^2/horsebean_samples)
t_val <- mean_diff / se_diff
round(t_val, 2)
## [1] 3.02
# Two-sided p-value, since the alternative is that the two means differ
pval <- 2 * pt(abs(t_val), df = df, lower.tail = FALSE)
round(pval, 3)
## [1] 0.015
Since the p-value is less than 0.05, we reject the null hypothesis: the data provide strong evidence that the average weights of chicks fed linseed and horsebean differ.
A Type 1 error is possible, since we rejected the null hypothesis.
If we use alpha = 0.01, the critical t for df = 9 is about 3.25, which is greater than the t statistic of 3.02. Equivalently, the p-value (0.015) is greater than alpha = 0.01. We would fail to reject the null hypothesis.
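The two-sample calculation can also be packaged as a function of the summary statistics, which makes it easy to compare the decision at different significance levels. This is a sketch under stated assumptions: the helper `two_sample_t` is hypothetical, it uses the unpooled standard error, and it takes the conservative degrees of freedom min(n1, n2) - 1 rather than any other approximation.

```r
# Hypothetical helper: two-sided, two-sample t-test from summary statistics
two_sample_t <- function(mean1, sd1, n1, mean2, sd2, n2) {
  se <- sqrt(sd1^2 / n1 + sd2^2 / n2)   # unpooled standard error
  t_stat <- (mean1 - mean2) / se
  df <- min(n1, n2) - 1                 # conservative degrees of freedom (assumed choice)
  p <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
  list(t = t_stat, df = df, p = p)
}

# Linseed vs. horsebean, using the summary statistics given above
res <- two_sample_t(218.75, 52.24, 12, 160.20, 38.63, 10)

# Decision at the 5% and 1% significance levels
c(reject_at_0.05 = res$p < 0.05, reject_at_0.01 = res$p < 0.01)
```

A different choice of degrees of freedom would change the p-value, so the decision at the stricter level is sensitive to that assumption.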
Caffeine is the world’s most widely used stimulant, with approximately 80% consumed in the form of coffee. Participants in a study investigating the relationship between coffee consumption and exercise were asked to report the number of hours they spent per week on moderate (e.g., brisk walking) and vigorous (e.g., strenuous sports and jogging) exercise. Based on these data the researchers estimated the total hours of metabolic equivalent tasks (MET) per week, a value always greater than 0. The table below gives summary statistics of MET for women in this study based on the amount of coffee consumed.
H0 (null hypothesis): the average physical activity level is the same for all levels of coffee consumption.
H1 (alternative hypothesis): the average physical activity level differs for at least one level of coffee consumption.
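This is an ANOVA test, so we have 3 conditions to check: independence of the observations within and across groups, approximately normal data within each group, and roughly constant variance across groups.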
Ans:
Df_coffee : 4 Df_res : 50734
groups <- 5
Df_G <- groups-1
Df_G
## [1] 4
totalSamples <- 50739
# Residual degrees of freedom are n - k (observations minus groups)
Df_R <- totalSamples - groups
Df_R
## [1] 50734
SSG : 10508
SST <- 25575327
SSE <- 25564819
SSG <- SST - SSE
SSG
## [1] 10508
MSG = 2627 MSE = 503.8991 fvalue = 5.213345 pvalue = 0.000339
Since the p-value is less than 0.05, we can reject the null hypothesis.
Let's also apply a modified significance level.
Modified significance level (alpha*) = significance_level/K, where K = k(k-1)/2
Here also, we can reject the null hypothesis (the modified alpha is 0.005, which is greater than the p-value).
MSG <- SSG/Df_G
MSG
## [1] 2627
MSE <- SSE/Df_R
MSE
## [1] 503.8991
fvalue <- MSG/MSE
fvalue
## [1] 5.213345
signif(pf(fvalue, Df_G, Df_R, lower.tail = F), 3)
## [1] 0.000339
K <- 5 * 4/2
alpha_modified <- 0.05/K
alpha_modified
## [1] 0.005
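The pieces of the ANOVA table can be assembled in one place from the sums of squares, the number of groups, and the total sample size. The sketch below is illustrative only. The function name `anova_from_ss` and its column names are hypothetical, and it assumes the residual degrees of freedom are n - k.

```r
# Hypothetical helper: build a one-way ANOVA table from sums of squares
anova_from_ss <- function(ssg, sst, k, n) {
  sse <- sst - ssg          # residual sum of squares
  df_g <- k - 1             # group degrees of freedom
  df_e <- n - k             # residual degrees of freedom
  msg <- ssg / df_g
  mse <- sse / df_e
  f <- msg / mse
  data.frame(
    source  = c("coffee", "residuals", "total"),
    df      = c(df_g, df_e, n - 1),
    sum_sq  = c(ssg, sse, sst),
    mean_sq = c(msg, mse, NA),
    f_value = c(f, NA, NA),
    p_value = c(pf(f, df_g, df_e, lower.tail = FALSE), NA, NA)
  )
}

anova_tab <- anova_from_ss(ssg = 10508, sst = 25575327, k = 5, n = 50739)
anova_tab

# Bonferroni-corrected level for all pairwise comparisons among 5 groups
bonferroni_alpha <- 0.05 / choose(5, 2)
anova_tab$p_value[1] < bonferroni_alpha
```

Here `choose(5, 2)` counts the pairwise comparisons, which is the same quantity as K = k(k-1)/2.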