A 90% confidence interval for a population mean is (65, 77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sample standard deviation.
n <- 25
CI <- 0.90
lower_bound <- 65
upper_bound <- 77
#find cumulative probability for the upper critical value (0.95 for a 90% CI)
p_2tails <- CI + (1 - CI)/2
#find degree of freedom
df <- n-1
#find margin of error
ME <- (upper_bound-lower_bound)/2
ME
## [1] 6
#find T-value
T <- qt(p_2tails, df)
#find mean
mean <- lower_bound + ME
#find standard deviation
sd <- sqrt(n)*ME/T
mean
## [1] 71
sd
## [1] 17.53481
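As a quick sanity check, the interval can be rebuilt from the recovered values. This is a minimal sketch: the `_check` names are illustrative and not part of the original solution, and the standard deviation is the rounded value printed above.

```r
# Rebuild the 90% interval from the recovered mean and standard deviation.
n_check <- 25
mean_check <- 71
sd_check <- 17.53481                      # rounded value reported above
t_star <- qt(0.95, df = n_check - 1)      # upper critical value with 24 df
me_check <- t_star * sd_check / sqrt(n_check)
ci_check <- c(mean_check - me_check, mean_check + me_check)
ci_check
```

Up to rounding, the result should reproduce the interval (65, 77) given in the problem.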
SAT Scores. SAT scores of students at an Ivy League college are distributed with a standard deviation of 250 points. Two statistics students, Raina and Luke, want to estimate the average SAT score of students in this college as part of a class project. They want their margin of error to be no more than 25 points.
Raina wants to use a 90% confidence interval. How large a sample should she collect?
Z * SE <= ME
SE = SD/sqrt(n)
Z * SD/sqrt(n) <= ME
Z * SD <= ME * sqrt(n)
n >= (Z * SD/ME)^2
Z <- 1.65
ME <- 25
sd <- 250
n <- (Z*sd/ME)^2
ceiling(n)
## [1] 273
Because n must be at least 272.25, we round up: the sample should contain at least 273 students.
We found above that n >= (Z * SD/ME)^2.
Since the Z-score for a 99% confidence interval is larger than the Z-score for a 90% confidence interval, Luke's sample should be larger than Raina's.
Z <- 2.576
n <- (Z*sd/ME)^2
ceiling(n)
## [1] 664
The sample should contain at least 664 students.
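The two calculations follow the same formula, so they can be wrapped in one small helper. This is a sketch only: `sample_size` is a hypothetical name, and it takes the rounded Z-scores used above as inputs rather than computing them.

```r
# Smallest n satisfying Z * SD / sqrt(n) <= ME, i.e. n >= (Z * SD / ME)^2.
# The result is always rounded up, because rounding down would exceed the margin.
sample_size <- function(z, sd, me) {
  ceiling((z * sd / me)^2)
}

n_raina <- sample_size(z = 1.65, sd = 250, me = 25)   # 90% confidence
n_luke <- sample_size(z = 2.576, sd = 250, me = 25)   # 99% confidence
c(raina = n_raina, luke = n_luke)
```

Using `ceiling()` rather than `round()` matters whenever the fractional part is below 0.5, as in Raina's case.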
There is no clear difference between the average reading and writing scores, because the distribution of the score differences is approximately normal and centered near zero.
I would say that the scores of different students are independent of each other. However, the reading and writing scores of the same student are not independent of each other.
Null hypothesis: There is no difference in the average scores of students on the reading and writing exams. Alternative hypothesis: There is a difference in the average scores of students on the reading and writing exams.
First, we assume that the data come from a random sample. Second, we have to check the independence of observations. Each student contributes both a reading and a writing score, so the data are paired, and the two scores within a pair can't be independent. However, we can assume that the sampled students represent less than 10% of the population, so the differences for different students can be treated as independent. Third, we have to check whether the differences are normally distributed. The box plot suggests that the data are reasonably normally distributed. Moreover, no outliers exist.
Even though the two scores within a pair are not independent, we can still perform a t-test, as long as it is carried out on the paired differences. However, the results might need additional analysis.
n <- 200
sd_diff <- 8.887
mean_diff <- -0.545
#find degree of freedom
df <- n-1
#find standard error
SE <- sd_diff/sqrt(n)
#find T-value
T_val <- mean_diff/SE
#find two-sided p-value (the alternative hypothesis is "there is a difference")
p <- 2 * pt(-abs(T_val), df)
round(p, 4)
## [1] 0.3868
Since the p-value is greater than 0.05, we fail to reject the null hypothesis. So, there is no convincing evidence of a difference between the average reading and writing exam scores.
Type I error: Reject the null hypothesis when it is actually true. Type II error: Fail to reject the null hypothesis when it is false.
Because we failed to reject the null hypothesis, the only error we could have made is a Type II error.
When a confidence interval for a difference includes 0, the data provide no convincing evidence of a difference in means. Since we failed to reject the null hypothesis, we would expect a confidence interval for the average difference between the reading and writing scores to include 0.
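The claim about the confidence interval can be checked directly from the summary statistics. The following is a minimal sketch assuming a 95% confidence level, which is not stated in the text; the variable names are illustrative.

```r
# 95% confidence interval for the mean reading-minus-writing difference.
n_pairs <- 200
sd_d <- 8.887
mean_d <- -0.545
se_d <- sd_d / sqrt(n_pairs)
t_star <- qt(0.975, df = n_pairs - 1)     # two-sided 95% critical value
ci_d <- mean_d + c(-1, 1) * t_star * se_d
ci_d
# TRUE when the interval contains 0
includes_zero <- ci_d[1] < 0 && ci_d[2] > 0
includes_zero
```

The interval straddles zero, in agreement with the failure to reject the null hypothesis at the 5% level.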
Each year the US Environmental Protection Agency (EPA) releases fuel economy data on cars manufactured in that year. Below are summary statistics on fuel efficiency (in miles/gallon) from random samples of cars with manual and automatic transmissions manufactured in 2012. Do these data provide strong evidence of a difference between the average fuel efficiency of cars with manual and automatic transmissions in terms of their average city mileage? Assume that conditions for inference are satisfied.
Let's state the hypotheses.
Null hypothesis: There is no difference between the average city mileage of cars with manual and automatic transmissions. Alternative hypothesis: There is a difference between the average city mileage of cars with manual and automatic transmissions.
# sample sizes
n_aut <- 26
n_man <- 26
# Automatic transmission
mean_aut <- 16.12
sd_aut <- 3.58
# Manual transmission
mean_man <- 19.85
sd_man <- 4.51
# difference in sample means
mean_diff <- mean_aut - mean_man
# standard error of this point estimate (each variance divided by its own sample size)
SE <- sqrt(sd_aut^2 / n_aut + sd_man^2 / n_man)
#find t-value
T_val <- (mean_diff - 0) / SE
#find degree of freedom (conservative choice: smaller sample size minus 1)
df <- min(n_aut, n_man) - 1
#find two-sided p-value and compare it with the 5% level
p <- 2 * pt(-abs(T_val), df = df)
p < 0.05
## [1] TRUE
Since the p-value is less than 5%, we reject the null hypothesis. There is strong evidence of a difference between the average city mileage of cars with manual and automatic transmissions.
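The choice of degrees of freedom affects the p-value, so it is worth a cross-check. The sketch below uses the Welch-Satterthwaite approximation, which is swapped in here for comparison and is not part of the original solution; the `_w` names are illustrative.

```r
# Two-sample t-test from summary statistics with Welch-Satterthwaite df.
n_aut <- 26
n_man <- 26
v_aut <- 3.58^2 / n_aut                   # variance of the automatic sample mean
v_man <- 4.51^2 / n_man                   # variance of the manual sample mean
se_w <- sqrt(v_aut + v_man)
t_w <- (16.12 - 19.85) / se_w
df_w <- (v_aut + v_man)^2 /
  (v_aut^2 / (n_aut - 1) + v_man^2 / (n_man - 1))
p_w <- 2 * pt(-abs(t_w), df = df_w)       # two-sided p-value
c(t = t_w, df = df_w, p = p_w)
```

The approximate degrees of freedom fall between 25 and 50, and the two-sided p-value stays well below 0.05, so the conclusion does not depend on which convention is used.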
The General Social Survey collects data on demographics, education, and work, among many other characteristics of US residents. Using ANOVA, we can consider educational attainment levels for all 1,172 respondents at once. Below are the distributions of hours worked by educational attainment and relevant summary statistics that will be helpful in carrying out this analysis.
Null hypothesis: The average hours worked are the same across all five educational attainment groups. Alternative hypothesis: At least one group average differs from the others.
Independence: the observations should be independent within and between groups. Based on the nature of the data, I would assume that they are.
Normality: the data within each group should be approximately normal. Each group has outliers, and only some groups follow a normal distribution.
Constant variance: the variability should be about equal across groups. Judging by the standard deviations, the variance is relatively equal across groups.
group <- c("LessthanHS","HS","JrColl","Bachelors","Graduates")
mean <- c(38.67, 39.6, 41.39, 42.55, 40.85)
sd <- c(15.81, 14.97, 18.1, 13.62, 15.51)
n <- c(121, 546, 97, 253, 155)
data <- data.frame (group,mean, sd, n)
head(data)
##        group  mean    sd   n
## 1 LessthanHS 38.67 15.81 121
## 2         HS 39.60 14.97 546
## 3     JrColl 41.39 18.10  97
## 4  Bachelors 42.55 13.62 253
## 5  Graduates 40.85 15.51 155
n <- sum(data$n)
k <- length(data$mean)
#find degrees of freedom
degree_df <- k - 1
degree_residuals <- n - k
df_total <- degree_df + degree_residuals
#values given in the table
Pr_F <- 0.0682
Mean_Sq_degree <- 501.54
Sum_Sq_residuals <- 267382
#find Sum Sq
Sum_Sq_degree <- degree_df * Mean_Sq_degree
Sum_sq_total <- Sum_Sq_degree + Sum_Sq_residuals
#find Mean Sq
Mean_Sq_residuals <- Sum_Sq_residuals / degree_residuals
#find F-statistic
F_value <- Mean_Sq_degree / Mean_Sq_residuals
#assemble the ANOVA table (NA marks cells that are not defined)
data_type <- c("degree","Residuals","Total")
df_data <- c(degree_df, degree_residuals, df_total)
sum_sq_data <- c(Sum_Sq_degree, Sum_Sq_residuals, Sum_sq_total)
mean_sq_data <- round(c(Mean_Sq_degree, Mean_Sq_residuals, NA), 2)
F_data <- round(c(F_value, NA, NA), 3)
Pr_F_data <- c(Pr_F, NA, NA)
table <- data.frame(data_type, df_data, sum_sq_data, mean_sq_data, F_value = F_data, Pr_F = Pr_F_data)
head(table)
##   data_type df_data sum_sq_data mean_sq_data F_value   Pr_F
## 1    degree       4     2006.16       501.54   2.189 0.0682
## 2 Residuals    1167   267382.00       229.12      NA     NA
## 3     Total    1171   269388.16           NA      NA     NA
Since the p-value (0.0682) is greater than 0.05, we fail to reject the null hypothesis. So, there is no convincing evidence of a difference between the averages of the five groups.
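The tabulated p-value can be checked against the F distribution. This is a minimal sketch that assumes the between-group mean square (501.54), the residual sum of squares (267382), and the degrees of freedom (4 and 1167) from the exercise; `f_stat` and `p_check` are illustrative names.

```r
# F statistic: between-group mean square over residual mean square.
f_stat <- 501.54 / (267382 / 1167)
# Upper-tail probability of the F distribution with 4 and 1167 df.
p_check <- pf(f_stat, df1 = 4, df2 = 1167, lower.tail = FALSE)
c(F = f_stat, p = p_check)
```

The result should agree with the tabulated 0.0682 up to rounding, which supports the decision not to reject the null hypothesis at the 5% level.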