Task 2 ~ Confidence Intervals
Lab3 ~ Confidence Intervals
| Contact | \(\downarrow\) |
| naftaligunawan@gmail.com | |
| https://www.instagram.com/nbrigittag/ | |
| RPubs | https://rpubs.com/naftalibrigitta/ |
| Name | Naftali Brigitta Gunawan |
| Student ID (NIM) | 20214920002 |
Exercise 1
Find a point estimate of the average university student Age using the sample data from survey.
Answer:
library(MASS) # load the MASS package, which contains the survey data set
age.survey = survey$Age # save the survey data of student ages
mean(age.survey, na.rm=TRUE) # the point estimate of student ages
## [1] 20.37451
p.est<-t.test(age.survey, conf.level = 0.95) # one-sample t-test, which also returns a confidence interval
p.est$conf.int # print the confidence interval
## [1] 19.54600 21.20303
## attr(,"conf.level")
## [1] 0.95
As we can see, the point estimate of the average university student age is 20.37451 (about 20) years, and the 95% confidence interval runs from 19.54600 to 21.20303 years. We can therefore say with 95% confidence that this interval contains the true population mean. The sample mean of 20.37451 is our estimate of that mean, not the population mean itself.
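The phrase "95% confidence" describes the procedure: if we drew many samples and built an interval from each, about 95% of those intervals would contain the true mean. The sketch below illustrates this with simulated data. The population mean of 20, the standard deviation of 6, and the sample size of 50 are hypothetical values chosen for illustration, not taken from survey.

```r
# Simulated illustration of the meaning of a 95% confidence level.
# Assumption: a normal population with hypothetical mean 20 and sd 6.
set.seed(1)
true.mean = 20
covers = replicate(1000, {
  x = rnorm(50, mean = true.mean, sd = 6)     # draw one sample of size 50
  ci = t.test(x, conf.level = 0.95)$conf.int  # 95% interval from that sample
  ci[1] <= true.mean && true.mean <= ci[2]    # TRUE if the interval covers the true mean
})
coverage = mean(covers) # proportion of intervals that cover the true mean
coverage
```

The printed proportion should land close to 0.95, although it varies with the seed and the number of replications.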
Exercise 2
Assume the population standard deviation \(\sigma\) of the student Age in data survey is 7. Find the margin of error and interval estimate at 95% confidence level.
Answer:
library(MASS) # load the MASS package, which contains the survey data set
age.response = na.omit(survey$Age) # filter out missing values in Age
n = length(age.response) # number of non-missing responses
s = 7 # population standard deviation assumed in the exercise
SE = s/sqrt(n) # standard error
E = qt(.975, df=n-1)*SE; E # margin of error at 95% confidence (t quantile)
## [1] 0.8957872
xbar = mean(age.response); xbar # sample mean
## [1] 20.37451
xbar + c(-E, E) # 95% confidence interval
## [1] 19.47873 21.27030
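With the population standard deviation assumed to be 7, the margin of error for the student age survey at the 95% confidence level is 0.8957872 years, and the confidence interval is between 19.47873 and 21.27030 years (about 19 to 21). Note that the code uses the t quantile qt(.975, df=n-1). Because \(\sigma\) is assumed known here, the normal quantile qnorm(.975) is the usual textbook choice and would give a slightly smaller margin of error.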
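For comparison, here is a minimal sketch of the z-based calculation that applies when \(\sigma\) is treated as known. The names sigma, SE.z and E.z are introduced here for illustration and do not appear in the original answer.

```r
library(MASS)                          # provides the survey data set
age.response = na.omit(survey$Age)     # drop missing ages
n = length(age.response)               # number of non-missing responses
sigma = 7                              # population standard deviation assumed in the exercise
SE.z = sigma/sqrt(n)                   # standard error with known sigma
E.z = qnorm(.975)*SE.z; E.z            # z-based margin of error at 95% confidence
mean(age.response) + c(-E.z, E.z)      # z-based 95% confidence interval
```

Since qnorm(.975) is smaller than qt(.975, df=n-1) for any finite n, the z-based margin of error is slightly below the t-based one. With a sample of this size the difference is small.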
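Alternative Solution: Instead of using the textbook formula, we can apply the t.test function in the built-in stats package. Note that t.test estimates the standard deviation from the sample rather than using \(\sigma = 7\), so its interval differs slightly from the one above.
library(stats) # load stats package (attached by default in R)
t.test(age.response) # apply the one-sample t-test
##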
## One Sample t-test
##
## data: age.response
## t = 48.447, df = 236, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 19.54600 21.20303
## sample estimates:
## mean of x
## 20.37451
Exercise 3
Without assuming the population standard deviation \(\sigma\) of the student Age in survey, find the margin of error and interval estimate at 95% confidence level.
Answer:
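Here the sample standard deviation is set by hand to 9.48. This value is not computed from the data: the t statistic in the t.test output further below implies a sample standard deviation of about 6.47, which is what sd(age.response) would return.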
library(MASS) # load the MASS package, which contains the survey data set
age.response = na.omit(survey$Age) # filter out missing values in Age
n = length(age.response) # number of non-missing responses
s = 9.48 # hand-picked value used as the sample standard deviation
SE = s/sqrt(n) # standard error estimate
E = qt(.975, df=n-1)*SE; E # margin of error at 95% confidence
## [1] 1.213152
xbar = mean(age.response); xbar # sample mean
## [1] 20.37451
xbar + c(-E, E) # 95% confidence interval
## [1] 19.16136 21.58767
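Without an assumption on the population standard deviation, and using the hand-picked value s = 9.48, the margin of error for the student age survey at the 95% confidence level is 1.213152 years. The confidence interval is between 19.16136 and 21.58767 years (about 19 to 22). Because 9.48 is larger than the actual sample standard deviation, this interval is wider than the data support.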
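Alternative Solution: Instead of using the textbook formula, we can apply the t.test function in the built-in stats package. It computes the sample standard deviation from the data, so its interval of 19.54600 to 21.20303 (a margin of error of about 0.83 years) is the one that follows from the data without assuming \(\sigma\).
library(stats) # load stats package (attached by default in R)
t.test(age.response) # apply the one-sample t-test
##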
## One Sample t-test
##
## data: age.response
## t = 48.447, df = 236, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 19.54600 21.20303
## sample estimates:
## mean of x
## 20.37451
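The textbook formula reproduces the t.test interval once the sample standard deviation is computed from the data with sd(). The following is a minimal sketch of that check. The names ci.manual and ci.ttest are introduced here for illustration.

```r
library(MASS)                               # provides the survey data set
age.response = na.omit(survey$Age)          # drop missing ages
n = length(age.response)                    # number of non-missing responses
s = sd(age.response)                        # sample standard deviation computed from the data
SE = s/sqrt(n)                              # standard error estimate
E = qt(.975, df=n-1)*SE; E                  # margin of error at 95% confidence
ci.manual = mean(age.response) + c(-E, E)   # interval from the textbook formula
ci.ttest = t.test(age.response)$conf.int    # interval reported by t.test
all.equal(ci.manual, as.numeric(ci.ttest))  # TRUE if both approaches agree
```

Computing s from the data avoids having to choose a value by hand, and the agreement check confirms that both routes give the same interval.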
Exercise 4
Improve the quality of a sample survey by increasing the sample size when the standard deviation \(\sigma\) is unknown.
Answer:
We don't know the standard deviation, so we frame the problem in terms of a proportion. With no prior estimate of the proportion, we use p = 0.5, which is the conservative choice because p(1 - p) is largest there. We target a margin of error of 5%, so we set E = 0.05.
zstar = qnorm(.975) # quantile for a 95% confidence level
p = 0.5 # no estimate of the proportion given, so we use 50% as a conservative estimate
E = 0.05 # target margin of error
zstar^2*p*(1-p)/E^2 # required sample size before rounding
## [1] 384.1459
So, the required sample size is 384.1459, which we round up to 385 because a sample of 384 would leave the margin of error slightly above 5%.
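A sample size has to be a whole number, and rounding down would miss the target margin of error, so the raw value is normally passed through ceiling(). A minimal sketch reusing the values above follows. The names n.raw and n.needed are introduced here for illustration.

```r
zstar = qnorm(.975)              # quantile for a 95% confidence level
p = 0.5                          # conservative proportion estimate
E = 0.05                         # target margin of error
n.raw = zstar^2*p*(1-p)/E^2      # required sample size before rounding
n.needed = ceiling(n.raw)        # round up to the next whole respondent
n.needed                         # 385
```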
Exercise 5
Assume you don't have a planned proportion estimate. Find the sample size needed to achieve a 5% margin of error for the male student survey at the 95% confidence level.
Answer:
zstar = qnorm(.975) # quantile for a 95% confidence level
p = 0.5 # no estimate of the proportion given, so we use 50% as a conservative estimate
E = 0.05 # target margin of error
zstar^2*p*(1-p)/E^2 # required sample size before rounding
## [1] 384.1459
So, a sample of 385 students (384.1459 rounded up) is needed to achieve a 5% margin of error for the male student proportion at the 95% confidence level.
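To see why p = 0.5 is the conservative choice, the sketch below evaluates the same formula over a grid of candidate proportions. The grid and the names p.grid and n.grid are assumptions introduced for illustration.

```r
zstar = qnorm(.975)                               # quantile for a 95% confidence level
E = 0.05                                          # target margin of error
p.grid = seq(0.1, 0.9, by = 0.1)                  # hypothetical candidate proportions
n.grid = ceiling(zstar^2*p.grid*(1-p.grid)/E^2)   # required sample size for each proportion
data.frame(p = p.grid, n = n.grid)                # show the grid side by side
p.grid[which.max(n.grid)]                         # proportion that needs the largest sample
```

The required sample size peaks at p = 0.5, so planning with that value guarantees the target margin of error whatever the true proportion turns out to be.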
Exercise 6
Perform confidence intervals analysis on this data set from 2004 that includes data on average hourly earnings, marital status, gender, and age for thousands of people.
Answer:
Average Hourly Earnings (AHE)
cuy <- read.csv('cps04.csv', header = TRUE, sep = ",") # read the CSV file, which was downloaded beforehand
cuy # print the data frame
avghour.response = na.omit(cuy$ahe) # filter out missing values in Average Hourly Earnings (AHE)
n = length(avghour.response) # number of non-missing responses
s = sd(avghour.response) # sample standard deviation
SE = s/sqrt(n) # standard error estimate
E = qt(.975, df=n-1)*SE; E # margin of error at 95% confidence
## [1] 0.1921255
xbar = mean(avghour.response); xbar # sample mean
## [1] 16.7712
xbar + c(-E, E) # 95% confidence interval
## [1] 16.57908 16.96333
So, the results are:
The margin of error of AHE is 0.1921255.
The sample mean (xbar) is 16.7712.
The confidence interval is between 16.57908 and 16.96333.
Bachelor
bachelor.response = na.omit(cuy$bachelor) # filter out missing values in Bachelor
n = length(bachelor.response) # number of non-missing responses
k = sum(bachelor.response == 1); k # number of people who have a bachelor's degree
## [1] 3640
s = sd(bachelor.response) # sample standard deviation
SE = s/sqrt(n) # standard error estimate
E = qt(.975, df=n-1)*SE; E # margin of error at 95% confidence
## [1] 0.01092554
xbar = mean(bachelor.response); xbar # sample mean (proportion with a bachelor's degree)
## [1] 0.4557976
xbar + c(-E, E) # 95% confidence interval
## [1] 0.4448721 0.4667232
So, the results are:
The number of people who have a bachelor's degree is 3640.
The margin of error of Bachelor is 0.01092554.
The sample mean (xbar), which for a 0/1 variable is the sample proportion, is 0.4557976.
The confidence interval is between 0.4448721 and 0.4667232.
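Because Bachelor is a 0/1 variable, its mean is a proportion, and the interval can also be obtained with the usual proportion formulas. The sketch below uses the count k = 3640 reported above. The sample size n = 7986 is not printed in the original output and is inferred here from k divided by the reported mean 0.4557976, so treat it as an assumption. The names p.hat, SE.p and E.p are introduced for illustration.

```r
k = 3640                                   # people with a bachelor's degree (reported above)
n = 7986                                   # assumption: inferred from k / 0.4557976
p.hat = k/n                                # sample proportion
SE.p = sqrt(p.hat*(1-p.hat)/n)             # standard error of a proportion
E.p = qnorm(.975)*SE.p; E.p                # normal-approximation margin of error
p.hat + c(-E.p, E.p)                       # 95% confidence interval (Wald form)
prop.test(k, n, correct = FALSE)$conf.int  # score interval from the stats package
```

With a sample this large, the normal-approximation interval, the score interval from prop.test, and the t-based interval above should agree to about three decimal places.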
Female
cwk.response = na.omit(cuy$female) # filter out missing values in Female
n = length(cwk.response) # number of non-missing responses
k = sum(cwk.response == 1); k # number of females
## [1] 3313
s = sd(cwk.response) # sample standard deviation
SE = s/sqrt(n) # standard error estimate
E = qt(.975, df=n-1)*SE; E # margin of error at 95% confidence
## [1] 0.01080826
xbar = mean(cwk.response); xbar # sample mean (proportion of females)
## [1] 0.414851
xbar + c(-E, E) # 95% confidence interval
## [1] 0.4040427 0.4256592
So, the results are:
The number of females is 3313.
The margin of error of Female is 0.01080826.
The sample mean (xbar), which for a 0/1 variable is the sample proportion, is 0.414851.
The confidence interval is between 0.4040427 and 0.4256592.
Age
age.response = na.omit(cuy$age) # filter out missing values in Age
n = length(age.response) # number of non-missing responses
s = sd(age.response) # sample standard deviation
SE = s/sqrt(n) # standard error estimate
E = qt(.975, df=n-1)*SE; E # margin of error at 95% confidence
## [1] 0.06341853
xbar = mean(age.response); xbar # sample mean
## [1] 29.75445
xbar + c(-E, E) # 95% confidence interval
## [1] 29.69103 29.81786
So, the results are:
The margin of error of Age is 0.06341853.
The sample mean (xbar) is 29.75445.
The confidence interval is between 29.69103 and 29.81786.
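The four analyses in this exercise repeat the same steps, so they could be wrapped in a small helper. The function mean_ci below is a hypothetical helper, not part of the original report, and the demo vector holds made-up values used only to show the call.

```r
# Hypothetical helper: t-based confidence interval for the mean of a numeric vector.
mean_ci = function(x, conf.level = 0.95) {
  x = na.omit(x)                                # drop missing values
  n = length(x)                                 # number of non-missing responses
  SE = sd(x)/sqrt(n)                            # standard error estimate
  E = qt(1 - (1 - conf.level)/2, df = n-1)*SE   # margin of error
  c(mean = mean(x), margin = E, lower = mean(x) - E, upper = mean(x) + E)
}

demo = c(12.5, 18.0, 15.2, NA, 22.1, 16.4)      # made-up values for illustration
res = mean_ci(demo)
res
```

With the data frame from this exercise loaded, a call such as sapply(cuy[c("ahe", "bachelor", "female", "age")], mean_ci) should reproduce the four sets of results in one step, assuming those column names match the file.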