Task 2 ~ Confidence Intervals

Lab3 ~ Confidence Intervals

Contact	\(\downarrow\)
Email	naftaligunawan@gmail.com
Instagram	https://www.instagram.com/nbrigittag/
RPubs	https://rpubs.com/naftalibrigitta/
Name	Naftali Brigitta Gunawan
NIM	20214920002

Exercise 1

Find a point estimate of the average university student Age using the sample data from survey.

Answer:

library(MASS)                                    # load the MASS package data set survey
age.survey = survey$Age                          # save the survey data of student ages
mean(age.survey, na.rm=TRUE)                     # the point estimate of student ages

## [1] 20.37451

p.est <- t.test(age.survey, conf.level = 0.95)   # one-sample t-test, which also returns a confidence interval
p.est$conf.int                                   # print the 95% confidence interval

## [1] 19.54600 21.20303
## attr(,"conf.level")
## [1] 0.95

The point estimate of the average university student age is the sample mean, 20.37451 (about 20) years. The 95% confidence interval runs from 19.54600 to 21.20303 years, so we can say with 95% confidence that this interval contains the true population mean.

Exercise 2

Assume the population standard deviation \(\sigma\) of the student Age in data survey is 7. Find the margin of error and interval estimate at 95% confidence level.

Answer:

library(MASS)                                     # load the MASS package data set survey
age.response = na.omit(survey$Age)                # filter out missing values in Age
n = length(age.response)                          # assign the length of response
s = 7                                             # assumed population standard deviation (sigma)
SE = s/sqrt(n)                                    # standard error of the mean
E = qt(.975, df=n-1)*SE; E                        # margin of error (t quantile at the 97.5th percentile)

## [1] 0.8957872

xbar = mean(age.response); xbar                   # sample mean

## [1] 20.37451

xbar + c(-E, E)                                   # interval estimate at the 95% confidence level

## [1] 19.47873 21.27030

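With the population standard deviation assumed to be 7, the margin of error for the student age survey at the 95% confidence level is 0.8957872 years. The confidence interval is between 19.47873 (19) and 21.27030 (21) years old. The code uses a t quantile; since \(\sigma\) is assumed known, the normal quantile qnorm(.975) is the conventional choice and would give a marginally narrower interval.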
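For comparison, here is a minimal sketch of the z-based calculation that is usually paired with a known \(\sigma\). The names sigma, z.star, E.z and ci.z are introduced here for illustration only and are not part of the original answer.

```r
library(MASS)                              # provides the survey data set

age.response <- na.omit(survey$Age)        # drop missing ages
n <- length(age.response)                  # number of valid responses
sigma <- 7                                 # assumed population standard deviation
z.star <- qnorm(0.975)                     # two-sided 95% critical value of the standard normal
E.z <- z.star * sigma / sqrt(n)            # z-based margin of error
ci.z <- mean(age.response) + c(-E.z, E.z)  # z-based interval estimate

E.z
ci.z
```

With a sample this large the t and z quantiles are close, so the two margins of error should differ only in the third decimal place.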

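Alternative Solution: Instead of using the textbook formula, we can apply the t.test function in the built-in stats package. Note that t.test estimates the standard deviation from the sample rather than using the assumed \(\sigma\) = 7, so its interval differs slightly from the one above.

library(stats)                                     # stats is attached by default; loaded here for completeness
t.test(age.response)                               # apply the one-sample t-test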

## 
##  One Sample t-test
## 
## data:  age.response
## t = 48.447, df = 236, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  19.54600 21.20303
## sample estimates:
## mean of x 
##  20.37451

Exercise 3

Without assuming the population standard deviation \(\sigma\) of the student Age in survey, find the margin of error and interval estimate at 95% confidence level.

Answer:

I will plug in 9.48 for the sample standard deviation s rather than computing it from the data with sd(age.response).

library(MASS)                                     # load the MASS package data set survey
age.response = na.omit(survey$Age)                # filter out missing values in Age
n = length(age.response)                          # assign the length of response
s = 9.48                                          # plugged-in value for the sample standard deviation
SE = s/sqrt(n)                                    # standard error estimate
E = qt(.975, df=n-1)*SE; E                        # margin of error (t quantile at the 97.5th percentile)

## [1] 1.213152

xbar = mean(age.response); xbar                   # sample mean

## [1] 20.37451

xbar + c(-E, E)                                   # interval estimate at the 95% confidence level

## [1] 19.16136 21.58767

Without an assumption about the population standard deviation, the margin of error for the student age survey at the 95% confidence level is 1.213152 years. The confidence interval is between 19.16136 (19) and 21.58767 (22) years old.

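Alternative Solution: Instead of using the textbook formula, we can apply the t.test function in the built-in stats package. Because t.test computes the sample standard deviation from the data instead of using 9.48, it reports a narrower interval, 19.54600 to 21.20303.

library(stats)                                     # stats is attached by default; loaded here for completeness
t.test(age.response)                               # apply the one-sample t-test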

## 
##  One Sample t-test
## 
## data:  age.response
## t = 48.447, df = 236, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  19.54600 21.20303
## sample estimates:
## mean of x 
##  20.37451
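The textbook formula and t.test agree once the sample standard deviation is taken from the data. A minimal sketch, assuming s = sd(age.response) in place of the plugged-in 9.48; the names ci.manual and ci.ttest are introduced here for illustration:

```r
library(MASS)                                    # provides the survey data set

age.response <- na.omit(survey$Age)              # drop missing ages
n <- length(age.response)                        # number of valid responses
s <- sd(age.response)                            # sample standard deviation computed from the data
E <- qt(0.975, df = n - 1) * s / sqrt(n)         # t-based margin of error
ci.manual <- mean(age.response) + c(-E, E)       # interval from the textbook formula
ci.ttest <- as.numeric(t.test(age.response)$conf.int)  # interval reported by t.test

all.equal(ci.manual, ci.ttest)                   # compares the two intervals numerically
```

Both intervals should match the 19.54600 to 21.20303 range shown in the t.test output above.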

Exercise 4

Improve the quality of a sample survey by increasing the sample size with unknown standard deviation \(\sigma\).

Answer:

We do not know the standard deviation and have no planned proportion estimate, so we use p = 0.5, which maximizes p(1 - p) and therefore gives the most conservative (largest) sample size. We aim for a 5% margin of error, so E = 0.05.

zstar = qnorm(.975)                                # 97.5th percentile of the standard normal (95% confidence level)
p = 0.5                                            # no estimate of the proportion given, we use 50% for a conservative estimate.
E = 0.05                                           # expected error
zstar^2*p*(1-p)/E^2

## [1] 384.1459

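So, the required sample size is 385: the computed value 384.1459 is rounded up, because rounding down to 384 would leave the margin of error slightly above 5%.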
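The same calculation can be wrapped in a small helper that rounds up automatically. This is only a sketch; the function name sample_size_prop and its arguments are hypothetical and not part of the original lab.

```r
# Sample size needed to estimate a proportion within margin of error E.
# p defaults to 0.5, the conservative choice when no estimate is available.
sample_size_prop <- function(E, p = 0.5, conf.level = 0.95) {
  z.star <- qnorm(1 - (1 - conf.level) / 2)   # two-sided critical value
  ceiling(z.star^2 * p * (1 - p) / E^2)       # always round up to a whole respondent
}

sample_size_prop(0.05)                        # 5% margin of error at 95% confidence: 385
```

Halving the margin of error roughly quadruples the required sample size, because E enters the formula squared.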

Exercise 5

Assume you don’t have a planned proportion estimate. Find the sample size needed to achieve a 5% margin of error for the male student survey at the 95% confidence level.

Answer:

zstar = qnorm(.975)                                # 97.5th percentile of the standard normal (95% confidence level)
p = 0.5                                            # no estimate of the proportion given, we use 50% for a conservative estimate.
E = 0.05                                           # expected error
zstar^2*p*(1-p)/E^2

## [1] 384.1459

So, a sample of 385 (384.1459 rounded up) is needed to achieve a 5% margin of error for the male student survey at the 95% confidence level.
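The expression evaluated in the code follows from the margin of error of a sample proportion. A short derivation, writing \(z^{*}\) for zstar and using the same p and E as the code:

```latex
\begin{aligned}
E &= z^{*}\sqrt{\frac{p(1-p)}{n}}
\quad\Longrightarrow\quad
n = \frac{(z^{*})^{2}\,p(1-p)}{E^{2}}, \\
p(1-p) &\le \tfrac{1}{4} \quad\text{with equality at } p = 0.5, \\
n &= \frac{(1.959964)^{2}\times 0.25}{0.05^{2}} \approx 384.1459 .
\end{aligned}
```

Since p(1 - p) is largest at p = 0.5, this choice can never underestimate the sample size required for any other true proportion.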

Exercise 6

Perform confidence intervals analysis on this data set from 2004 that includes data on average hourly earnings, marital status, gender, and age for thousands of people.

Answer:

Average Hourly Earnings (AHE)

cuy <- read.csv('cps04.csv', header = TRUE, sep = ",") # read the previously downloaded CSV file
head(cuy)                                           # preview the first rows of the data

avghour.response = na.omit(cuy$ahe)               # filter out missing values in Average Hourly Earnings (AHE)
n = length(avghour.response)                      # assign the length of response
s = sd(avghour.response)                          # standard deviation 
SE = s/sqrt(n)                                    # standard error estimate
E = qt(.975, df=n-1)*SE; E                        # margin of error (upper tail 95% of CI)

## [1] 0.1921255

xbar = mean(avghour.response); xbar               # sample mean

## [1] 16.7712

xbar + c(-E, E)                                   # interval estimate at the 95% confidence level

## [1] 16.57908 16.96333

So, the results are:

The margin of error of AHE is 0.1921255.
The sample mean (xbar) is 16.7712.
The 95% confidence interval is between 16.57908 and 16.96333.

Bachelor

bachelor.response = na.omit(cuy$bachelor)          # filter out missing values in Bachelor
n = length(bachelor.response)                      # assign the length of response
k = sum(bachelor.response == 1); k                 # number of people with a bachelor's degree

## [1] 3640

s = sd(bachelor.response)                          # standard deviation 
SE = s/sqrt(n)                                     # standard error estimate
E = qt(.975, df=n-1)*SE; E                         # margin of error (upper tail 95% of CI)

## [1] 0.01092554

xbar = mean(bachelor.response); xbar               # sample mean

## [1] 0.4557976

xbar + c(-E, E)                                    # interval estimate at the 95% confidence level

## [1] 0.4448721 0.4667232

So, the results are:

The number of people with a bachelor's degree is 3640.
The margin of error of Bachelor is 0.01092554.
The sample mean (xbar), which is the sample proportion here, is 0.4557976.
The 95% confidence interval is between 0.4448721 and 0.4667232.
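Because bachelor is a 0/1 indicator, its mean is a sample proportion, so the interval can also be obtained from proportion formulas. The sketch below works from the reported count alone; the sample size n = 7986 is an assumption inferred from 3640 / 0.4557976, and the names p.hat, E.wald, ci.wald and ci.wilson are introduced here for illustration.

```r
k <- 3640                                        # reported number of people with a bachelor's degree
n <- 7986                                        # assumed sample size, inferred from k / 0.4557976
p.hat <- k / n                                   # sample proportion

# Wald interval: normal approximation with the binomial standard error
E.wald <- qnorm(0.975) * sqrt(p.hat * (1 - p.hat) / n)
ci.wald <- p.hat + c(-E.wald, E.wald)

# Wilson score interval from the built-in prop.test (no continuity correction)
ci.wilson <- prop.test(k, n, correct = FALSE)$conf.int

ci.wald
ci.wilson
```

With a sample of this size both intervals should be very close to the t-based result above; the differences matter mainly for small samples or proportions near 0 or 1.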

Female

cwk.response = na.omit(cuy$female)                 # filter out missing values in Female
n = length(cwk.response)                           # assign the length of response
k = sum(cwk.response == 1); k                      # number of females

## [1] 3313

s = sd(cwk.response)                               # standard deviation 
SE = s/sqrt(n)                                     # standard error estimate
E = qt(.975, df=n-1)*SE; E                         # margin of error (upper tail 95% of CI)

## [1] 0.01080826

xbar = mean(cwk.response); xbar                    # sample mean

## [1] 0.414851

xbar + c(-E, E)                                    # interval estimate at the 95% confidence level

## [1] 0.4040427 0.4256592

So, the results are:

The number of females is 3313.
The margin of error of Female is 0.01080826.
The sample mean (xbar), which is the sample proportion here, is 0.414851.
The 95% confidence interval is between 0.4040427 and 0.4256592.

Age

age.respons = na.omit(cuy$age)                     # filter out missing values in Age
n = length(age.respons)                            # assign the length of response
s = sd(age.respons)                                # standard deviation 
SE = s/sqrt(n)                                     # standard error estimate
E = qt(.975, df=n-1)*SE; E                         # margin of error (upper tail 95% of CI)

## [1] 0.06341853

xbar = mean(age.respons); xbar                     # sample mean

## [1] 29.75445

xbar + c(-E, E)                                    # interval estimate at the 95% confidence level

## [1] 29.69103 29.81786

So, the results are:

The margin of error of Age is 0.06341853.
The sample mean (xbar) is 29.75445.
The 95% confidence interval is between 29.69103 and 29.81786.
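The four analyses above repeat the same steps, so they could be folded into one helper. A minimal sketch, assuming a hypothetical function named mean_ci; since cps04.csv is not bundled with R, the example runs on the MASS survey data as a stand-in.

```r
library(MASS)                                   # stand-in data set; cps04.csv is not bundled with R

# t-based confidence interval for the mean of a numeric vector
mean_ci <- function(x, conf.level = 0.95) {
  x <- x[!is.na(x)]                             # drop missing values
  n <- length(x)                                # number of valid observations
  E <- qt(1 - (1 - conf.level) / 2, df = n - 1) * sd(x) / sqrt(n)  # margin of error
  c(mean = mean(x), margin = E, lower = mean(x) - E, upper = mean(x) + E)
}

# One column of results per variable
res <- sapply(survey[c("Age", "Height", "Pulse")], mean_ci)
res

# With the CPS data loaded as cuy, the equivalent call would be:
# sapply(cuy[c("ahe", "bachelor", "female", "age")], mean_ci)
```

Returning a named vector keeps the mean, margin of error, and interval bounds together, which makes the per-variable summaries easier to compare side by side.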