Justin Herman Foundations for statistical inference

5.6, 5.14, 5.20, 5.32, 5.48

5.6 Working backwards, Part II.

sample mean = (lower bound + upper bound) / 2

sample_mean <- (65+77)/2

Answer:: Our sample mean is 71

Calculate Margin of Error

my_moe <- 77-71

Answer:: My margin of error is 6

Get the t-score for a 90% confidence interval with df = 24

our_t <- qt(.95, df=24)

Answer:: Our t-value is 1.7108821

Plug into the margin of error equation and solve for the sample standard deviation

ME = Tval * (s / sqrt(n)), so s = (ME * sqrt(n)) / Tval

sample_sd <- round((6/1.71)*5, 2)

Answer:: Our sample standard deviation is 17.54
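The steps above can be collected into one short R sketch. It assumes, as the calculation above implies, a 90% confidence interval of (65, 77) from a sample of n = 25; the variable names are illustrative only. Carrying the unrounded t-value through gives roughly 17.53 rather than 17.54, since the hand calculation rounds t to 1.71.

```r
# Working backwards from a confidence interval to the sample SD.
# Assumed inputs: 90% CI of (65, 77) and n = 25 (so df = 24).
ci_lower <- 65
ci_upper <- 77
n <- 25

x_bar <- (ci_lower + ci_upper) / 2   # midpoint of the interval
moe <- (ci_upper - ci_lower) / 2     # half-width of the interval

# Upper-tail critical value for a 90% interval (5% in each tail)
t_star <- qt(0.95, df = n - 1)

# ME = t* * s / sqrt(n), solved for s
s <- moe * sqrt(n) / t_star

round(c(mean = x_bar, moe = moe, t_star = t_star, sd = s), 2)
```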

5.14 SAT scores.

SAT scores of students at an Ivy League college

(a) Raina wants to use a 90% confidence interval. How large a sample should she collect?

The conditions for using the normal distribution appear to be met: the sample size will be well over 30, the observations are independent, and we know the population standard deviation.

We are solving for x in ME = zval * std_dev / sqrt(x)
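Written out, the rearrangement looks like this. The symbols \(z^{*}\), \(\sigma\) and \(n\) are introduced here for the critical value, the population standard deviation and the sample size, and the values 250 and 25 are the ones used in the calculation that follows.

\[ ME = z^{*}\frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad \sqrt{n} = \frac{z^{*}\sigma}{ME} \quad\Rightarrow\quad n = \left(\frac{z^{*}\sigma}{ME}\right)^{2} \]

\[ n = \left(\frac{1.645 \times 250}{25}\right)^{2} = 16.45^{2} = 270.6025 \]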

Desired_N

x <- 250^2/(25^2/1.645^2)

Answer:: Our desired sample size is 270.6025, which we round up to 271

(b) Luke wants to use a 99% confidence interval.

With a 99% CI the required Z value (or t-score, if one is being used) is higher. A higher Z value decreases the denominator in the formula above, so the required sample size N increases.

(c) Calculate the minimum required sample size for Luke.

#t_val_99 <- qt(.005, df=249)
our_desired_N_99 <- 250^2/(25^2/2.576^2)

Answer:: Our desired sample size is 663.5776, which we round up to 664
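Both sample-size calculations follow the same pattern, so they can be wrapped in a small helper. This is a sketch only: `required_n` is a hypothetical function name, and it uses `qnorm()` for the critical value instead of the rounded table values 1.645 and 2.576, then rounds up because a sample size must be a whole number.

```r
# Minimum sample size for a given confidence level and margin of error.
# required_n is a hypothetical helper, not part of any package.
required_n <- function(conf_level, sigma, moe) {
  # Two-sided critical value, e.g. about 1.645 for 90%
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  # n = (z* * sigma / ME)^2, rounded up to the next whole observation
  ceiling((z_star * sigma / moe)^2)
}

n_90 <- required_n(0.90, sigma = 250, moe = 25)  # Raina
n_99 <- required_n(0.99, sigma = 250, moe = 25)  # Luke
c(n_90 = n_90, n_99 = n_99)
```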

5.20

(a) Is there a clear difference in the average reading and writing scores?

The median writing score seems a little higher, but the mean scores seem very similar. The distribution of the differences looks close to normal and is centered at 0, which would indicate that the scores are in fact very similar.

(b) Are the reading and writing scores of each student independent of each other?

The wording seems strange here. If the question is whether one student’s scores are independent of another student’s scores, then, presuming they didn’t cheat, yes. However, if the question is referring to the paired data, the reading and writing scores of the same student do not appear to be independent of each other.

(c) Create a hypothesis test

\(H_{0}\): There is no difference between the average reading and writing score \[ \mu_{diff} = 0 \]

\(H_{A}\): There is a difference between the average reading and writing score \[ \mu_{diff} \neq 0 \]

(d) Check the conditions required to complete this test.

Independence: the sample makes up less than 10% of the population, condition met
Sample size: the sample is large enough (n = 200, well over 30), condition met

(e) Do these data provide convincing evidence of a difference between the average scores on the two exams?

\[ tscore= \frac{U_{diff}-0}{S_{diff}/\sqrt{N}} \]

t <- round(.545/(8.887/sqrt(200)), 2)
df <- 199
# two-sided critical value at the 5% level
our_t_2 <- qt(.975, df=199)

# two-sided p-value
our_p_val <- 2*pt(t, 199, lower.tail = FALSE)

Using R, our two-sided p-value is two times 0.1926743, or about .385.
Looking at the t table, a two-tailed t value of 0.87 with 199 df corresponds to a p-value of more than .20, which means we fail to reject the null hypothesis; there isn’t convincing evidence of a difference.
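As a cross-check, the whole paired test can be run from the summary statistics alone. This is a sketch with illustrative variable names; it uses the mean difference 0.545, standard deviation of differences 8.887 and n = 200 from the calculation above, and carries the unrounded t statistic through.

```r
# Paired t-test from summary statistics (two-sided).
mean_diff <- 0.545   # average of (read - write) differences
sd_diff <- 8.887     # standard deviation of the differences
n_pairs <- 200       # number of students

se_diff <- sd_diff / sqrt(n_pairs)       # standard error of the mean difference
t_stat <- (mean_diff - 0) / se_diff      # test statistic under H0: mu_diff = 0
df <- n_pairs - 1

# Two-sided p-value: double the upper-tail area beyond |t|
p_two_sided <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

round(c(t = t_stat, p = p_two_sided), 3)
```

Because the p-value is far above any common significance level, the conclusion does not depend on whether t is rounded to 0.87 first.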

(f) What type of error might we have made? Explain what the error means in the context of the application.

We could have made a Type 2 error: failing to reject the null hypothesis when there actually is a difference between the average reading and writing scores. Considering the data, a Type 2 error here wouldn’t be all that dangerous.

(g) Based on the results of this hypothesis test, would you expect a confidence interval to include 0? Explain your reasoning.

Yes. We failed to reject the null hypothesis, which means we couldn’t declare that there was an actual difference; therefore a difference of 0 should be contained within our confidence interval.
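This can be checked directly by building the interval from the same summary statistics. The sketch below assumes a 95% confidence level, which is not stated in the exercise text quoted here, and uses illustrative variable names.

```r
# Confidence interval for the mean paired difference.
# Assumption: 95% confidence level (matching a two-sided 5% test).
mean_diff <- 0.545
sd_diff <- 8.887
n_pairs <- 200

se_diff <- sd_diff / sqrt(n_pairs)
t_star <- qt(0.975, df = n_pairs - 1)   # two-sided 95% critical value

ci <- mean_diff + c(-1, 1) * t_star * se_diff
contains_zero <- ci[1] < 0 && ci[2] > 0

round(ci, 3)
contains_zero
```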

5.32

\[ df=25\] \[ U_{diff}= 19.85-16.12=3.73 \]

my_u_diff <- 19.85-16.12
my_se_now <- sqrt(3.58^2/26+4.51^2/26)

\[ SE= \sqrt{ \frac{3.58^2}{26}+\frac{4.51^2}{26}} \]

\[ tscore= \frac{U_{diff}}{SE}\]

the_t <- round(my_u_diff/my_se_now,2)

Our t score is 3.3

our_p_val_now <- 2*(pt(the_t,25,lower.tail = FALSE))

Answer:: Our p-value is about .003, well below .05, so we reject the null hypothesis
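The calculation above can be gathered into one block that keeps the unrounded t statistic. This is a sketch: the group labels are generic placeholders, and the degrees of freedom follow the conservative choice used above (the smaller sample size minus one).

```r
# Two-sample t-test from summary statistics (two-sided).
# Group labels 1 and 2 are placeholders for the two groups compared above.
mean_1 <- 19.85; sd_1 <- 4.51; n_1 <- 26
mean_2 <- 16.12; sd_2 <- 3.58; n_2 <- 26

u_diff <- mean_1 - mean_2                     # observed difference in means
se <- sqrt(sd_1^2 / n_1 + sd_2^2 / n_2)       # standard error of the difference
t_stat <- u_diff / se

df <- min(n_1, n_2) - 1                       # conservative df = 25
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

round(c(diff = u_diff, se = se, t = t_stat, p = p_value), 4)
```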

5.48

A.

HW=hours worked

\[ H_{0}: μ_{lessHsHW} = μ_{HsHW} = μ_{JrCollHW} = μ_{BachelorHW} = μ_{GraduateHW} \]

\(H_{A}\): The average hours worked varies across some (or all) groups.

B.

The observations are independent within and across groups: probably TRUE
The data within each group are nearly normal: TRUE
The variability across the groups is about equal: TRUE

C

Source	Df	Sum Sq	Mean Sq	F value	Pr(>F)
degree	4	2006.16	501.54	2.189	0.0682
residuals	1167	267382	229.12
total	1171	269388.16
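The table entries can be rebuilt from the quantities that anchor it: five groups, a total of 1,172 observations (implied by the total df of 1171), a group mean square of 501.54 and a residual sum of squares of 267382. The sketch below uses illustrative variable names. Note that the group sum of squares is df × mean square = 4 × 501.54 = 2006.16, which is also the value needed for the total of 269388.16 to add up.

```r
# Filling in a one-way ANOVA table from partial information.
k <- 5              # number of education groups
n_total <- 1172     # total observations (total df + 1)
msg <- 501.54       # mean square for degree (between groups)
sse <- 267382       # residual sum of squares (within groups)

df_group <- k - 1            # 4
df_resid <- n_total - k      # 1167
df_total <- n_total - 1      # 1171

ssg <- msg * df_group        # between-group sum of squares
mse <- sse / df_resid        # residual mean square
sst <- ssg + sse             # total sum of squares
f_stat <- msg / mse          # F statistic

# Upper-tail area of the F distribution gives the p-value
p_value <- pf(f_stat, df_group, df_resid, lower.tail = FALSE)

round(c(SSG = ssg, MSE = mse, SST = sst, F = f_stat, p = p_value), 4)
```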

D. Conclusion

Let’s assume we are testing at the 5% significance level (95% confidence).
Our p-value is .0682, which is greater than .05, so we fail to reject the null hypothesis.
