HW 5

5.6

A 90% confidence interval for a population mean is (65,77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sample standard deviation.

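n <- 25
samplemean <- (77 + 65) / 2      ## midpoint of the interval
marginoferror <- (77 - 65) / 2   ## half the width of the interval
df <- n - 1                      ## degrees of freedom for the t distribution
tvalue <- qt(.95, df)            ## 90% interval leaves 5% in each tail
samplesd <- (marginoferror / tvalue) * sqrt(n)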
samplemean

## [1] 71

marginoferror

## [1] 6

samplesd

## [1] 17.53481

Here 71 is the sample mean, 6 is the margin of error, and 17.53 is the sample standard deviation.
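As a quick sanity check, the interval can be rebuilt from the three recovered values. This is only a sketch that plugs the rounded standard deviation printed above back into the usual t-interval formula; if the arithmetic is right, the endpoints should land on 65 and 77 up to rounding.

```r
# Rebuild the 90% t-interval from the recovered summary statistics.
n <- 25
samplemean <- 71
samplesd <- 17.53481                  # rounded value printed above
tvalue <- qt(0.95, df = n - 1)        # 5% in each tail for a 90% interval
marginoferror <- tvalue * samplesd / sqrt(n)
interval <- c(samplemean - marginoferror, samplemean + marginoferror)
round(interval, 2)
```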

5.12

a) Write down the hypotheses that would be appropriate for testing if the police officers appear to have been exposed to a higher concentration of lead.

H0: μ = 35. The average blood lead concentration of the downtown police officers is the same as that of the group in the previous study (35 μg/l).
Ha: μ > 35. The average blood lead concentration of the downtown police officers is higher than that of the group in the previous study.

b) Explicitly state and check all conditions necessary for inference on these data. Independence: it is unclear whether the officers were randomly sampled, so independence of the observations cannot be verified. Normality: the sample is fairly large (n = 52), so the t-based results should still hold reasonably well as long as the data are not strongly skewed.

c) Test the hypothesis that the downtown police officers have a higher lead exposure than the group in the previous study. Interpret your results in context.

samplen<-52
meanleadwexposure<-124.32
meanleadnoexposure<-35
standarddeviation<-37.74
t.value<-((meanleadwexposure-meanleadnoexposure)/(standarddeviation/sqrt(samplen)))
pvalue<- 2*pt(-abs(t.value),df=51)
pvalue

## [1] 9.913949e-23

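Because the p-value is essentially zero (the value computed above is two-sided; the p-value for the one-sided alternative is half of it), we reject the null hypothesis. The data provide strong evidence that the downtown police officers have a higher average blood lead concentration than the group in the previous study.

d) Based on your preceding result, without performing a calculation, would a 99% confidence interval for the average blood concentration level of police officers contain 35 μg/l? No. The test rejects the null value of 35 μg/l with a p-value far below 0.01, so 35 would not fall inside a 99% confidence interval; the whole interval would lie above 35.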
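Part d asks for an answer without calculation, but the claim is easy to check. The sketch below reuses the summary statistics from part c, computes the one-sided p-value that matches the alternative hypothesis, and builds a 99% t-interval; the variable names are illustrative and not part of the original code.

```r
# Summary statistics from part c
xbar <- 124.32   # sample mean blood lead level of the officers
mu0  <- 35       # mean of the group in the previous study
s    <- 37.74    # sample standard deviation
n    <- 52

se     <- s / sqrt(n)
t_stat <- (xbar - mu0) / se

# One-sided p-value for Ha: mu > 35 (upper tail only)
p_one_sided <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# 99% confidence interval: 0.5% in each tail
t_crit <- qt(0.995, df = n - 1)
ci99   <- c(xbar - t_crit * se, xbar + t_crit * se)

p_one_sided
ci99
```

If the lower endpoint of `ci99` is above 35, the interval agrees with the conclusion of the test.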

5.18

a) We would like to know if Intel’s stock and Southwest Airlines’ stock have similar rates of return. To find out, we take a random sample of 50 days, and record Intel’s and Southwest’s stock on those same days. Paired. Both stocks are recorded on the same 50 days, so each Intel observation has a natural match in the Southwest observation from that day.

b) We randomly sample 50 items from Target stores and note the price for each. Then we visit Walmart and collect the price for each of those same 50 items. Paired. Each Walmart price corresponds to the Target price of the same item; only the store differs.

c) A school board would like to determine whether there is a difference in average SAT scores for students at one high school versus another high school in the district. To check, they take a simple random sample of 100 students from each high school. Not paired. The two samples are drawn independently from different schools, so there is no natural one-to-one match between a student in one sample and a student in the other.
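The distinction matters for how the test is run. The sketch below uses simulated, purely hypothetical prices in the spirit of part b (they are not data from the exercise) to show that a paired t-test is the same as a one-sample t-test on the item-by-item differences, while the unpaired (Welch) test treats the two columns as independent samples.

```r
set.seed(1)

# Hypothetical prices for the same 50 items at two stores (simulated)
target  <- round(runif(50, min = 1, max = 40), 2)
walmart <- round(target + rnorm(50, mean = 0, sd = 1), 2)

# Paired analysis: each item is compared with itself
paired_test <- t.test(target, walmart, paired = TRUE)

# Equivalent one-sample test on the differences
diff_test <- t.test(target - walmart, mu = 0)

# Unpaired (Welch) analysis ignores the item-level matching
unpaired_test <- t.test(target, walmart)

paired_test$p.value
unpaired_test$p.value
```

With matched data like these, the paired test works with the variability of the differences only, which is typically much smaller than the item-to-item price variability the unpaired test has to contend with.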

5.24

False. This does not always hold, because there may be no direct relationship or natural point of comparison between the two datasets.

5.30

5.42

We should use a chi-squared test to examine several variables at once and identify any significant relationships or differences between them.

5.48

a) Write hypotheses for evaluating whether the average number of hours worked varies across the five groups.

H0: The average number of hours worked is the same across all five groups.
Ha: The average number of hours worked differs for at least one of the five groups.

b) Check conditions and describe any assumptions you must make to proceed with the test. Independence: the samples appear to be independent, since respondents are randomly selected. Normality: the graph shows that the data are nearly normal within each group, except for the bachelor's group. Equal variability: the standard deviations are similar across groups, so the variability appears to be roughly equal.

c) Below is part of the output associated with this test. Fill in the empty cells.

            Df      SumSq   MeanSq  F value  Pr(>F)
degree       4    2006.16   501.54     2.19  0.0682
Residuals 1167  267382.00   229.12
Total     1171  269388.16

d) What is the conclusion of the test? Because the p-value (0.0682) is greater than 0.05, we fail to reject the null hypothesis. The data do not provide convincing evidence of a difference in average hours worked across the five groups.
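The filled-in cells of the ANOVA table in part c can be recomputed from a few quantities. The sketch below assumes the known inputs are the between-group mean square (501.54), the residual sum of squares (267,382), five groups, and 1,172 respondents in total (implied by the total Df of 1171); the variable names are illustrative.

```r
k       <- 5        # number of groups
n_total <- 1172     # total sample size, implied by Total Df = 1171
msg     <- 501.54   # mean square for degree (between groups)
sse     <- 267382   # residual sum of squares

# Degrees of freedom
df_g <- k - 1
df_e <- n_total - k
df_t <- n_total - 1

# Remaining cells of the table
ssg <- msg * df_g            # sum of squares between groups
mse <- sse / df_e            # mean square error
sst <- ssg + sse             # total sum of squares
f   <- msg / mse             # F statistic

# Upper-tail probability of the F distribution
p <- pf(f, df1 = df_g, df2 = df_e, lower.tail = FALSE)

c(df_g = df_g, df_e = df_e, df_t = df_t)
round(c(ssg = ssg, mse = mse, sst = sst, f = f, p = p), 4)
```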