Week 5 HW

5.6 Working Backwards Pt. II

A 90% confidence interval for a population mean is (65, 77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sample standard deviation.

Sample Mean

xbar <- ((77+65)/2)
xbar
## [1] 71

The sample mean is the midpoint of the confidence interval: for an interval (x1, x2) it equals (x1 + x2)/2, so the sample mean is 71.

Margin of Error

ME <- ((77-65)/2)
ME
## [1] 6

The margin of error is half the width of the confidence interval: for an interval (x1, x2) it equals (x2 - x1)/2, so the margin of error is 6.

Sample Standard Deviation

df <- 25 - 1
p <- 0.9
p_2tails <- p + (1 - p)/2
p_2tails
## [1] 0.95
t.value <- qt(p_2tails, df)
t.value
## [1] 1.710882
sd <- (ME/t.value)*5
sd
## [1] 17.53481

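To get the standard deviation of the sample, we first find the cumulative probability that leaves 5% in the upper tail of a two-sided 90% interval, which is 0.95. Then we use the qt( ) function with that probability and the degrees of freedom (24) to get our t-value of 1.71. Because ME = t * s/sqrt(n), we solve for s = ME * sqrt(n)/t: dividing the margin of error by the t-value and multiplying by sqrt(25) = 5 gives a sample standard deviation of 17.53.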
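The three steps above can be collected into one small function. This is only a sketch: `ci_to_stats` is a hypothetical helper name, not a base R function, and it assumes the interval was built with the usual t-based formula.

```r
# Recover summary statistics from a t-based confidence interval for a mean.
# ci_to_stats is a hypothetical helper written for this exercise.
ci_to_stats <- function(lower, upper, n, conf_level) {
  xbar   <- (lower + upper) / 2                       # midpoint of the interval
  me     <- (upper - lower) / 2                       # half the interval width
  t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)  # critical t-value
  s      <- me * sqrt(n) / t_star                     # solve ME = t* x s / sqrt(n) for s
  list(mean = xbar, margin = me, t_star = t_star, sd = s)
}

res <- ci_to_stats(65, 77, n = 25, conf_level = 0.90)
res$mean    # 71
res$margin  # 6
res$sd      # about 17.53
```

Writing it this way makes it easy to repeat the calculation for a different interval, sample size, or confidence level.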

5.12 Auto Exhaust and Lead Exposure

Researchers interested in lead exposure due to car exhaust sampled the blood of 52 police officers subjected to constant inhalation of automobile exhaust fumes while working traffic enforcement in a primarily urban environment. The blood samples of these officers had an average lead concentration of 124.32 μg/l and a SD of 37.74 μg/l; a previous study of individuals from a nearby suburb, with no history of exposure, found an average blood level concentration of 35 μg/l.
a) Write down the hypotheses that would be appropriate for testing if the police officers appear to have been exposed to a higher concentration of lead.

Ho: μ <= 35 μg/l

Ha: μ > 35 μg/l

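b) Explicitly state and check all conditions necessary for inference on these data.

Random: The problem does not say whether or not the officers were randomly sampled, so we have to assume they are representative of officers working urban traffic enforcement.

Normal: We are not shown the distribution of the data, so we cannot check it for skew or outliers. With a sample size of 52 (at least 30), the sampling distribution of the mean should be close to normal unless the data are extremely skewed.

Independent: If the officers were randomly sampled, 52 is well under 10% of all such officers, so the observations can be treated as independent.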

c) Test the hypothesis that the downtown police officers have a higher lead exposure than the group in the previous study. Interpret your results in context.

Ho: μ <= 35 μg/l

Ha: μ > 35 μg/l

n <- 52
x1 <- 35       # mean from the previous study
x2 <- 124.32   # sample mean for the officers
SD2 <- 37.74   # sample SD for the officers

t_score <- ((x2-x1)/(SD2/sqrt(n)))
t_score
## [1] 17.06666

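Here we have a very large t-score (with 51 degrees of freedom), so the one-sided p-value is essentially 0. This gives us enough evidence to reject the null hypothesis that the mean lead concentration of the officers constantly exposed to exhaust is less than or equal to 35 μg/l, the mean in the previous study. The data support the claim that these officers have a higher average blood lead concentration than the unexposed suburban group.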
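The p-value itself can be obtained with pt( ). The sketch below uses the summary statistics from the problem statement (SD = 37.74); the variable names are chosen here for illustration.

```r
# One-sided p-value for the lead exposure test, from summary statistics.
n    <- 52
mu0  <- 35       # null value (mean from the previous study)
xbar <- 124.32   # sample mean for the officers
s    <- 37.74    # sample standard deviation

t_stat  <- (xbar - mu0) / (s / sqrt(n))
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)  # P(T > t_stat)
t_stat
p_value
```

Setting lower.tail = FALSE returns the upper-tail area directly, which matches the one-sided alternative Ha: μ > 35.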

d) Based on your preceding result, without performing a calculation, would a 99% confidence interval for the average blood concentration level of police officers contain 35 μg/l?

No. A 99% confidence interval corresponds to a two-sided test at the 0.01 significance level, which leaves 0.005 in each tail. The one-sided p-value from part c) is far smaller than 0.005, so an average concentration level of 35 would not be within the interval.
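Although the question asks for an answer without a calculation, the interval is quick to compute as a check. This is a sketch using the summary statistics from the problem statement; the variable names are illustrative.

```r
# 99% t-based confidence interval for the officers' mean lead concentration.
n    <- 52
xbar <- 124.32
s    <- 37.74

t_star <- qt(0.995, df = n - 1)                   # 0.005 in each tail
ci_99  <- xbar + c(-1, 1) * t_star * s / sqrt(n)
ci_99
contains_35 <- 35 >= ci_99[1] && 35 <= ci_99[2]
contains_35   # FALSE
```

The interval runs from roughly 110 to 138 μg/l, well above 35 μg/l.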

5.18 Paired or Not

In each of the following scenarios, determine if the data are paired.
a) We would like to know if Intel’s stock and Southwest Airlines’ stock have similar rates of return. To find out, we take a random sample of 50 days, and record Intel’s and Southwest’s stock on those same days.

Paired

b) We randomly sample 50 items from Target stores and note the price for each. Then we visit Walmart and collect the price for each of those same 50 items.

Paired

c) A school board would like to determine whether there is a difference in average SAT scores for students at one high school versus another high school in the district. To check, they take a simple random sample of 100 students from each high school.

Not paired

5.24 Sample Size and Pairing

Determine if the following statement is true or false, and if false, explain your reasoning: If comparing means of two groups with equal sample sizes, always use a paired test.

False. Equal sample sizes alone do not make the data paired. To use a paired test, there must be a natural correspondence between each observation in one set and exactly one observation in the other set.
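To see why the distinction matters, here is a small sketch with made-up prices for the same five items at two stores (the numbers are hypothetical, chosen only for illustration). Both groups have the same size, but only the paired analysis uses the item-by-item correspondence.

```r
# Hypothetical prices for the same five items at two stores.
store_a <- c(10.0, 20.0, 30.0, 40.0, 50.0)
store_b <- c(10.5, 20.4, 30.6, 40.5, 50.5)

paired_test   <- t.test(store_b, store_a, paired = TRUE)  # works on the differences
unpaired_test <- t.test(store_b, store_a)                 # treats groups as independent

paired_test$p.value     # small: store_b is consistently higher
unpaired_test$p.value   # large: item-to-item variation swamps the difference
```

If the two samples had been drawn independently, the paired call would not be valid even though the vectors have equal length.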

5.30 Diamonds

In Exercise 5.28, we discussed diamond prices (standardized by weight) for diamonds with weights 0.99 carats and 1 carat. See the table for summary statistics, and then construct a 95% confidence interval for the average difference between the standardized prices of 0.99 and 1 carat diamonds. You may assume the conditions for inference are met.
df <- 22
p <- 0.95
p_2tails <- p + (1 - p)/2
p_2tails
## [1] 0.975
t <- qt(p_2tails, df)
t
## [1] 2.073873

When using the qt( ) function with 22 degrees of freedom we get a t-value of 2.07. From Exercise 5.28, the SE = 4.3619, and the point estimate of the difference is 12.3.

Therefore the confidence interval is:

12.3 +/- 2.07 x 4.3619 = 12.3 +/- 9.03 =

(3.27, 21.33).
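The same interval can be computed directly in R. This sketch takes the point estimate (12.3) and standard error (4.3619) as given above; the variable names are illustrative.

```r
# 95% confidence interval for the difference in mean standardized prices.
point_est <- 12.3     # difference in sample means, as used above
se        <- 4.3619   # standard error carried over from the earlier exercise
df        <- 22

t_star <- qt(0.975, df)
ci_95  <- point_est + c(-1, 1) * t_star * se
ci_95
```

Because this uses the unrounded t-value (2.073873 instead of 2.07), the result is slightly wider than the hand calculation, about (3.25, 21.35).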

5.36

The researchers from Exercise 5.35 also investigated the effects of being distracted by a game on how much people eat. The 22 patients in the treatment group who ate their lunch while playing solitaire were asked to do a serial-order recall of the food lunch items they ate. The average number of items recalled by the patients in this group was 4.9, with a standard deviation of 1.8. The average number of items recalled by the patients in the control group (no distraction) was 6.1, with a standard deviation of 1.8. Do these data provide strong evidence that the average number of food items recalled by the patients in the treatment and control groups are different?

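Ho: μ_treatment - μ_control = 0

Ha: μ_treatment - μ_control ≠ 0

n1 <- 22     # control group size (assumed equal to the treatment group, as in Exercise 5.35)
n2 <- 22     # treatment group size
x1 <- 6.1    # control mean
x2 <- 4.9    # treatment mean
SD1 <- 1.8
SD2 <- 1.8

SE <- sqrt(SD1^2/n1 + SD2^2/n2)
t_score <- ((x2-x1)/SE)
t_score
## [1] -2.211083

Because we are comparing two independent groups, the standard error combines the variability of both samples. With 21 degrees of freedom, a t-score of -2.21 gives a two-sided p-value between 0.02 and 0.05. At the 0.05 significance level we reject the null hypothesis of no difference: the data provide evidence that the average number of food items recalled differs between the treatment and control groups.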
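Since two separate groups are being compared, a two-sample t-test on the summary statistics is the natural check. The sketch below makes two assumptions that are not stated in this exercise: that the control group also has 22 patients (carried over from Exercise 5.35), and that the conservative df = min(n1, n2) - 1 is used. `two_sample_t` is a hypothetical helper, not a base R function.

```r
# Two-sample t-test from summary statistics (hypothetical helper).
two_sample_t <- function(m1, s1, n1, m2, s2, n2) {
  se     <- sqrt(s1^2 / n1 + s2^2 / n2)   # SE of the difference in means
  t_stat <- (m1 - m2) / se
  df     <- min(n1, n2) - 1               # conservative degrees of freedom
  p      <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)  # two-sided
  list(se = se, t = t_stat, df = df, p_value = p)
}

# Treatment: mean 4.9, SD 1.8, n = 22. Control: mean 6.1, SD 1.8, n = 22 (assumed).
res <- two_sample_t(4.9, 1.8, 22, 6.1, 1.8, 22)
res$t         # about -2.21
res$p_value   # between 0.02 and 0.05
```

The two-sided p-value is doubled because the alternative hypothesis allows a difference in either direction.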

5.42 Which Test?

We would like to test if students who are in the social sciences, natural sciences, arts and humanities, and other fields spend the same amount of time studying for this course. What type of test should we use? Explain your reasoning.

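We should use ANOVA. The response (time spent studying) is numerical and the explanatory variable (field of study) is categorical with four levels, so we are comparing means across more than two groups. A chi-square test compares counts of categorical outcomes, not means, so it does not fit this question.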
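As a sketch of what such a test looks like in R, the example below runs a one-way ANOVA with aov( ) on a small made-up data set; the study hours and field labels are hypothetical and only illustrate the call.

```r
# Hypothetical study hours for four fields, four students each.
study <- data.frame(
  field = rep(c("social", "natural", "arts", "other"), each = 4),
  hours = c(5, 6, 7, 6,  8, 9, 7, 8,  4, 5, 6, 5,  6, 7, 6, 7)
)

fit <- aov(hours ~ field, data = study)   # one-way ANOVA
tab <- summary(fit)[[1]]                  # ANOVA table as a data frame
tab
```

The first row of the table has k - 1 = 3 degrees of freedom for the four groups, and the residual row has n - k = 12.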

5.48 Work Hours and Education

The General Social Survey collects data on demographics, education, and work, among many other characteristics of US residents. Using ANOVA, we can consider educational attainment levels for all 1,172 respondents at once. Below are the distributions of hours worked by educational attainment and relevant summary statistics that will be helpful in carrying out this analysis.
a) Write hypotheses for evaluating whether the average number of hours worked varies across the five groups.

Ho: μ1 = μ2 = μ3 = μ4 = μ5 (the average number of hours worked is the same across all five groups)

Ha: At least one group's mean differs from the others

b) Check conditions and describe any assumptions you must make to proceed with the test

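Independent: We assume the respondents are a random sample of US residents, so observations should be independent within and across the groups.

Normal: The distribution of hours worked within each group seems close enough to normal.

Variability: The spread of hours worked seems to be roughly equal across the groups.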

c) Below is part of the output associated with this test. Fill in the empty cells.

Dfd = 4 DfR = 1167 DfT = 1171

SumSqd = 2004.11 SumSqT = 269,387.11

MeanSqR = 229.11

F-Value = 2.19

d) What is the conclusion of the test?

The p-value of 0.0682 is larger than the significance level of 0.05. Therefore, we do not reject the null hypothesis, i.e. there is not enough evidence to say that the average number of hours worked differs across the five groups.
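The degrees of freedom and the p-value can be checked with a few lines of R. This sketch uses the group count, sample size, and F-value reported above; the variable names are illustrative.

```r
# Degrees of freedom and p-value for the one-way ANOVA F-test.
k <- 5        # number of educational attainment groups
n <- 1172     # total respondents

df_group <- k - 1     # 4
df_resid <- n - k     # 1167
df_total <- n - 1     # 1171

f_value <- 2.19
p_value <- pf(f_value, df_group, df_resid, lower.tail = FALSE)  # upper tail
p_value
```

The result is close to the reported 0.0682; the small gap comes from using the rounded F-value.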