5.6 Working backwards, Part II.

A 90% confidence interval for a population mean is (65,77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sample standard deviation.

Answer:

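n=25, df=n-1=24
xbar + t_star*SE = 77
xbar - t_star*SE = 65

Thus, solving the two equations above:
xbar = (65+77)/2 = 71
moe = t_star*SE = (77-65)/2 = 6

The population standard deviation is unknown, so the interval is built with the t-distribution, not z. For 90% confidence with df=24, t_star = 1.711 (from a t-table).
SE = moe/t_star = 6/1.711 = 3.507
sd = SE*sqrt(n) = 3.507*sqrt(25) = 17.53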
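The working-backwards arithmetic can be checked with a short Python sketch using only the standard library. Because the standard library has no t-distribution, the critical value t_star = 1.711 (90%, df = 24) is passed in by hand from a t-table; the function name `work_backwards` is just an illustrative choice.

```python
import math

def work_backwards(lower, upper, n, t_star):
    """Recover the sample mean, margin of error, and sample SD
    from a t-based confidence interval (lower, upper)."""
    xbar = (lower + upper) / 2   # the interval is centred on the sample mean
    me = (upper - lower) / 2     # half the width is the margin of error
    se = me / t_star             # ME = t_star * SE
    s = se * math.sqrt(n)        # SE = s / sqrt(n)
    return xbar, me, s

# t_star for a 90% interval with df = 24, taken from a t-table (assumed input)
xbar, me, s = work_backwards(65, 77, 25, t_star=1.711)
print(f"mean = {xbar}, ME = {me}, s = {s:.2f}")
```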


5.12 Auto exhaust and lead exposure.

Researchers interested in lead exposure due to car exhaust sampled the blood of 52 police officers subjected to constant inhalation of automobile exhaust fumes while working traffic enforcement in a primarily urban environment. The blood samples of these officers had an average lead concentration of 124.32 μg/l and a SD of 37.74 μg/l; a previous study of individuals from a nearby suburb, with no history of exposure, found an average blood level concentration of 35 μg/l.

Given:

n=52
Pop_Mean(lead concentration, previous study)=35 μg/l
Sample_Mean(lead concentration)=124.32 μg/l
Sample_SD(lead concentration)=37.74 μg/l

(a) Write down the hypotheses that would be appropriate for testing if the police officers appear to have been exposed to a higher concentration of lead.
Answer:

Ho: μ = 35 μg/l. Police officers have not been exposed to a higher concentration of lead. In other words, the average blood lead concentration of all such officers is the same as that of the group in the previous study.

Ha: μ > 35 μg/l. Police officers appear to have been exposed to a higher concentration of lead. In other words, the average blood lead concentration of all such officers is higher than 35 μg/l.

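(b) Explicitly state and check all conditions necessary for inference on these data.
Answer:

-> Independence: the 52 officers are not described as a random sample, so we must assume they are representative of officers working traffic enforcement in this environment. We don't know the population size; if there are more than 520 such officers, the sample is less than 10% of the population and the observations can be treated as independent.
-> Normality: n = 52 is at least 30, so the sampling distribution of the mean is approximately normal unless the data are strongly skewed or have extreme outliers. We don't have the raw data, but the mean minus 2 SD is still well above zero (124.32 - 2*37.74 = 48.84 μg/l), so there is no sign of strong skew against the zero lower bound.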

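(c) Test the hypothesis that the downtown police officers have a higher lead exposure than the group in the previous study. Interpret your results in context.
Answer:

Ho: μ = 35 μg/l, Ha: μ > 35 μg/l (from part (a)), df = n-1 = 51
SE = Sample_SD/sqrt(n) = 37.74/sqrt(52) = 5.234
T = (124.32 - 35)/5.234 = 17.07

With df = 51, a T score this large gives a p-value that is essentially 0, far below 0.05, so we reject Ho in favor of Ha. The data provide strong evidence that downtown police officers have a higher average blood lead concentration than the suburban group in the previous study. Because this is an observational comparison with a group from a different study, it does not by itself show that exhaust exposure caused the difference.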
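A minimal Python sketch of the test statistic from part (c), standard library only. It computes SE and T but not the p-value, since with T around 17 and df = 51 the p-value is effectively zero.

```python
import math

n, xbar, s = 52, 124.32, 37.74   # officers' sample statistics
mu0 = 35                          # null value from the previous study

se = s / math.sqrt(n)             # standard error of the sample mean
t_stat = (xbar - mu0) / se        # one-sample t statistic, df = n - 1
print(f"SE = {se:.3f}, T = {t_stat:.2f}, df = {n - 1}")
```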

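(d) Based on your preceding result, without performing a calculation, would a 99% confidence interval for the average blood concentration level of police officers contain 35 μg/l?
Answer:

No. The one-sided p-value in part (c) is essentially 0, far below 0.005, and the sample mean is above 35 μg/l, so 35 μg/l is not a plausible value for the officers' average. A two-sided 99% confidence interval corresponds to a test at the 0.01 level, so it would not contain 35 μg/l. Note that such an interval has to be built with the standard error s/sqrt(n), not the standard deviation s.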
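As a check (not required by the question), the 99% t-interval can be computed with the standard error. This sketch hard-codes t_star = 2.676 for df = 51 from a t-table, which is an assumed input rather than something computed here.

```python
import math

n, xbar, s = 52, 124.32, 37.74
se = s / math.sqrt(n)          # standard error, not the SD
t_star = 2.676                 # 99% critical value for df = 51, from a t-table (assumed input)
lower, upper = xbar - t_star * se, xbar + t_star * se
print(f"99% CI: ({lower:.2f}, {upper:.2f}); contains 35? {lower <= 35 <= upper}")
```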


5.18 Paired or not, Part II?

In each of the following scenarios, determine if the data are paired.

(a) We would like to know if Intel’s stock and Southwest Airlines’ stock have similar rates of return. To find out, we take a random sample of 50 days, and record Intel’s and Southwest’s stock on those same days.

Answer:

Two sets of observations are paired if each observation in one set has a special correspondence or connection with exactly one observation in the other data set.

Here each Intel observation is recorded on the same day as exactly one Southwest observation, so the two share that day's market conditions. This one-to-one correspondence by day makes the data paired, and the comparison should be based on the 50 daily differences (a paired t-test) rather than a two-sample t-test.

(b) We randomly sample 50 items from Target stores and note the price for each. Then we visit Walmart and collect the price for each of those same 50 items.
Answer:

Yes, these data are paired. Each Target price corresponds to exactly one Walmart price, because both are prices of the same item at different stores.

(c) A school board would like to determine whether there is a difference in average SAT scores for students at one high school versus another high school in the district. To check, they take a simple random sample of 100 students from each high school.
Answer:

These data are not paired. The students form two independent samples from different schools, so there is no natural correspondence between a student at one school and a student at the other. This is a test of the difference of means of two independent samples.


5.24 Sample size and pairing.

Determine if the following statement is true or false, and if false, explain your reasoning: If comparing means of two groups with equal sample sizes, always use a paired test.

Answer:

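False. “Always” is the problem here. A paired t-test should be used only when the data are paired, i.e. each observation in one set has a special correspondence or connection with exactly one observation in the other data set. Equal sample sizes are necessary for pairing but do not create it: two independent samples of the same size should be compared with a two-sample test.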


5.30 Diamonds, Part II.

In Exercise 5.28, we discussed diamond prices (standardized by weight) for diamonds with weights 0.99 carats and 1 carat. See the table for summary statistics, and then construct a 95% confidence interval for the average difference between the standardized prices of 0.99 and 1 carat diamonds. You may assume the conditions for inference are met.

| Statistic   | 0.99 carats | 1 carat |
|-------------|-------------|---------|
| Mean (xbar) | $44.51      | $56.81  |
| SD (s)      | $13.32      | $16.13  |
| n           | 23          | 23      |

Answer:

The two groups are different diamonds, so the samples are independent (not paired). The question asks for one interval for the difference of means, built with the standard error and the t-distribution, not separate intervals for each group.

Point estimate: xbar(0.99) - xbar(1) = 44.51 - 56.81 = -12.30
SE = sqrt(13.32^2/23 + 16.13^2/23) = sqrt(7.714 + 11.312) = 4.362
df = min(23-1, 23-1) = 22 (conservative choice), so for 95% confidence t_star = 2.074
moe = t_star*SE = 2.074*4.362 = 9.05
95% CI: -12.30 ± 9.05 = (-21.35, -3.25)

We are 95% confident that the average standardized price of 0.99 carat diamonds is between $3.25 and $21.35 lower than the average standardized price of 1 carat diamonds.
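The interval above can be reproduced with a short Python sketch (standard library only). The 95% critical value t_star = 2.074 for df = 22 is hard-coded from a t-table, which is an assumed input.

```python
import math

# Summary statistics from the table (standardized prices, in dollars)
mean_099, sd_099, n_099 = 44.51, 13.32, 23
mean_1, sd_1, n_1 = 56.81, 16.13, 23

point_est = mean_099 - mean_1                        # difference of sample means
se = math.sqrt(sd_099**2 / n_099 + sd_1**2 / n_1)    # SE for two independent means
t_star = 2.074   # 95% critical value, df = min(22, 22) = 22, from a t-table (assumed input)
me = t_star * se                                     # margin of error
lower, upper = point_est - me, point_est + me
print(f"({lower:.2f}, {upper:.2f})")
```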


5.36 Gaming and distracted eating, Part II.

The researchers from Exercise 5.35 also investigated the effects of being distracted by a game on how much people eat. The 22 patients in the treatment group who ate their lunch while playing solitaire were asked to do a serial-order recall of the food lunch items they ate. The average number of items recalled by the patients in this group was 4.9, with a standard deviation of 1.8. The average number of items recalled by the patients in the control group (no distraction) was 6.1, with a standard deviation of 1.8. Do these data provide strong evidence that the average number of food items recalled by the patients in the treatment and control groups are different?
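One way to approach this is a two-sided two-sample t-test. The sketch below assumes the control group also has 22 patients; that number comes from Exercise 5.35 and is not stated above. It uses the conservative df = min(n1, n2) - 1 and, to stay within the standard library, approximates the p-value by integrating the t density with Simpson's rule instead of calling a statistics package. The helper names `t_pdf` and `two_sided_p` are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    a, b = 0.0, abs(t_stat)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = total * h / 3          # P(0 < T < |t|)
    return 2 * (0.5 - area)       # probability in both tails beyond |t|

# Treatment (solitaire) and control groups
x_trt, s_trt, n_trt = 4.9, 1.8, 22
x_ctl, s_ctl, n_ctl = 6.1, 1.8, 22   # n_ctl = 22 is an ASSUMPTION taken from Exercise 5.35

se = math.sqrt(s_trt**2 / n_trt + s_ctl**2 / n_ctl)   # SE of the difference in means
t_stat = (x_trt - x_ctl) / se
df = min(n_trt, n_ctl) - 1                             # conservative degrees of freedom
p = two_sided_p(t_stat, df)
print(f"T = {t_stat:.2f}, df = {df}, p = {p:.3f}")
```

Under these assumptions |T| is about 2.21, above the 5% two-sided critical value of about 2.08 for df = 21, so the p-value falls below 0.05 and the data would provide evidence that the average number of items recalled differs between the groups.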


5.42 Which test?

We would like to test if students who are in the social sciences, natural sciences, arts and humanities, and other fields spend the same amount of time studying for this course. What type of test should we use? Explain your reasoning.

Answer:

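ANOVA is the appropriate test. The response, time spent studying, is numerical, and the explanatory variable, field of study, is categorical with four groups (social sciences, natural sciences, arts and humanities, other). ANOVA tests Ho: μ1 = μ2 = μ3 = μ4 against the alternative that at least one mean differs. A chi-square test is not appropriate, because it compares counts of a categorical outcome, not means of a numerical one.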


5.48 Work hours and education.

The General Social Survey collects data on demographics, education, and work, among many other characteristics of US residents. Using ANOVA, we can consider educational attainment levels for all 1,172 respondents at once. Below are the distributions of hours worked by educational attainment and relevant summary statistics that will be helpful in carrying out this analysis.

| Educational attainment | Less than HS | HS    | Jr Coll | Bachelor’s | Graduate | Total |
|------------------------|--------------|-------|---------|------------|----------|-------|
| Mean                   | 38.67        | 39.6  | 41.39   | 42.55      | 40.85    | 40.45 |
| SD                     | 15.81        | 14.97 | 18.1    | 13.62      | 15.51    | 15.17 |
| n                      | 121          | 546   | 97      | 253        | 155      | 1,172 |

(a) Write hypotheses for evaluating whether the average number of hours worked varies across the five groups.
(b) Check conditions and describe any assumptions you must make to proceed with the test.
(c) Below is part of the output associated with this test. Fill in the empty cells.

|           | Df    | Sum Sq  | Mean Sq | F value | Pr(>F) |
|-----------|-------|---------|---------|---------|--------|
| degree    | XXXXX | XXXXX   | 501.54  | XXXXX   | 0.0682 |
| Residuals | XXXXX | 267,382 | XXXXX   |         |        |
| Total     | XXXXX | XXXXX   |         |         |        |

(d) What is the conclusion of the test?

***
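The empty cells in part (c) follow from the standard ANOVA identities (Df for groups = k - 1, Df for residuals = n - k, Sum Sq = Mean Sq * Df, F = MSG/MSE). This Python sketch assumes k = 5 groups and n = 1,172 from the summary table; the p-value 0.0682 is taken from the output rather than recomputed.

```python
# Known values from the summary table and the partial ANOVA output
k = 5                 # number of education groups
n = 1172              # total respondents
ms_group = 501.54     # Mean Sq for degree
ss_resid = 267_382    # Sum Sq for residuals
p_value = 0.0682      # Pr(>F) as reported in the output

df_group = k - 1                  # degrees of freedom between groups
df_resid = n - k                  # degrees of freedom within groups
df_total = n - 1

ss_group = ms_group * df_group    # Sum Sq = Mean Sq * Df
ms_resid = ss_resid / df_resid    # Mean Sq = Sum Sq / Df
f_value = ms_group / ms_resid     # F = MSG / MSE
ss_total = ss_group + ss_resid

print(df_group, df_resid, df_total)
print(f"SSG = {ss_group:.2f}, MSE = {ms_resid:.2f}, F = {f_value:.2f}, SST = {ss_total:.2f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

With p = 0.0682 above the usual 0.05 level, this sketch reports "fail to reject H0": the data do not provide convincing evidence at that level that average hours worked differ across the five education groups.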