Biostat HW 3

Question 1

Compare the critical value $(Z = ±1.96)$ that corresponds to 5% of the tail area of the standard normal distribution with the critical values of the $t$ distribution for $df = 9, 19, 29$, and $5000$. As the degrees of freedom increase, what happens to the value of $t$ compared with the value of $Z$? Explain why this is occurring.

$\alpha = 0.05$

qt(p=0.975, df= c(9,19,29,500))

## [1] 2.262157 2.093024 2.045230 1.964720

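As the degrees of freedom increase, the value of t approaches the value of z: the critical values above fall from 2.262 at df = 9 to 1.965 at df = 500 (the call above uses df = 500 rather than the 5000 in the question; the trend is the same). The t distribution accounts for the extra uncertainty from estimating $\sigma$ with the sample standard deviation, which gives it thicker tails at low degrees of freedom. As the degrees of freedom grow, that estimate becomes more precise, the tails thin, and the t distribution converges to the standard normal.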
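To make the convergence concrete, the sketch below tabulates the gap between the t critical value and the normal critical value. The object names (`df_values`, `crit_table`) are illustrative, and the grid is an assumption: it keeps the df = 500 used in the call above and adds the df = 5000 from the question.

```r
# Gap between t and z critical values at the two-sided 5% level
df_values <- c(9, 19, 29, 500, 5000)
z_crit <- qnorm(0.975)
t_crit <- qt(0.975, df = df_values)

crit_table <- data.frame(
  df = df_values,
  t_crit = t_crit,
  gap = t_crit - z_crit  # positive, and shrinking as df grows
)
crit_table
```

The `gap` column should stay positive and decrease down the table, which is the thinning of the tails described above.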

Question 2

If samples of size $n = 25$ were selected from a normal population (mean = 60, standard deviation = 10), what percentage of the sample means would you expect to be less than 58?

Given: \[ n=25, \mu= \mu_{\bar{x}}=60,\sigma=10 \\ X \sim N(\mu=60, \sigma^2=10^2)\]

Therefore, the sampling distribution of the sample mean is

\[ \bar{X} \sim N\left(\mu_{\bar{x}}=60, \sigma_{\bar{x}}^{2}=\frac{10^2}{25}\right) \]

Find:\[ P(\bar{x}<58) \]

q_2 <- pnorm(58,mean=60,sd= 10/sqrt(25))
q_2

## [1] 0.1586553

q_2 * 100 #For Percent

## [1] 15.86553

Therefore, the percentage of sample means we expect to be less than 58 is 15.87% given the stated sampling distribution and sample size $n=25$.
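As a sanity check on the analytic answer, the sampling distribution can also be simulated. This is only a sketch: the seed and the number of simulated samples (`n_sims`) are arbitrary choices, not part of the assignment.

```r
set.seed(1)       # arbitrary seed, only for reproducibility
n <- 25
n_sims <- 10000   # number of simulated samples (an assumption)

# Each row is one sample of size n drawn from N(60, 10^2)
samples <- matrix(rnorm(n_sims * n, mean = 60, sd = 10), nrow = n_sims)
sample_means <- rowMeans(samples)

# Simulated share of sample means below 58
sim_prop <- mean(sample_means < 58)
sim_prop
```

The simulated proportion should land close to the `pnorm` result, with some Monte Carlo noise.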

Question 3

Suppose systolic blood pressure of 17-year-old females is approximately normally distributed with a mean of 118 mmHg and a standard deviation of 12 mmHg.

What proportion of girls would you expect to have blood pressures between 112 mmHg and 124 mmHg?

\[ X \sim N(\mu=118,\ \sigma^2=12^2) \\ P (112<x<124) \]
```
pnorm(124,mean=118,sd=12) - pnorm(112,118,12)
```
```
## [1] 0.3829249
```
Therefore, 38.29% of girls are expected to have blood pressures between 112 mmHg and 124 mmHg.

If you were to select a sample of 16 girls and obtain their mean systolic blood pressure, what proportion of such samples would you expect between 112 mmHg and 124 mmHg?

\[\bar{X} \sim N\left(\mu_{\bar{x}}=118, \sigma_{\bar{x}}^{2}=\frac{12^2}{16}\right) \\ P(112 < \bar{x} <124)\]

# P(sample mean BP < 124 mmHg)
Z_Upper_BP <- pnorm(124,mean=118,sd=12/sqrt(16))

# P(sample mean BP < 112 mmHg)
Z_Lower_BP <- pnorm(112,mean=118,sd=12/sqrt(16))

# P(112 < sample mean BP < 124) = Z_Upper_BP - Z_Lower_BP
Z_Difference_BP <- Z_Upper_BP - Z_Lower_BP
Z_Difference_BP

## [1] 0.9544997

Z_Difference_BP * 100 #For Percent

## [1] 95.44997

Therefore, 95.45% of sample means are expected to be between 112 mmHg and 124 mmHg.
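The jump from about 38% for individuals to about 95% for sample means comes entirely from the smaller spread of the sample mean. The sketch below puts the two cases side by side; the helper `prob_between` is a hypothetical name introduced here for illustration.

```r
mu <- 118
sigma <- 12
n <- 16

# Probability of falling in (112, 124) for a normal with the given spread
prob_between <- function(spread) {
  pnorm(124, mean = mu, sd = spread) - pnorm(112, mean = mu, sd = spread)
}

se <- sigma / sqrt(n)               # standard error of the mean: 12 / 4 = 3
z_individual <- (124 - mu) / sigma  # limits sit 0.5 standard deviations out
z_mean <- (124 - mu) / se           # limits sit 2 standard errors out

c(individual = prob_between(sigma), sample_mean = prob_between(se))
```

The same limits are half a standard deviation from the mean for an individual but two standard errors from the mean for a sample of 16.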

Question 4: Sampling Distribution of Cigarettes Smoked

If the mean number of cigarettes smoked by pregnant women is 16 and the standard deviation is 8, find the probability that in a random sample of 100 pregnant women the mean number of cigarettes smoked will be greater than 24.

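\[ \bar{X} \sim N\left(\mu_{\bar{x}}=16, \sigma_{\bar{x}}^{2}=\frac{8^2}{100}\right) \\ P(\bar{x} > 24) \\ 1-P(\bar{x} <24) \]

# lower.tail = FALSE returns P(x_bar > 24) directly; 1 - pnorm(...) would round to 0 here
mean_cigs <- pnorm(24, mean = 16, sd = sqrt(8^2/100), lower.tail = FALSE)
mean_cigs

## [1] 7.619853e-24

mean_cigs*100 #for percent

## [1] 7.619853e-22

Therefore, there is virtually a 0% probability that the mean number of cigarettes smoked in a random sample of 100 pregnant women is greater than 24: the standard error is $8/\sqrt{100} = 0.8$, so 24 lies 10 standard errors above the mean.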

An individual woman might smoke more than 24 cigarettes during pregnancy. However, in a random sample of 100 pregnant women, there is only a minuscule probability that the sample mean exceeds 24.

Question 5

The standard hemoglobin reading for healthy adult men is $15g/100 ml$ with a standard deviation of $σ = 2 g/100 ml$. For a group (of size $n$) of men in a certain occupation, we find a mean hemoglobin of $16.0 g / 100 ml$.

Calculate the 95% confidence interval for the population mean, based on the following sample sizes: n=36, n=49, and n=64 (use the population standard deviation as $σ = 2 g/100 ml$)

\[X\sim N(\mu=15,\ \sigma^2=2^2)\] by CLT \[ \bar{X} \sim N\left(\mu_{\bar{x}}=15,\ \sigma_{\bar{x}}^2=\frac{2^2}{n}\right) \\ n=36,\ 49, \ 64 \]

Using a 95% confidence interval, $\alpha=0.05,\ \ 1- \frac{\ \alpha}{2}=0.975$

Z_for_5 <- qnorm(0.975)
Z_for_5

## [1] 1.959964

\[ CI = \bar{x} ± Z_{0.975} \frac{\sigma}{\sqrt{n}} \\ CI = 16 ± 1.96 \frac{2}{\sqrt{n}} \] Substituting each value of $n$:

# n = 36
16 + 1.96*(2/ sqrt(36))

## [1] 16.65333

16 - 1.96*(2/ sqrt(36))

## [1] 15.34667

Therefore, when n = 36, the confidence interval is (15.34667, 16.65333).

We are 95% confident that the population mean lies between 15.34667 and 16.65333 when n = 36.

# n = 49
16 + 1.96*(2/ sqrt(49))

## [1] 16.56

16 - 1.96*(2/ sqrt(49))

## [1] 15.44

Therefore, when n = 49, the confidence interval is (15.44, 16.56).

# n = 64
16 + 1.96*(2/ sqrt(64))

## [1] 16.49

16 - 1.96*(2/ sqrt(64))

## [1] 15.51

Therefore, when n = 64, the confidence interval is (15.51, 16.49).

As the sample size increases, do the confidence intervals shrink or widen? Explain.

As sample size increases, the confidence intervals shrink. In the CI formula (listed above), $\sqrt{n}$ is in the denominator of the margin of error, so the interval width is proportional to $1/\sqrt{n}$: quadrupling the sample size halves the width.

With a larger sample size, the same 95% confidence level yields a narrower interval for the population mean: more data (n), more precision.
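The three intervals can also be computed in one vectorized step, which makes the shrinking width easy to see. This is a sketch with illustrative names (`margin`, `ci_table`); it uses the unrounded quantile from `qnorm(0.975)`, so the limits can differ from the hand calculations with 1.96 in the fifth decimal place.

```r
x_bar <- 16
sigma <- 2
n <- c(36, 49, 64)

z <- qnorm(0.975)              # unrounded quantile rather than 1.96
margin <- z * sigma / sqrt(n)  # half-width of each interval

ci_table <- data.frame(
  n = n,
  lower = x_bar - margin,
  upper = x_bar + margin,
  width = 2 * margin
)
ci_table
```

The `width` column decreases as `n` grows, in proportion to `1 / sqrt(n)`.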

Question 6

The mean weight gain for a control diet of $n_1 = 10$ individuals is $\bar{X}_1 = 12.79$ and for a treatment diet of $n_2 = 9$ individuals is $\bar{X}_2 = 15.27$. The corresponding variances are $s_1^2 = 13.9$ and $s_2^2 = 12.8$. Suppose $μ_1$ and $μ_2$ indicate mean weight gains by the control diet and the treatment, respectively. Assume $X_1 \sim N(μ_1,σ^2)$ and $X_2 \sim N(μ_2,σ^2 )$. Compute the 95% confidence interval for $μ_1-μ_2$.

\[ n_1 =10,\ n_2=9 \\ \bar{x}_1 = 12.79,\ \bar{x}_2 =15.27 \\ s_1^2=13.9,\ s_2^2=12.8 \] Distributions \[ X\sim N(\mu, \sigma^2) \\ \bar{X} \sim N\left(\mu_{\bar{x}},\frac{\sigma^2}{n}\right) \]

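The problem assumes both groups share a common variance $\sigma^2$, and the sample variances (13.9 and 12.8) are close, so it is reasonable to take $\sigma_1 = \sigma_2$. Therefore, we can use a pooled variance estimate.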

\[ s_p^2 = \frac{s_1^2(n_1-1)+s_2^2(n_2-1)}{n_1 + n_2-2} \\ s_p^2 = \frac{13.9(10-1)+12.8(9-1)}{10+9-2} \\ s_p^2 \approx 13.38 \] Pooled variance is 13.38. Now we can calculate the 95% confidence interval. \[ (\bar{x}_1 -\bar{x}_2) ± t_{0.975,\,17} \cdot s_p \cdot \sqrt{\frac{1}{n_1}+ \frac{1}{n_2}} \]

Typing this out is way harder than I expected. Please see my handwritten notes for questions 6,7. I did not attempt #8.
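As a sketch of how the interval formula set up above could be evaluated in R (this is not a transcription of the handwritten work, and the variable names are illustrative), using the values given in the question:

```r
n1 <- 10; n2 <- 9
x_bar1 <- 12.79; x_bar2 <- 15.27
s1_sq <- 13.9; s2_sq <- 12.8

df <- n1 + n2 - 2
sp_sq <- (s1_sq * (n1 - 1) + s2_sq * (n2 - 1)) / df  # pooled variance
se_diff <- sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2)       # SE of x_bar1 - x_bar2

t_crit <- qt(0.975, df = df)
ci_q6 <- (x_bar1 - x_bar2) + c(-1, 1) * t_crit * se_diff
ci_q6
```

The result is a length-two vector with the lower limit first. It relies on the equal-variance assumption stated in the question.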

Question 7

The sample mean hemoglobin of $n_1 = 16$ Caucasian women is $\bar{X}_1 =13.7$, with $s_1^2=2.3$, and for $n_2 = 20$ African-American women is $\bar{X}_2 = 12.5$, with $s_2^2=2.1$. What are the 90%, 95%, and 99% confidence intervals for $μ_1-μ_2$, the mean difference between Caucasian and African-American women’s hemoglobin? (Here $μ_1$ and $μ_2$ indicate mean hemoglobin levels among Caucasian and African-American women, respectively. Assume $X_1\sim N(μ_1,σ^2)$ and $X_2\sim N(μ_2,σ^2 )$.)
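A sketch of how these intervals could be computed in R, assuming the same pooled-variance approach as in Question 6 (the question states a common variance $\sigma^2$). The names `conf_levels` and `ci_q7` are illustrative.

```r
n1 <- 16; n2 <- 20
x_bar1 <- 13.7; x_bar2 <- 12.5
s1_sq <- 2.3; s2_sq <- 2.1

df <- n1 + n2 - 2
sp_sq <- (s1_sq * (n1 - 1) + s2_sq * (n2 - 1)) / df  # pooled variance
se_diff <- sqrt(sp_sq * (1 / n1 + 1 / n2))           # SE of x_bar1 - x_bar2

conf_levels <- c(0.90, 0.95, 0.99)
t_crit <- qt(1 - (1 - conf_levels) / 2, df = df)     # two-sided critical values

ci_q7 <- data.frame(
  level = conf_levels,
  lower = (x_bar1 - x_bar2) - t_crit * se_diff,
  upper = (x_bar1 - x_bar2) + t_crit * se_diff
)
ci_q7
```

All three intervals are centered on the observed difference of 1.2 and widen as the confidence level rises, because the critical value grows.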

Question 8

The probability density function of a continuous random variable X is given by

\[ f(x)=\left\{\begin{array}{cc} 2 e^{-2 x} & x \geq 0 \\ 0 & x<0 \end{array}\right. \]

Show that $f(x)$ is a valid density function.
Find the cumulative distribution function of the random variable $X$ (i.e., $F(x) = P(X<x)$).
Find the probability that the random variable lies between 0.5 and 1: calculate $P(0.5 < X < 1)$.

Note: In R, the exponential of a value $a$ (that is, $e^a$) can be calculated as exp(a), e.g., exp(0.1) = 1.10517.
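Question 8 was not attempted above. As a hedged sketch only, the three parts could be checked numerically in R as follows. The function names `f` and `F_cdf` are illustrative, and the closed form $F(x) = 1 - e^{-2x}$ for $x \geq 0$ is what integrating $f$ from 0 to $x$ by hand gives.

```r
# Density from the question: 2 * exp(-2x) for x >= 0, and 0 otherwise
f <- function(x) ifelse(x >= 0, 2 * exp(-2 * x), 0)

# (a) Total area under f; f is zero for x < 0, so integrate from 0
total_area <- integrate(f, lower = 0, upper = Inf)$value

# (b) CDF from integrating f by hand: F(x) = 1 - exp(-2x) for x >= 0
F_cdf <- function(x) ifelse(x >= 0, 1 - exp(-2 * x), 0)

# (c) P(0.5 < X < 1), via the CDF and via numerical integration
p_cdf <- F_cdf(1) - F_cdf(0.5)
p_num <- integrate(f, lower = 0.5, upper = 1)$value

c(total_area = total_area, p_cdf = p_cdf, p_num = p_num)
```

A valid density must be non-negative and integrate to 1, so `total_area` should be 1 up to numerical error, and the two routes to part (c) should agree, both equal to $e^{-1} - e^{-2}$.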