#P1. Crop plants

Crop plants are usually prone to attack by root aphids. One way to protect crops is to pre-treat the plant roots with an insecticide. A random sample of 11 pre-treated roots, measured for the active concentration of the insecticide, gives a sample mean of 8.2%. Assume the population standard deviation is 2%. Obtain a 95% confidence interval for the mean concentration of the insecticide in all pre-treated roots.

##Solution

The sample size is 11 (pre-treated roots). The sample mean is 0.082 and the population standard deviation is 0.02. Because the population standard deviation is known, we use a z-interval; since the sample is small (n = 11), this also assumes the concentrations are approximately normally distributed. The critical value for 95% confidence is \(z_{0.025} = 1.960\), and the interval is \(\bar{x} \pm z_{0.025}\,\sigma/\sqrt{n}\).

n_ro=11 #sample size
x_bar_ro=0.082 #sample mean
sigma_ro=0.02 #population standard deviation
se_ro = sigma_ro/sqrt(n_ro) #standard error of the mean
M.E95_ro <- 1.96*se_ro #margin of error at 95% confidence
lower_ro= x_bar_ro - M.E95_ro #lower bound of the confidence interval
upper_ro= x_bar_ro + M.E95_ro #upper bound of the confidence interval

cat("95% Confidence Interval: (", round(lower_ro, 4), ",", round(upper_ro, 4), ")\n")
## 95% Confidence Interval: ( 0.0702 , 0.0938 )
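The same computation can be wrapped in a small reusable function. This is a minimal sketch: `z_interval` is a hypothetical helper name, it uses `qnorm()` for the exact critical value instead of the rounded 1.96, and, like the calculation above, it assumes the concentrations are approximately normal.

```r
# Hypothetical helper: two-sided z-interval for a mean when sigma is known.
# Assumes an (approximately) normal population, since n = 11 is small.
z_interval <- function(xbar, sigma, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)   # critical value, about 1.96 for 95%
  me <- z * sigma / sqrt(n)        # margin of error
  c(lower = xbar - me, upper = xbar + me)
}

ci_ro <- z_interval(xbar = 0.082, sigma = 0.02, n = 11)
round(ci_ro, 4)
```

Setting `conf = 0.99` widens the interval, because the critical value grows to about 2.576.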

##Conclusions

We are 95% confident that the mean concentration of the insecticide in all pre-treated roots lies between 0.0702 (7.02%) and 0.0938 (9.38%).

#P2. Student scores

It is known that student scores are normally distributed with a standard deviation of 70. How large a random sample is needed if we want to obtain a 95% confidence interval for the mean student score that is within 5 points of the sample mean?

##Solution

The problem states that the scores are normally distributed and gives the population standard deviation (\(\sigma = 70\)) and the margin of error (\(E = 5\)). The critical value for a 95% confidence level is \(z_{\alpha/2} = 1.960\) (from standard normal tables). Solving the margin-of-error equation \(E = z_{\alpha/2}\,\sigma/\sqrt{n}\) for the sample size gives \(n = (z_{\alpha/2}\,\sigma/E)^2\), which is then rounded up to the next whole number.

sigma_st=70 #Population standard deviation
Margin_e_st= 5 #Margin of error
z_alpha_st= qnorm(0.975) #1.96, z critical value

n_st <- (z_alpha_st * sigma_st / Margin_e_st) ^ 2 #compute what's the required sample size
n_st
## [1] 752.9259
ssr=ceiling(n_st) #round up

cat("Required sample size:", ssr, "\n")
## Required sample size: 753
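Because \(n = (z_{\alpha/2}\,\sigma/E)^2\) is rarely a whole number, it must always be rounded up: rounding down would leave a margin of error slightly above E. A minimal sketch with a hypothetical helper `sample_size_mean`, which also checks the achieved margin at n and at n - 1:

```r
# Hypothetical helper: smallest n with z * sigma / sqrt(n) <= E
sample_size_mean <- function(sigma, E, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)  # two-sided critical value (1.96 for 95%)
  ceiling((z * sigma / E)^2)      # round up so the margin never exceeds E
}

n_95 <- sample_size_mean(sigma = 70, E = 5)
n_95                                # 753
qnorm(0.975) * 70 / sqrt(n_95)      # achieved margin, just under 5 points
qnorm(0.975) * 70 / sqrt(n_95 - 1)  # one fewer student overshoots 5 points
sample_size_mean(sigma = 70, E = 5, conf = 0.99)  # stricter confidence, larger n
```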

##Conclusions

A random sample of at least 753 students is needed to obtain a 95% confidence interval for the mean student score with a margin of error of at most 5 points (the computed value 752.93 is rounded up, never down).

#P3. Lightbulbs

Fourteen successively tested light bulbs functioned for the following lengths of time (measured in hours): 35.6, 39.2, 18.4, 42, 45.3, 34.5, 27.9, 24.4, 19.9, 40.1, 37.2, 32.9, 33.1, 43.4

  1. Determine a 95% confidence interval estimate of the mean life of a light bulb
  2. A claim has been made that the results of this experiment indicate that “One can be 99% certain that the mean life exceeds 30 hours”. Do you agree with this statement?

##Solution

The data and the sample size are given, but the population mean and standard deviation are not, so both must be estimated from the sample. Because the population standard deviation is unknown and the sample is small (n = 14), we use the t distribution with 13 degrees of freedom, assuming bulb lifetimes are approximately normally distributed. For part (b) we perform a one-tailed t-test at the 1% significance level.

##Solution (a)

time_lb= c(35.6, 39.2, 18.4, 42, 45.3, 34.5, 27.9, 24.4, 19.9, 40.1, 37.2, 32.9, 33.1, 43.4) #sample
n_lb=14 #sample size
xbar_lb=mean(time_lb) #Sample mean
sd_lb=sd(time_lb) #sample standard deviation
#degrees of freedom = 14-1=13

#For alpha=0.05 the interval needs the upper 0.025 quantile of t with 13 df.
#qt(0.025, 13) returns the lower quantile (-2.160369), which would swap the bounds.
t_crit_lb = qt(1-0.05/2, 13) #t-critical for 95% confidence interval
t_crit_lb
## [1] 2.160369
#Calculating the bounds: mean -/+ t * s / sqrt(n)
lower_lb= xbar_lb - t_crit_lb * sd_lb/sqrt(n_lb) #lower bound of the confidence interval
upper_lb= xbar_lb + t_crit_lb * sd_lb/sqrt(n_lb) #upper bound of the confidence interval

t.test(time_lb, conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  time_lb
## t = 14.97, df = 13, p-value = 1.415e-09
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  28.96491 38.73509
## sample estimates:
## mean of x 
##     33.85
cat("lower bound= ", lower_lb, "upper bound =", upper_lb)
## lower bound=  28.96491 upper bound = 38.73509

##Conclusion (a)

We are 95% confident that the true mean lifespan of a light bulb lies between 28.96 and 38.74 hours (roughly 29 to 39 hours).
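Writing the t-interval as a small function makes the sign of the critical value explicit: the positive upper quantile `qt(1 - alpha/2, df)` is subtracted for the lower bound and added for the upper bound. A minimal sketch, with `t_interval` as a hypothetical helper name, cross-checked against the interval reported by `t.test()`:

```r
# Hypothetical helper: two-sided t-interval for a mean (sigma unknown).
# Assumes the data are approximately normal.
t_interval <- function(x, conf = 0.95) {
  n <- length(x)
  t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)  # positive upper quantile
  me <- t_crit * sd(x) / sqrt(n)                # margin of error
  c(lower = mean(x) - me, upper = mean(x) + me)
}

time_lb <- c(35.6, 39.2, 18.4, 42, 45.3, 34.5, 27.9, 24.4, 19.9,
             40.1, 37.2, 32.9, 33.1, 43.4)
manual <- t_interval(time_lb)
builtin <- t.test(time_lb, conf.level = 0.95)$conf.int
all.equal(unname(manual), as.numeric(builtin))  # the two intervals agree
```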

  2. A claim has been made that the results of this experiment indicate that “One can be 99% certain that the mean life exceeds 30 hours”. Do you agree with this statement?

##Solution (b)

For part (b) we perform a one-tailed (upper-tail) t-test at the 1% significance level, which corresponds to 99% confidence.

H0: The mean life of the light bulbs is 30 hours (\(\mu = 30\))

Ha: The mean life of the light bulbs exceeds 30 hours (\(\mu > 30\))

mu_h0 = 30  # hypothesized mean life under H0 (hours)
t_stat_lb = (xbar_lb - mu_h0) / (sd_lb / sqrt(n_lb))  # Compute t-score
p_value_lb = 1 - pt(t_stat_lb, 13)  # Upper-tail p-value, P(T > t) with 13 df

cat("Test Statistic (t):", round(t_stat_lb, 3), "\n")
## Test Statistic (t): 1.703
cat("p-value:", round(p_value_lb, 4), "\n")
## p-value: 0.0562
alpha_99_lb = 0.01  # Significance level
if (p_value_lb < alpha_99_lb) {
  cat("Conclusion: We reject H0. There is strong evidence that the mean life exceeds 30 hours.\n")
} else {
  cat("Conclusion: We fail to reject H0. This claim is not proven at 99% confidence.\n")
}
## Conclusion: We fail to reject H0. This claim is not proven at 99% confidence.

##Conclusion (b)

Because the p-value (0.0562) is larger than the significance level \(\alpha = 0.01\) (the threshold for 99% confidence), we fail to reject H0. There is not enough evidence at the 1% significance level to conclude that the mean life of the light bulbs exceeds 30 hours, so based on these calculations I do not agree with the statement.
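The claim can also be read as a one-sided confidence bound: being "99% certain the mean exceeds 30 hours" amounts to a 99% lower confidence bound that is above 30. A minimal sketch using `t.test()` with `alternative = "greater"`; the lower bound comes out at roughly 27.9 hours, below 30, which agrees with failing to reject H0.

```r
# Sketch: 99% one-sided check of the claim "mean life exceeds 30 hours".
time_lb <- c(35.6, 39.2, 18.4, 42, 45.3, 34.5, 27.9, 24.4, 19.9,
             40.1, 37.2, 32.9, 33.1, 43.4)
res_99 <- t.test(time_lb, mu = 30, alternative = "greater", conf.level = 0.99)
res_99$p.value            # one-tailed p-value, same as 1 - pt(t, 13)
res_99$conf.int[1]        # 99% lower confidence bound for the mean life
res_99$conf.int[1] > 30   # the claim would need this to be TRUE
```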

#P4. Heart Attack Victims

It is felt that first-time heart attack victims are particularly vulnerable to additional heart attacks during the year following the first attack. To estimate the proportion of victims who suffer an additional attack within 1 year, a random sample of 300 recent heart attack patients was tracked for 1 year. (a) If 35 of them suffered an attack within this year, give a 95% confidence interval estimate of the desired proportion. (b) Repeat (a) by assuming 75 suffered an attack within the year.

##Solution

The sample size is 300 heart attack patients, each tracked for one year, and the number who suffered an additional attack is given for two scenarios: 35 in part (a) and 75 in part (b). For each scenario we compute the sample proportion \(\hat{p} = x/n\) and the 95% confidence interval \(\hat{p} \pm 1.960\sqrt{\hat{p}(1-\hat{p})/n}\). The normal approximation is reasonable here because the numbers of successes and failures are well above 10 in both scenarios.

#Solution part a
n_ha=300 #sample size
x_ha= 35 #patients that had a second heart attack 
p_hat_ha=x_ha/n_ha #sample proportion
z_ha=1.960 #z critical value for a 95% confidence interval

me_ha= z_ha*sqrt(p_hat_ha*(1-p_hat_ha)/n_ha) #margin of error (z times the standard error)
lower_ha = p_hat_ha - me_ha 
upper_ha = p_hat_ha + me_ha

cat("95% Confidence Interval: (", round(lower_ha, 4), ",", round(upper_ha, 4), ")\n")
## 95% Confidence Interval: ( 0.0803 , 0.153 )
#Solution part b
n_ha=300 #sample size
x_ha_b= 75 #patients that had a second heart attack 
p_hat_ha_b=x_ha_b/n_ha #sample proportion
z_ha_b=1.960 #z critical value for a 95% confidence interval

me_ha_b= z_ha_b*sqrt(p_hat_ha_b*(1-p_hat_ha_b)/n_ha) #margin of error (z times the standard error)
lower_ha_b = p_hat_ha_b - me_ha_b
upper_ha_b = p_hat_ha_b + me_ha_b

cat("95% Confidence Interval: (", round(lower_ha_b, 4), ",", round(upper_ha_b, 4), ")\n")
## 95% Confidence Interval: ( 0.201 , 0.299 )

If 35 of the 300 patients had an additional heart attack within one year of their first, the 95% confidence interval for the proportion of first-time victims who suffer another attack within a year is 0.080 to 0.153 (8.0% to 15.3%). If 75 of the 300 patients had an additional attack, the 95% confidence interval is 0.201 to 0.299 (20.1% to 29.9%).
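Both parts apply the same normal-approximation (Wald) interval, so it can be written once as a function. A minimal sketch: `prop_wald_ci` is a hypothetical helper name, it uses `qnorm()` rather than the rounded 1.960, and the "at least 10 successes and 10 failures" check is a common rule of thumb, included here as an assumption.

```r
# Hypothetical helper: Wald (normal-approximation) interval for a proportion.
prop_wald_ci <- function(x, n, conf = 0.95) {
  p_hat <- x / n
  # Rule of thumb (assumption): at least 10 successes and 10 failures
  if (x < 10 || n - x < 10) warning("normal approximation may be poor")
  z <- qnorm(1 - (1 - conf) / 2)
  me <- z * sqrt(p_hat * (1 - p_hat) / n)  # margin of error, not the SE
  c(lower = p_hat - me, upper = p_hat + me)
}

ci_a <- prop_wald_ci(35, 300)  # part (a)
ci_b <- prop_wald_ci(75, 300)  # part (b)
round(ci_a, 4)
round(ci_b, 4)
```

The interval in part (b) is wider because \(\hat{p}(1-\hat{p})\) grows as \(\hat{p}\) moves toward 0.5.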

#P5. Studies of fishes in a lake

Previous studies of fish in a lake indicated that the mean polychlorinated biphenyl (PCB) concentration was 11.2 parts per million with a standard deviation of 2.5 parts per million. Suppose a new random sample of 11 fish has the following concentrations: 11.5 12 11.6 11.8 10.4 10.8 12.2 11.9 12.4 12.6 11.7

Assume that the standard deviation has remained equal to 2.5 parts per million. Test the hypothesis that the mean PCB concentration of the fish in the lake has also remained unchanged at 11.2 parts per million. Use the 5 percent level of significance.

##Solution

The population standard deviation is assumed known (\(\sigma = 2.5\) ppm), so we use a z-test rather than a t-test; because the sample is small (n = 11), this also assumes the concentrations are approximately normally distributed. The previously reported mean of 11.2 ppm is the hypothesized value. Since we want to detect a change in either direction, this is a two-tailed test.

H0: The mean PCB concentration has remained unchanged at 11.2 ppm (\(\mu = 11.2\))

Ha: The mean PCB concentration has changed (\(\mu \neq 11.2\)); it could be higher or lower

con= c(11.5, 12, 11.6, 11.8, 10.4, 10.8, 12.2, 11.9, 12.4, 12.6, 11.7)
n_fi=length(con) #sample size
sd_fi=2.5 #sigma,population standard deviation
x_fi=11.2 #hypothesized population mean under H0
x_bar_fi=mean(con) #sample mean

z_score_fi = (x_bar_fi - x_fi) / (sd_fi / sqrt(n_fi)) #Z test
z_score_fi
## [1] 0.6874459
p_value_fi = 2 * (1 - pnorm(abs(z_score_fi)))
p_value_fi
## [1] 0.4918018
# Output results

cat("Z-score:", round(z_score_fi, 4), "\n")
## Z-score: 0.6874
cat("p-value:", round(p_value_fi, 4), "\n")
## p-value: 0.4918
# Decision based on p-value
if (p_value_fi < 0.05) {
  cat("Conclusion: Reject H0. The mean PCB concentration has changed.\n")
} else {
  cat("Conclusion: Fail to reject H0. No significant evidence of change in PCB concentration.\n")
}
## Conclusion: Fail to reject H0. No significant evidence of change in PCB concentration.

The p-value (0.4918) is larger than the significance level of 0.05, so we fail to reject the null hypothesis (H0). The data provide no significant evidence that the mean PCB concentration of the fish in the lake has changed from 11.2 ppm. This is consistent with an unchanged mean, but failing to reject H0 does not prove that the mean is exactly 11.2 ppm.
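A two-tailed z-test at the 5% level rejects H0 exactly when the hypothesized mean falls outside the 95% z-interval, so the decision can be cross-checked with an interval. A minimal sketch reusing the data and the assumed \(\sigma = 2.5\) ppm:

```r
con <- c(11.5, 12, 11.6, 11.8, 10.4, 10.8, 12.2, 11.9, 12.4, 12.6, 11.7)
sigma_fi <- 2.5   # assumed known population SD (ppm)
mu_0 <- 11.2      # hypothesized mean (ppm)
se_fi <- sigma_fi / sqrt(length(con))
ci_fi <- mean(con) + c(-1, 1) * qnorm(0.975) * se_fi  # 95% z-interval
round(ci_fi, 3)
ci_fi[1] <= mu_0 && mu_0 <= ci_fi[2]  # TRUE: 11.2 is inside, so H0 is not rejected
```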

#P6. Production of tomatoes

A farmer claims to be able to produce larger tomatoes. To test this claim, a tomato variety that has a mean diameter size of 8.5 centimeters is used. If a sample of 19 tomatoes yielded a sample mean of 9.3 centimeters and a sample standard deviation of 2.6 centimeters, does this prove that the mean size is indeed larger? Use the 5 percent level of significance.

##Solution

The exercise provides the hypothesized mean (8.5 cm), the sample mean, the sample size, and the sample standard deviation, as well as the significance level (\(\alpha = 0.05\)). Because the population standard deviation is unknown and the sample is small (n = 19), we use a t-test with 18 degrees of freedom, assuming tomato diameters are approximately normally distributed.

H0: The true mean diameter of the tomatoes is 8.5 cm (\(\mu = 8.5\))

Ha: The true mean diameter of the tomatoes is larger than 8.5 cm (\(\mu > 8.5\))

This is a right-tailed test because we are checking whether the true mean is larger than 8.5 cm.

x_bar_to=9.3 #sample mean
mu_0_to=8.5 #mean diameter (hypothesized/population mean)
sd_to=2.6 #standard deviation of sample
n_to=19 #sample size

t_to <- (x_bar_to - mu_0_to) / (sd_to / sqrt(n_to)) #t-score #t-stat
df_to=n_to - 1 #degrees of freedom

p_value_to <- 1 - pt(t_to, df_to)

cat("T-score:", round(t_to, 4), "\n")
## T-score: 1.3412
cat("p-value:", round(p_value_to, 4), "\n")
## p-value: 0.0983
# Decision based on p-value
if (p_value_to < 0.05) {
  cat("Conclusion: Reject H0. The farmer's tomatoes are significantly larger.\n")
} else {
  cat("Conclusion: Fail to reject the null hypothesis (H0). There is no significant evidence that the tomatoes are larger.\n")
}
## Conclusion: Fail to reject the null hypothesis (H0). There is no significant evidence that the tomatoes are larger.

The p-value (0.0983) is larger than the significance level of 0.05, so we fail to reject the null hypothesis (H0). There is not enough statistical evidence at the 5% significance level to conclude that the farmer's tomatoes are larger than the standard 8.5 cm. Although the sample mean is higher (9.3 cm), the variability is high relative to the sample size, so the difference could plausibly be due to chance, and the farmer's claim is not proven.
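Since only summary statistics are given, `t.test()` cannot be called on raw data here. A minimal sketch of a summary-statistics t-test; `t_test_summary` is a hypothetical helper name, and it also compares the statistic with the right-tail critical value `qt(0.95, 18)` (about 1.734):

```r
# Hypothetical helper: one-sample t-test from summary statistics.
t_test_summary <- function(xbar, mu0, s, n,
                           alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  t_stat <- (xbar - mu0) / (s / sqrt(n))
  df <- n - 1
  p <- switch(alternative,
              greater   = pt(t_stat, df, lower.tail = FALSE),
              less      = pt(t_stat, df),
              two.sided = 2 * pt(abs(t_stat), df, lower.tail = FALSE))
  list(t = t_stat, df = df, p.value = p)
}

res_to <- t_test_summary(xbar = 9.3, mu0 = 8.5, s = 2.6, n = 19, "greater")
res_to$p.value
res_to$t > qt(0.95, df = 18)  # FALSE: 1.34 is below the critical value
```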

#P7. Ambulance service

An ambulance service claims that at least 43 percent of its calls involve life-threatening emergencies. To check this claim, a random sample of 200 calls was selected from the service’s files. If 101 of these calls involved a life-threatening emergency, is the service’s claim believable at the 5 percent level of significance?

##Solution

The ambulance service claims that AT LEAST 43% of its calls are life-threatening emergencies, so 0.43 is the hypothesized proportion. The sample size is 200 and the significance level is \(\alpha = 0.05\). Because the sample is large (\(np_0 = 86\) and \(n(1-p_0) = 114\) are both well above 10), we use the one-proportion z-test. The alternative is that the true proportion is below 43%, so this is a left-tailed test: its critical value is \(-1.645\), and we reject H0 only if the z-score falls below it (equivalently, if the left-tailed p-value is below 0.05).

H0: The proportion of life-threatening emergency calls is at least 43% (\(p ≥ 0.43\))

Ha: The proportion of life-threatening emergency calls is less than 43% (\(p < 0.43\))

prop_am=0.43 #proportion given
n_am=200 #sample size
x_am=101 #number of life threatening cases
p_hat_am= x_am/n_am #p hat or sample proportion

z_score_am=(p_hat_am - prop_am)/sqrt((prop_am* (1-prop_am))/ n_am) #calculating z score
z_score_am #2.14
## [1] 2.14242
p_value_am=pnorm(z_score_am) #p value, calculating pvalue (left tailed test)
p_value_am #0.98
## [1] 0.9839201
cat("Sample Proportion:", round(p_hat_am, 4), "\n") #results
## Sample Proportion: 0.505
cat("Z-score:", round(z_score_am, 4), "\n")
## Z-score: 2.1424
cat("p-value:", round(p_value_am, 4), "\n")
## p-value: 0.9839
if (p_value_am < 0.05) {
  cat("Conclusion: Reject H0. The service's claim is not believable at the 5% level.\n")
} else {
  cat("Conclusion: Fail to reject H0. There is no significant evidence against the claim that at least 43% of the ambulance calls are life-threatening.\n") #results
}
## Conclusion: Fail to reject H0. There is no significant evidence against the claim that at least 43% of the ambulance calls are life-threatening.

##Conclusions

The test statistic z = 2.14 is far above the left-tail critical value of -1.645, and the p-value (0.98) is much larger than \(\alpha = 0.05\), so we fail to reject H0. There is no significant evidence that the actual proportion of life-threatening emergency calls is lower than 43%; in fact, the sample proportion (50.5%) is above the claimed value. Thus the ambulance service’s claim is believable at the 5% level of significance.
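The manual z-test can be cross-checked with base R's `prop.test()`. With `correct = FALSE` (no continuity correction) its chi-squared statistic equals \(z^2\), and with `alternative = "less"` its p-value equals the left-tailed `pnorm(z)`. A minimal sketch:

```r
# Sketch: one-proportion test of H0: p >= 0.43 vs Ha: p < 0.43
res_am <- prop.test(x = 101, n = 200, p = 0.43,
                    alternative = "less", correct = FALSE)
z_am <- (101 / 200 - 0.43) / sqrt(0.43 * (1 - 0.43) / 200)  # manual z-score
res_am$p.value                 # matches pnorm(z_am)
unname(sqrt(res_am$statistic)) # equals |z_am|, since X-squared = z^2 here
```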