Small Samples Inference

10.6 Is there a difference in the prices of tuna, depending on the method of packaging? Consumer Reports gives the estimated average price for a 6-ounce can or a 7.06-ounce pouch of tuna, based on prices paid nationally in supermarkets. These prices are recorded for a variety of different brands of tuna. Assume that the tuna brands included in this survey represent a random sample of all tuna brands available in the United States.

a. Find a 95% confidence interval for the average price for light tuna in water. Interpret this interval. That is, what does the “95%” refer to?

## [1] "This data set has the following characteristics: mean: 0.896429, standard deviation: 0.399531 and degrees of freedom: 13.000000"

The two-tailed critical values of the \(t\) distribution for \(\alpha=0.05\) and 13 degrees of freedom are \(\pm 2.160 \therefore\) \[\overline{x} \pm2.160\bigg(\frac{0.399}{\sqrt{14}}\bigg)\] The 95% confidence interval runs from 0.6660921 to 1.126765.

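The “95%” refers to the procedure, not to this particular interval: if many samples of the same size were drawn and an interval computed from each one in the same way, about 95% of those intervals would contain the true population mean \(\mu\). We are therefore 95% confident that the interval from 0.666 to 1.127 contains \(\mu\).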
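The hand calculation can be checked in R from the summary statistics alone. This is a minimal sketch; the helper `t_ci` is a hypothetical name (not a base R function), and it assumes the printed mean, standard deviation and \(n=14\) are the values to use.

```r
# Hypothetical helper: two-sided t confidence interval from summary statistics
t_ci <- function(xbar, s, n, conf = 0.95) {
  t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)  # about 2.160 for 13 df
  half_width <- t_crit * s / sqrt(n)            # margin of error
  c(lower = xbar - half_width, upper = xbar + half_width)
}

# Light tuna in water: summary statistics printed above (n = 14)
ci_light_water <- t_ci(xbar = 0.896429, s = 0.399531, n = 14)
print(round(ci_light_water, 4))
```

The limits agree with the hand calculation to three decimal places; the small difference comes from rounding \(s\) to 0.399 in the formula.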

b. Find a 95% confidence interval for the average price for white tuna in oil. How does the width of this interval compare to the width of the interval in part a? Can you explain why?

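## [1] "For this problem's data set, mean = 1.225000, standard deviation = 0.033166 and degrees of freedom = 3.000000"

For a two-tailed \(t\) interval with \(\alpha = 0.05\) and 3 degrees of freedom, the critical values are \(\pm3.182 \therefore\)

\[\overline{x} \pm3.182\bigg(\frac{0.0332}{\sqrt{4}}\bigg)\]

The 95% confidence interval with 3 degrees of freedom runs from 1.1722325 to 1.2777675. Its width, about 0.106, is less than a quarter of the width of the interval in part a (about 0.461). The difference comes from the much smaller sample standard deviation (0.033 versus 0.400), which more than offsets the smaller sample size (\(n=4\) versus \(n=14\)) and the larger critical value (3.182 versus 2.160).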

c. Find 95% confidence intervals for the other two samples (white tuna in water and light tuna in oil). Plot the four treatment means and their standard errors in a two-dimensional plot similar to Figure 8.5. What kind of broad comparisons can you make about the four treatments?

White Tuna in Water

## [1] "White Tuna in Water, mean = 1.280000, standard deviation = 0.135119, degrees of freedom: 7.000000."

For a two-tailed \(t\) interval with \(\alpha = 0.05\) and 7 degrees of freedom, the critical values are \(\pm2.365 \therefore\)

\[\overline{x} \pm2.365\bigg(\frac{0.135}{\sqrt{8}}\bigg)\] The 95% confidence interval with 7 degrees of freedom runs from 1.1670197 to 1.3929803.

Light Tuna in Oil

## [1] "Light Tuna in Oil, mean = 1.147273, standard deviation = 0.678544, degrees of freedom: 10.000000."

For a two-tailed \(t\) interval with \(\alpha = 0.05\) and 10 degrees of freedom, the critical values are \(\pm2.228 \therefore\)

\[\overline{x} \pm2.228\bigg(\frac{0.678}{\sqrt{11}}\bigg)\] The 95% confidence interval with 10 degrees of freedom runs from 0.6914491 to 1.6030963.
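Part c also asks for a plot of the four treatment means with their standard errors. The sketch below assumes only the summary statistics printed above (sample sizes recovered as degrees of freedom plus one); the short treatment labels are illustrative choices, and the plot draws each mean with bars of plus or minus one standard error, in the spirit of Figure 8.5.

```r
# Summary statistics for the four packaging treatments, as printed above
tuna <- data.frame(
  treatment = c("Light/Water", "White/Oil", "White/Water", "Light/Oil"),
  mean = c(0.896429, 1.225000, 1.280000, 1.147273),
  sd   = c(0.399531, 0.033166, 0.135119, 0.678544),
  n    = c(14, 4, 8, 11)
)

tuna$se    <- tuna$sd / sqrt(tuna$n)       # standard error of each mean
tuna$t     <- qt(0.975, df = tuna$n - 1)   # two-sided 95% critical values
tuna$lower <- tuna$mean - tuna$t * tuna$se
tuna$upper <- tuna$mean + tuna$t * tuna$se
print(tuna[, c("treatment", "mean", "se", "lower", "upper")], digits = 4)

# Means with +/- 1 standard error bars
x <- seq_len(nrow(tuna))
plot(x, tuna$mean, pch = 19, xaxt = "n", xlab = "Treatment",
     ylab = "Average price",
     ylim = range(tuna$mean - tuna$se, tuna$mean + tuna$se))
axis(1, at = x, labels = tuna$treatment)
arrows(x, tuna$mean - tuna$se, x, tuna$mean + tuna$se,
       angle = 90, code = 3, length = 0.05)
```

Using `qt()` instead of table values changes the limits only in the fourth decimal place.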

10.15 These data are the weights (in pounds) of 27 packages of ground beef in a supermarket meat display:

a. Interpret the accompanying MINITAB printouts for the one-sample test and estimation procedures:

The MINITAB printout comes from a two-tailed one-sample \(t\)-test on a data set with \(n=27\). It reports the sample mean and standard deviation (the weights are fairly tightly clustered); the test uses \(\alpha = 0.05\) and is accompanied by a 95% confidence interval for \(\mu\), whose lower and upper limits are 0.9867 and 1.1178.

b. Verify the calculated values of t and the upper and lower confidence limits.

## [1] "Applying the formulas in R gives the following results: mean: 1.052222, standard deviation: 0.165653, t value: 1.640000"

To check the lower and upper limits of the 95% confidence interval we use the following formula with 26 degrees of freedom: \[\overline{x} \pm t_{.025}\bigg(\frac{s}{\sqrt{n}}\bigg) = 1.052\pm 2.056 \bigg(\frac{0.166}{\sqrt{27}}\bigg)\]

## [1] "For this exercise, the two-tailed 95 percent confidence interval runs from a minimum of 0.986540 to a maximum of 1.117905, which matches the MINITAB output."
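Both the test statistic and the limits can be reproduced from the summary statistics. This sketch assumes the null value is \(\mu_0 = 1\) (inferred because it reproduces the printed \(t \approx 1.64\); the printout itself is not reproduced here).

```r
# Summary statistics reported for the 27 ground-beef packages
xbar <- 1.052222
s    <- 0.165653
n    <- 27
mu0  <- 1   # assumed null value, inferred from the printed t of about 1.64

se      <- s / sqrt(n)
t_stat  <- (xbar - mu0) / se                # one-sample t statistic
t_crit  <- qt(0.975, df = n - 1)            # about 2.056 for 26 df
ci      <- xbar + c(-1, 1) * t_crit * se    # 95% confidence limits
p_value <- 2 * pt(-abs(t_stat), df = n - 1) # two-tailed p-value

round(c(t = t_stat, lower = ci[1], upper = ci[2], p = p_value), 4)
```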

10.17 Refer to Exercise 10.16. Since \(n>30\), use the methods of Chapter 8 to create a large-sample 95% confidence interval for the average serum cholesterol level for L.A. County employees. Compare the two intervals.

# Serum cholesterol levels (data from Exercise 10.16)
col <- c(148,303,262,278,305,304,315,284,227,225,300,174,275,220,306,240,209,229,260,184,368,253,261,221,242,139,169,239,247,282,203,170,254,178,311,249,254,222,204,271,265,212,273,250,276,229,255,299,256,248)
meanCol <- mean(col)   # sample mean
sdCol <- sd(col)       # sample standard deviation
nCol <- length(col)    # sample size (50)

\[\overline{x}\pm z_{\alpha/2}\bigg(\frac{s}{\sqrt{n}}\bigg)\]

\[246.96 \pm \bigg(1.960\times\frac{46.82}{\sqrt{50}}\bigg)\] \[246.96 \pm 12.97784\]

The large-sample 95% confidence interval for the average cholesterol level runs from 233.98216 to 259.93784.
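To compare the two intervals, the sketch below computes the large-sample \(z\) interval and the small-sample \(t\) interval from the same data. It assumes the Exercise 10.16 interval is the usual \(t\) interval with \(n-1=49\) degrees of freedom.

```r
# Serum cholesterol levels for the 50 L.A. County employees (same data as above)
col <- c(148,303,262,278,305,304,315,284,227,225,300,174,275,220,306,240,209,
         229,260,184,368,253,261,221,242,139,169,239,247,282,203,170,254,178,
         311,249,254,222,204,271,265,212,273,250,276,229,255,299,256,248)

n    <- length(col)
xbar <- mean(col)
se   <- sd(col) / sqrt(n)

# Large-sample interval (Chapter 8): z critical value of about 1.96
ci_z <- xbar + c(-1, 1) * qnorm(0.975) * se
# Small-sample interval: t critical value with n - 1 = 49 degrees of freedom
ci_t <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se

rbind(z = ci_z, t = ci_t)
```

Because \(t_{.025}\) with 49 degrees of freedom (about 2.01) is only slightly larger than 1.96, the \(t\) interval is only slightly wider; with \(n=50\) the two methods give nearly the same answer.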

10.24 The MS Excel printout shows a test for the difference in two population means.

a. Do the two sample variances indicate that the assumption of a common population variance is reasonable?

The variances are markedly different, so in my view they do not support the assumption that a common population variance is reasonable.

b. What is the observed value of the test statistic? If this is a two-tailed test, what is the p-value associated with the test?

The observed value of the test statistic is \(t=0.365\), with 11 degrees of freedom. For a two-tailed test the p-value is \(P(|t| > 0.365) \approx 0.72\), far larger than \(\alpha = 0.05\), so \(H_0\) is not rejected.
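The two-tailed p-value is twice the tail area beyond the observed statistic. A minimal sketch, assuming \(|t| = 0.365\) and 11 degrees of freedom as quoted in the answer:

```r
t_obs <- 0.365   # observed t statistic from the printout
dof   <- 11      # degrees of freedom of the pooled t-test (n1 + n2 - 2)

# Two-tailed p-value: area in both tails beyond |t_obs|
p_two_tailed <- 2 * pt(-abs(t_obs), df = dof)
round(p_two_tailed, 3)
```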

c. What is the pooled estimate \(s^2\) of the population variance?

Pooled estimate of the population variance: \(s^2=3.524\)

d. Use the information in the printout to construct a 95% confidence interval for the difference in the population means. Does this interval confirm your conclusions in part b?

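\[(\overline{x}_2-\overline{x}_1) \pm t_{.025}\sqrt{s^2\bigg(\frac{1}{n_1}+\frac{1}{n_2}\bigg)}\]

With 11 degrees of freedom, \(t_{.025}=2.201\). The standard error implied by the printout is the observed difference divided by the observed \(t\), \(0.381/0.365 \approx 1.044\), so:

\[-0.381 \pm 2.201 \times 1.044\] The confidence interval runs from approximately -2.68 to 1.92. Because it contains 0, it confirms that there is no significant difference between the two population means.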
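The interval for \((\mu_2-\mu_1)\) can be sketched in R using only quantities quoted in this exercise. The key assumption is that the standard error \(\sqrt{s^2(1/n_1+1/n_2)}\) equals the observed mean difference divided by the observed \(t\), since the individual sample sizes are not reproduced here.

```r
diff_means <- -0.381   # estimate of (mu2 - mu1)
t_obs      <- 0.365    # |t| from the printout
dof        <- 11       # degrees of freedom

se     <- abs(diff_means) / t_obs          # standard error implied by t = diff / se
t_crit <- qt(0.975, df = dof)              # about 2.201
ci     <- diff_means + c(-1, 1) * t_crit * se
round(ci, 2)
```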

10.29 A geologist collected 20 different ore samples, all of the same weight, and randomly divided them into two groups. The titanium contents of the samples, found using two different methods, are listed in the table:

a. Use an appropriate method to test for a significant difference in the average titanium contents using the two different methods.

Because the 20 ore samples were randomly divided into two groups, each method was applied to a different set of samples, so the two samples are independent (not paired). The problem therefore calls for a two-tailed test of the difference between two population means. Taking into account the ratio between the sample variances of the two methods, we proceed with Welch's two-sample \(t\)-test, which does not assume equal population variances:

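# Titanium content measured by each method (independent samples)
method1<-c(.011,.013,.013,.010,.013,.013,.015,.011,.014,.012)
method2<-c(.011,.016,.013,.012,.015,.012,.017,.013,.014,.015)
# Welch two-sample t-test: independent samples, unequal variances allowed
result <- t.test(method1, method2, paired = FALSE, alternative = "two.sided")
result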

## 
##  Welch Two Sample t-test
## 
## data:  method1 and method2
## t = -1.6767, df = 17.003, p-value = 0.1119
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.0029357463  0.0003357463
## sample estimates:
## mean of x mean of y 
##    0.0125    0.0138

With p-value = 0.1119 > 0.05, we do NOT reject the null hypothesis: there is not enough evidence to conclude that the average titanium contents measured by the two methods differ. The difference is not statistically significant.

b. Determine a 95% confidence interval estimate for \((\mu_1-\mu_2)\). Does your interval estimate substantiate your conclusion in part a? Explain.

The confidence interval is already provided by the software and runs from -0.0029357 to 0.0003357.

For this data set \(t = -1.676741\), which lies above the lower critical value of the rejection region (about -2.11 with 17 degrees of freedom); equivalently, the interval contains 0. We therefore do NOT reject \(H_0\), reaching the same conclusion as in part a.
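The variance ratio and the critical value used in this exercise can be checked directly. This sketch also runs the pooled (equal-variance) test for comparison; with equal sample sizes the pooled and Welch \(t\) statistics coincide, and only the degrees of freedom differ.

```r
method1 <- c(.011,.013,.013,.010,.013,.013,.015,.011,.014,.012)
method2 <- c(.011,.016,.013,.012,.015,.012,.017,.013,.014,.015)

# Ratio of the larger to the smaller sample variance
v <- c(var(method1), var(method2))
var_ratio <- max(v) / min(v)

# Pooled (equal-variance) t-test, for comparison with the Welch output
pooled <- t.test(method1, method2, var.equal = TRUE)

# Lower critical value for the Welch test (df = 17.003 from the output)
t_crit_low <- qt(0.025, df = 17.003)

round(c(ratio = var_ratio, t_pooled = unname(pooled$statistic),
        p_pooled = pooled$p.value, t_crit_low = t_crit_low), 4)
```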

10.42 An experiment was conducted to compare the mean reaction times to two types of traffic signs: prohibitive (No Left Turn) and permissive (Left Turn Only). Ten drivers were included in the experiment. Each driver was presented with 40 traffic signs, 20 prohibitive and 20 permissive, in random order. The mean time to reaction (in milliseconds) was recorded for each driver and is shown here.

a. Explain why this is a paired-difference experiment and give reasons why the pairing should be useful in increasing information on the difference between the mean reaction times to prohibitive and permissive traffic signs.

b. Use the Excel printout to determine whether there is a significant difference in mean reaction times to prohibitive and permissive traffic signs. Use the p-value approach.

10.45 To test the comparative brightness of two red dyes, nine samples of cloth were taken from a production line and each sample was divided into two pieces. One of the two pieces in each sample was randomly chosen and red dye 1 applied; red dye 2 was applied to the remaining piece. The following data represent a “brightness score” for each piece. Is there sufficient evidence to indicate a difference in mean brightness scores for the two dyes? Use \(\alpha=.05\).

dye1 <- c(10,12,9,8,15,12,9,10,15)
dye2 <- c(8,11,10,6,12,13,9,8,13)
t.test(dye1,dye2,paired = FALSE, alternative = "two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  dye1 and dye2
## t = 0.93865, df = 15.963, p-value = 0.3619
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.398779  3.621002
## sample estimates:
## mean of x mean of y 
##  11.11111  10.00000

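Given the results of this analysis (p-value = 0.3619 > \(\alpha=.05\)), \(H_0\) is NOT rejected. Note, however, that each cloth sample was split into two pieces, one per dye, so the measurements are paired; a paired-difference test (t.test with paired = TRUE) is the analysis that matches this design. It gives \(t \approx 2.294\) with 8 degrees of freedom, just below the critical value \(t_{.025} = 2.306\), so \(H_0\) is still not rejected at \(\alpha=.05\).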
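A sketch of the paired-difference analysis, using the same data. Each cloth sample contributes one difference, so the test has \(n-1=8\) degrees of freedom.

```r
dye1 <- c(10,12,9,8,15,12,9,10,15)
dye2 <- c(8,11,10,6,12,13,9,8,13)

# Each cloth sample received both dyes, so analyse the within-sample differences
paired <- t.test(dye1, dye2, paired = TRUE, alternative = "two.sided")
paired$statistic   # t for the mean difference
paired$parameter   # degrees of freedom (n - 1 = 8)
paired$p.value
qt(0.975, df = 8)  # two-tailed critical value for alpha = .05
```

Pairing removes the sample-to-sample variation, which is why the paired \(t\) is much larger than the Welch \(t\) of 0.939, even though the conclusion at \(\alpha=.05\) is the same.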

10.51 A random sample of size \(n=7\) from a normal population produced these measurements: 1.4,3.6,1.7,2.0,3.3,2.8,2.9.

a. Calculate the sample variance, \(s^2\).

\[s^2 = \frac{\sum(x_i-\overline{x})^2}{n-1}\]

sample <- c(1.4,3.6,1.7,2.0,3.3,2.8,2.9)
var(sample)

## [1] 0.6990476

b. Construct a 95% confidence interval for the population variance, \(\sigma^2\).

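\[\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\] where \[\chi^2=\frac{(n-1)s^2}{\sigma^2}\] has a \(\chi^2\) distribution with \(n-1=6\) degrees of freedom, and \(\chi^2_{\alpha/2}\) and \(\chi^2_{1-\alpha/2}\) cut off upper-tail areas of \(\alpha/2\) and \(1-\alpha/2\). The \(\chi^2\) values for \(\alpha=0.05\) are: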

qchisq(c(.025,.975),df=6, lower.tail=TRUE)

## [1]  1.237344 14.449375

Por lo tanto el intervalo de confianza resultante es:

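c(
  # lower limit: divide by the upper critical value (about 14.449)
  ((7-1)*sd(sample)^2)/qchisq(c(.025),df=6, lower.tail=FALSE),
  # upper limit: divide by the lower critical value (about 1.237)
  ((7-1)*sd(sample)^2)/qchisq(c(.975),df=6, lower.tail=FALSE))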

## [1] 0.2902745 3.3897484

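c. Test \(H_0: \sigma^2=0.8\) versus \(H_a: \sigma^2 \neq 0.8\) using \(\alpha=0.05\). State your conclusions.

\(H_0: \sigma^2=0.8\)
\(H_a: \sigma^2 \neq 0.8\), using \(\alpha=0.05\)

Critical points: with 6 degrees of freedom, \(H_0\) is rejected if \(\chi^2 < 1.237\) or \(\chi^2 > 14.449\) (the values found in part b). Computing the value of \(\chi^2\):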

\[\chi^2=\frac{6 \times 0.699}{0.8}\]

6*var(sample)/.8

## [1] 5.242857

Since \(\chi^2 = 5.243\) lies between the critical values 1.237 and 14.449, we do NOT reject \(H_0\): there is not enough evidence to say that \(\sigma^2\) differs from 0.8.

d. What is the approximate p-value for the test in part c?

pchisq(5.2425, df = 6, lower.tail = FALSE)

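## [1] 0.5131081

This is only the upper-tail area. Since it exceeds 0.5, the smaller tail is the lower one, \(P(\chi^2 < 5.243) = 1 - 0.5131 = 0.4869\), and the two-tailed p-value is approximately \(2(0.4869) = 0.974\), far above \(\alpha = 0.05\).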
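For a two-tailed \(\chi^2\) test the p-value doubles the smaller of the two tail areas. A minimal sketch with the same measurements; the variable name `measurements` is chosen here to avoid masking base R's `sample()` function.

```r
measurements <- c(1.4,3.6,1.7,2.0,3.3,2.8,2.9)

# Test statistic for H0: sigma^2 = 0.8, about 5.243
chi2 <- (length(measurements) - 1) * var(measurements) / 0.8

lower_tail <- pchisq(chi2, df = 6)                      # P(chi^2 <= observed)
upper_tail <- pchisq(chi2, df = 6, lower.tail = FALSE)  # P(chi^2 >= observed)
p_two_tailed <- 2 * min(lower_tail, upper_tail)         # double the smaller tail

round(c(chi2 = chi2, lower = lower_tail, upper = upper_tail, p = p_two_tailed), 4)
```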

10.61 10.63 Quarterbacks not only need to have a good passing percentage, but they need to be consistent. That is, the variability in the number of passes completed per game should be small. The table below gives the number of passes completed for Ben Roethlisberger and Aaron Rodgers, quarterbacks for the Steelers and Packers, during the 2010 NFL season.

a. Does the data indicate that there is a difference in the variability in the number of passes completed for the two quarterbacks? Use \(\alpha=0.01\)

br <- c(19,19,34,12,27,18,21,15,27,22,26,21,7,25,19)
ar <- c(16,19,17,17,30,18,20,22,21,23,22,15)
meanBr <- mean(br)
meanAr <- mean(ar)
varBr <- var(br)
varAr <- var(ar)
var.test(br,ar,alternative = "greater")

## 
##  F test to compare two variances
## 
## data:  br and ar
## F = 2.6611, num df = 14, denom df = 11, p-value = 0.05483
## alternative hypothesis: true ratio of variances is greater than 1
## 95 percent confidence interval:
##  0.9716719       Inf
## sample estimates:
## ratio of variances 
##           2.661068

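The ratio of the variances is 2.6610675. The F test shown is one-tailed (alternative = "greater"), with p-value = 0.05483; for the two-tailed question in part a the p-value doubles to about 0.110. In either case \(p > \alpha = 0.01\), so \(H_0\) is not rejected: the data do not indicate a difference in the variability of passes completed by the two quarterbacks. We can therefore proceed assuming similar variances.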
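Part a asks about a difference in variability, which is a two-tailed alternative. A sketch of the matching two-sided F test; setting `conf.level = 0.99` to match \(\alpha=0.01\) is a choice made here, not part of the original analysis.

```r
br <- c(19,19,34,12,27,18,21,15,27,22,26,21,7,25,19)
ar <- c(16,19,17,17,30,18,20,22,21,23,22,15)

# Two-sided F test for H0: sigma1^2 = sigma2^2
f_two <- var.test(br, ar, alternative = "two.sided", conf.level = 0.99)
f_two$statistic        # ratio of sample variances, about 2.661
f_two$p.value          # about 0.110, twice the one-tailed value
f_two$p.value < 0.01   # FALSE: do not reject H0 at alpha = 0.01
```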

b. If you were going to test for a difference in the two population means, would it be appropriate to use the two-sample t-test that assumes equal variances? Explain

Yes. Given the result of the F test, there is no significant evidence of a difference between the variances of the two data sets, so the two-sample \(t\)-test that assumes equal variances is appropriate.


Gener J Avilés Rodríguez

November 13, 2018
