DATA606 Homework-3

3.2 Area under the curve, Part II.

What percent of a standard normal distribution \(N(\mu = 0, \sigma = 1)\) is found in each region? Be sure to draw a graph.

a) Z > -1.13

Zlim <- 3
x <- seq(-Zlim, Zlim, 0.01)
y <- dnorm(x)
plot(x, y, ylab = "Density", type = "l")
Z <- -1.13
xloc <- seq(Z, Zlim, 0.01)
yloc <- dnorm(xloc)
polygon(c(Z, xloc, Zlim), c(0, yloc, 0), col = "grey")

pnorm(Z, lower.tail = FALSE)

## [1] 0.8707619

b) Z < 0.18

Zlim <- 3
x <- seq(-Zlim, Zlim, 0.01)
y <- dnorm(x)
plot(x, y, ylab = "Density", type = "l")
Z <- 0.18
xloc <- seq(-Zlim, Z, 0.01)
yloc <- dnorm(xloc)
polygon(c(-Zlim, xloc, Z), c(0, yloc, 0), col = "grey")

pnorm(Z)

## [1] 0.5714237

c) Z > 8

Zlim <- 8
x <- seq(-Zlim, Zlim, 0.01)
y <- dnorm(x)
plot(x, y, ylab = "Density", type = "l")
Z <- 8  # the area beyond Z = 8 is far too small to be visible on this plot
xloc <- seq(Z, Zlim, 0.01)
yloc <- dnorm(xloc)
polygon(c(Z, xloc, Zlim), c(0, yloc, 0), col = "grey")

pnorm(Z, lower.tail = FALSE)

## [1] 6.220961e-16

d) |Z| < 0.5

Equivalently, -0.5 < Z < 0.5.

Zlim <- 3
x <- seq(-Zlim, Zlim, 0.01)
y <- dnorm(x)
plot(x, y, ylab = "Density", type = "l")
Z <- 0.5
xloc <- seq(-Z, Z, 0.01)
yloc <- dnorm(xloc)
polygon(c(-Z, xloc, Z), c(0, yloc, 0), col = "grey")

pnorm(Z) - pnorm(-Z)

## [1] 0.3829249
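
As a cross-check on the pnorm results above, the same areas can be obtained by numerically integrating the standard normal density. This is only a sketch using base R's integrate; the helper name `area` is introduced here for illustration and is not part of the original solution. Part (c) is left out because a tail that small is better handled by pnorm than by general-purpose numerical integration.

```r
# Numerically integrate the standard normal density over a region.
# `area` is an illustrative helper, not part of the original solution.
area <- function(lower, upper) integrate(dnorm, lower, upper)$value

area(-1.13, Inf)   # (a) Z > -1.13, compare with pnorm(-1.13, lower.tail = FALSE)
area(-Inf, 0.18)   # (b) Z < 0.18,  compare with pnorm(0.18)
area(-0.5, 0.5)    # (d) |Z| < 0.5, compare with pnorm(0.5) - pnorm(-0.5)
```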

3.4 Triathlon times, Part I.

(a)

The shorthand for each group is: men, \(N(\mu = 4313, \sigma = 583)\); women, \(N(\mu = 5261, \sigma = 807)\).

(b)

For Leo: \(Z=\frac { x-\mu }{ \sigma } =\frac { 4948-4313 }{ 583 } = 1.0891938\)

For Mary: \(Z=\frac { x-\mu }{ \sigma } =\frac { 5513-5261 }{ 807 } = 0.3122677\)

These Z-scores describe how Leo and Mary performed relative to their respective groups. Both Z-scores are positive, meaning both finishing times are above (slower than) their group means. Leo's time is 1.09 standard deviations above his group mean, while Mary's is 0.31 standard deviations above hers.
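
The two Z-scores can also be computed in R rather than by hand. The sketch below uses the finishing times, means, and standard deviations from the problem; the variable names are chosen here for illustration.

```r
# Finishing times, group means, and group SDs from the problem statement.
time  <- c(Leo = 4948, Mary = 5513)
mu    <- c(Leo = 4313, Mary = 5261)
sigma <- c(Leo = 583,  Mary = 807)

# Z-score: number of SDs each time lies above its group mean.
z <- (time - mu) / sigma
round(z, 2)
```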

(c)

Mary did better relative to her group. Because a lower time is better, the lower tail area represents the proportion of people who finished faster. Mary's lower tail is smaller than Leo's, so a smaller proportion of her group finished ahead of her than of Leo's group finished ahead of him.
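
The comparison of lower tails can be checked numerically. This is a small sketch that passes the group mean and standard deviation straight to pnorm instead of standardizing first; `lower_leo` and `lower_mary` are names introduced here.

```r
# Lower-tail area = proportion of the group with a faster (smaller) time.
# Passing mean and sd directly avoids standardizing by hand.
lower_leo  <- pnorm(4948, mean = 4313, sd = 583)
lower_mary <- pnorm(5513, mean = 5261, sd = 807)

c(Leo = lower_leo, Mary = lower_mary)
lower_mary < lower_leo   # TRUE when a smaller share of Mary's group beat her
```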

(d)

(percent <- pnorm((4948-4313)/583, lower.tail = FALSE))

## [1] 0.1380342

Leo finished faster than 13.8% of people in his group.

(e)

(percent <- pnorm((5513-5261)/807, lower.tail = FALSE))

## [1] 0.3774186

Mary finished faster than 37.74% of people in her group.

(f)

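Yes. The tail areas used in parts (c) through (e) rest on the assumption that finishing times are normally distributed; if the distribution were different, pnorm would not give the correct percentages. The Z-scores in part (b) can still be computed for any distribution, since they only require the mean and standard deviation.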

3.18 Heights of female college students.

hght <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)

(a)

(hghtmu <- mean(hght))

## [1] 61.52

(hghtsd <- sd(hght))

## [1] 4.583667

For the 68% rule,

limitup <- hghtmu + hghtsd
limitdm <- hghtmu - hghtsd
interval <- hght[hght >= limitdm & hght <= limitup]
length(interval)/length(hght)

## [1] 0.68

For the 95% rule,

limitup <- hghtmu + 2*hghtsd
limitdm <- hghtmu - 2*hghtsd
interval <- hght[hght >= limitdm & hght <= limitup]
length(interval)/length(hght)

## [1] 0.96

For the 99.7% rule,

limitup <- hghtmu + 3*hghtsd
limitdm <- hghtmu - 3*hghtsd
interval <- hght[hght >= limitdm & hght <= limitup]
length(interval)/length(hght)

## [1] 1

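Yes, the heights approximately follow the 68-95-99.7% rule: 68%, 96%, and 100% of the observations fall within 1, 2, and 3 standard deviations of the mean.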
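
The three checks above repeat the same calculation with a different multiplier, so they can be folded into one helper. The function name `within_k` is hypothetical and used only for this sketch.

```r
hght <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)

# Proportion of observations within k sample SDs of the sample mean.
# `within_k` is an illustrative helper, not part of the original solution.
within_k <- function(x, k) mean(abs(x - mean(x)) <= k * sd(x))

props <- sapply(1:3, function(k) within_k(hght, k))
props   # compare with 0.68, 0.95, 0.997
```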

(b)

Yes, the data appear to follow a normal distribution. The histogram shows a unimodal, roughly bell-shaped distribution that approximately follows the normal curve. The Q-Q plot shows that the majority of the observations fall on the normal line, with the exception of a few outliers.

To run qqnormsim, make sure the working directory contains “more/bdims.RData”.

load("more/bdims.RData")
qqnormsim(hght)

qqnorm(hght)
qqline(hght)

Comparing the Q-Q plot of the observed data with the simulated normal Q-Q plots, it is apparent that the observed data are close to normal.
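
If the lab workspace file is not available, a similar comparison can be sketched in base R. The code below does not claim to reproduce what qqnormsim does internally; it simply draws eight normal samples with the observed mean and standard deviation and plots them next to the real data. The seed and the number of simulations are arbitrary choices.

```r
hght <- c(54,55,56,56,57,58,58,59,60,60,60,61,61,62,62,63,63,63,64,65,65,67,67,69,73)

set.seed(606)  # arbitrary seed, only so the plots are reproducible
n <- length(hght)

# Each column is one simulated normal sample with the observed mean and SD.
sims <- replicate(8, rnorm(n, mean = mean(hght), sd = sd(hght)))

# 3 x 3 grid: observed data first, then the eight simulations.
op <- par(mfrow = c(3, 3))
qqnorm(hght, main = "Observed heights")
qqline(hght)
for (i in seq_len(ncol(sims))) {
  qqnorm(sims[, i], main = paste("Simulation", i))
  qqline(sims[, i])
}
par(op)
```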

3.22 Defective rate.

(a)

0.98^9*0.02

## [1] 0.01667496

(b)

0.98^100

## [1] 0.1326196
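
Parts (a) and (b) can be cross-checked with R's built-in geometric distribution functions. Note that dgeom and pgeom count the number of failures (here, non-defective transistors) before the first success (a defect), so the 10th transistor being the first defect corresponds to 9 failures. The names `p_a` and `p_b` are introduced for this sketch.

```r
p <- 0.02

# dgeom(k, prob): probability of k non-defective transistors before the first defect.
p_a <- dgeom(9, prob = p)

# No defect in a batch of 100 = at least 100 good transistors before the first defect.
p_b <- pgeom(99, prob = p, lower.tail = FALSE)

c(a = p_a, b = p_b)
```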

(c)

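The defective rate is 2%, or 1/50. The wait until the first defective transistor follows a geometric distribution with mean 1/p, so on average the 50th transistor is the first defective one, with 49 good transistors produced before it.

The standard deviation of a geometric wait time is sqrt(1 - p)/p:

sqrt(1 - 0.02)/0.02

## [1] 49.49747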

(d)

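The defective rate is 5%, or 1/20. The wait until the first defective transistor follows a geometric distribution with mean 1/p, so on average the 20th transistor is the first defective one, with 19 good transistors produced before it.

The standard deviation of a geometric wait time is sqrt(1 - p)/p:

sqrt(1 - 0.05)/0.05

## [1] 19.49359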

(e)

Increasing the probability of an event reduces both the mean and the standard deviation of the wait time until the event is observed, since the mean is 1/p and the standard deviation is sqrt(1 - p)/p.
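
To see how the wait time changes with p, the geometric mean 1/p and standard deviation sqrt(1 - p)/p can be wrapped in a small function and evaluated at both defect rates. `wait_summary` is a hypothetical helper written for this sketch.

```r
# Mean and SD of the number of trials up to and including the first success,
# for a geometric distribution with success probability p.
wait_summary <- function(p) c(mean = 1 / p, sd = sqrt(1 - p) / p)

# One column per defect rate.
sapply(c("p = 0.02" = 0.02, "p = 0.05" = 0.05), wait_summary)
```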

3.38 Male children.

(a)

choose(3,2)*(0.51^2)*(0.49^1)

## [1] 0.382347

(b)

Three possible orderings: (boy, boy, girl), (boy, girl, boy), (girl, boy, boy).

0.51^2*0.49 + 0.51*0.49*0.51 + 0.49*0.51^2

## [1] 0.382347

This matches the result from part (a).

(c)

Using binomial model:

choose(8,3)*(0.51^3)*(0.49^5)

## [1] 0.2098355

Using the method from part (b) instead, we would have to list all choose(8,3) = 56 possible orderings, calculate the probability of each, and sum them, which is far more tedious.
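
The enumeration approach can at least be automated. The sketch below uses combn to list every way of placing 3 boys among 8 births and sums the probabilities of the orderings, then compares the total with dbinom. Variable names are introduced here for illustration.

```r
p_boy <- 0.51
n <- 8
k <- 3

# Each column of combn(n, k) gives the birth positions of the k boys.
orderings <- combn(n, k)
ncol(orderings)   # number of distinct orderings

# Every ordering has k boys and n - k girls, so each has the same probability.
p_each <- rep(p_boy^k * (1 - p_boy)^(n - k), ncol(orderings))

sum(p_each)                         # part (b) method
dbinom(k, size = n, prob = p_boy)   # binomial model
```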

3.42 Serving in volleyball.

(a)

choose(9,2)*(0.15^3)*(0.85^7)

## [1] 0.03895012
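
The expression above is a negative binomial probability: 2 successes somewhere in the first 9 serves, followed by a success on the 10th. As a check, R's dnbinom gives the same value. Note that dnbinom is parameterized by the number of failures before the target number of successes, so a third success on the 10th serve corresponds to 7 failures. The name `p_nb` is introduced for this sketch.

```r
# dnbinom(x, size, prob): probability of x failures before the size-th success.
# Third success on the 10th serve means 7 failed serves before it.
p_nb <- dnbinom(7, size = 3, prob = 0.15)
p_nb
```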

(b)

If each serve is independent, previous results do not affect the next serve. So the probability is still 15%, regardless of how many successes or failures she had previously.

(c)

Part (a) is a joint probability calculated before she has made any serves: exactly 2 successes in the first 9 serves and then a success on the 10th. In part (b), the outcomes of the first 9 serves are already known, so only the uncertainty of the 10th serve remains, and by independence its probability is 15%.

Jun Yan
