Stats Lab 4

babyData <- read.csv('http://people.hsc.edu/faculty-staff/blins/classes/spring17/math222/data/babies.csv')

Question 1

Create a histogram to explore the distribution of height, and describe the distribution. Make sure to label your axes!

hist(babyData$height, col = "blue", main = "Mothers' Height Distribution", xlab = "Height (in)")

Question 2

Create a summary of height. Are the mean and median different, or are they roughly the same?

summary(babyData$height)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   53.00   62.00   64.00   64.05   66.00   72.00      22

According to the summary, the mean (64.05) is roughly the same as the median (64.00), which suggests no notable skew in the data.
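One way to make "roughly the same" concrete is to compare the gap between the mean and the median with the spread of the data. The sketch below uses only the values printed by `summary()` above; using the interquartile range as the yardstick is my own choice, not something the lab asks for, and the variable names are illustrative.

```r
# Values copied from the summary() output above (inches)
height_median <- 64.00
height_mean   <- 64.05
q1 <- 62.00
q3 <- 66.00

gap <- height_mean - height_median  # difference between mean and median
iqr <- q3 - q1                      # spread of the middle half of the data
gap_ratio <- gap / iqr              # gap as a fraction of the IQR
print(gap_ratio)
```

A gap that is only a small fraction of the IQR supports treating the mean and median as practically equal.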

Question 3

A normal distribution must be symmetric and unimodal. Based on what we have done so far, does anything suggest this distribution might not be a normal distribution?

Given that the histogram shows a unimodal, roughly symmetric distribution and that the mean and the median are approximately the same, nothing so far suggests that this distribution is skewed or otherwise not normal.

Question 4

If we did use a normal distribution to describe the height of mothers in these data, what value would you suggest we use for 𝜇?

Assuming the heights of the mothers in the study follow a normal distribution, I would suggest using the sample mean, 64.05 inches, for 𝜇. The mean is the most appropriate measure of center here because the distribution is roughly symmetric with no notable outliers.

Question 5

If we did use a normal distribution to describe the height of mothers in these data, what value would you suggest we use for 𝜎?

sd(babyData$height, na.rm = TRUE)

## [1] 2.533409

For the standard deviation, I would suggest using 2.53 inches as our 𝜎 value.

Question 6

Based on Questions 4 and 5, write down the normal distribution you would use to describe the heights of mothers in this data set.

N(𝜇,𝜎) => N(64.05, 2.53)

Question 7

Create the plot using the three steps of code above. Does it look like the normal distribution (the curve) is a good fit for the histogram? In other words, does it look like the curve matches with the shape of the histogram?

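# Step 1: Specify the range of the x axis
# (named x_vals so it does not mask base R's range() function)
x_vals <- seq(from = 53, to = 72, length.out = 40)

# Step 2: Evaluate the normal density N(64.05, 2.53) at each x value
curve_vals <- dnorm(x_vals, mean = 64.05, sd = 2.53)

# Step 3: Draw the histogram on the density scale, then overlay the curve
hist(babyData$height, prob = TRUE, col = "white",
     ylim = c(0, max(curve_vals)),
     main = "Histogram with normal curve",
     xlab = "Height (in inches)")
lines(x_vals, curve_vals, col = 2, lwd = 2)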

It would appear as though the normal distribution is in fact a good fit for the histogram because the curve looks like it matches well with the shape of the distribution.

Question 8

Based on the normal distribution we have chosen, what is the approximate probability that a mother is less than 69 inches (5 foot 9)?

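69 inches is approximately two standard deviations above the mean (64.05 + 2 × 2.53 = 69.11). About 95% of a normal distribution lies within two standard deviations of the mean, which leaves roughly 2.5% in each tail, so only about 2.5% of mothers are taller than 69 inches. There is therefore approximately a 97.5% chance that a mother is shorter than 69 inches (5’9”).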
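As a rough check on the rule-of-thumb estimate, the area to the left of 69 under the chosen curve can be computed directly. This is a minimal sketch that reuses the rounded parameters from Questions 4 and 5; the variable names are my own, and `pnorm` is base R's normal cumulative distribution function.

```r
mu    <- 64.05  # mean from Question 4 (inches)
sigma <- 2.53   # standard deviation from Question 5 (inches)

# How many standard deviations 69 inches sits above the mean
z <- (69 - mu) / sigma

# P(height < 69) under N(mu, sigma)
p_below <- pnorm(69, mean = mu, sd = sigma)

print(round(z, 2))
print(round(p_below, 3))
```

The z-score comes out just under 2, so the computed area should land near 0.975: a point two standard deviations above the mean leaves only the upper 2.5% tail beyond it.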

Question 9

Create a histogram to explore the distribution of birth weight, and describe the distribution. Make sure to label your axes!

hist(babyData$bwt, col = "darkgreen", xlab = "Birth Weight (oz)", main = "Baby's Birth Weight Distribution")

The distribution is unimodal and visually symmetrical; both are encouraging indicators for a normal distribution.

Question 10

Use the code from Question 7 above to draw the appropriate normal curve on the histogram. State the parameters (mean and standard deviation) of the normal distribution you chose.

summary(babyData$bwt)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    55.0   108.8   120.0   119.6   131.0   176.0

sd(babyData$bwt)

## [1] 18.23645

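# Step 1: Specify the range of the x axis
# (named x_vals so it does not mask base R's range() function)
x_vals <- seq(from = 55, to = 176, length.out = 40)

# Step 2: Evaluate the normal density N(119.6, 18.24) at each x value
curve_vals <- dnorm(x_vals, mean = 119.6, sd = 18.24)

# Step 3: Draw the histogram on the density scale, then overlay the curve
hist(babyData$bwt, prob = TRUE, col = "white",
     ylim = c(0, max(curve_vals)),
     main = "Histogram with normal curve",
     xlab = "Birth Weight (in ounces)")
lines(x_vals, curve_vals, col = 2, lwd = 2)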

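The normal distribution I chose is N(119.6, 18.24): a mean of 119.6 ounces and a standard deviation of 18.24 ounces.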
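Since the same three steps are used for both height and birth weight, they could be wrapped in a small function that estimates the mean and standard deviation from the data instead of typing them in. This is only a sketch: `hist_with_normal` is a hypothetical helper, not part of the lab, and the data below are simulated stand-ins so the block runs without downloading the CSV.

```r
# Hypothetical helper: draws a density-scaled histogram of x and overlays
# the normal curve whose mean and sd are estimated from x itself.
hist_with_normal <- function(x, ...) {
  x <- x[!is.na(x)]                          # drop missing values first
  mu <- mean(x)
  sigma <- sd(x)
  grid <- seq(min(x), max(x), length.out = 200)
  dens <- dnorm(grid, mean = mu, sd = sigma)
  h <- hist(x, plot = FALSE)                 # bar heights, computed without drawing
  hist(x, freq = FALSE, col = "white",
       ylim = c(0, max(h$density, dens)),    # tall enough for both bars and curve
       ...)
  lines(grid, dens, col = 2, lwd = 2)
  invisible(c(mean = mu, sd = sigma))        # return the fitted parameters
}

# Simulated stand-in data (NOT the lab data), used only so the sketch runs
set.seed(1)
fake_bwt <- rnorm(500, mean = 119.6, sd = 18.24)
params <- hist_with_normal(fake_bwt,
                           main = "Histogram with normal curve",
                           xlab = "Birth Weight (in ounces)")
print(params)
```

With the real data, the call would be something like `hist_with_normal(babyData$bwt, ...)`. Setting the upper `ylim` from both the bars and the curve keeps either one from being clipped.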

Question 11

Does it look like this normal distribution might be appropriate for birth weight? Explain why or why not.

I would say so, because the shape of the normal curve closely resembles the unimodal histogram. In addition, the peak of the normal curve, which sits at the mean of the distribution, lies right at the peak of the histogram. Because the histogram is roughly symmetric, this indicates that the mean of the normal distribution aligns with the center of the observed data.

Question 12

What is the probability that a baby weighs less than 64.3 (approximately) ounces at birth?

The mean of the birth weight distribution is 119.6 ounces, and 64.3 ounces lies about three standard deviations below it ((64.3 − 119.6) / 18.24 ≈ −3.03). Since about 99.7% of a normal distribution lies within three standard deviations of the mean, roughly 0.3% lies outside that range, split evenly between the two tails. The probability of a newborn weighing less than 64.3 ounces is therefore approximately 0.15%.
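The lower-tail area can also be computed directly with `pnorm` as a check on the rule-of-thumb reasoning. This is a minimal sketch assuming the rounded parameters from Question 10; the variable names are illustrative.

```r
mu_bwt    <- 119.6  # mean birth weight from Question 10 (oz)
sigma_bwt <- 18.24  # standard deviation from Question 10 (oz)

# Standardize 64.3 oz: number of standard deviations from the mean
z_low <- (64.3 - mu_bwt) / sigma_bwt

# P(birth weight < 64.3) under N(mu_bwt, sigma_bwt)
p_low <- pnorm(64.3, mean = mu_bwt, sd = sigma_bwt)

print(round(z_low, 2))
print(p_low)
```

The z-score is close to −3, so the computed probability should be on the order of one tenth of one percent, far smaller than the 2.5% tail associated with two standard deviations.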

Question 13

What is the probability that a baby has a weight more than 1 standard deviation above the mean?

Since about 68% of a normal distribution lies within 1 standard deviation of the mean, 1 standard deviation above the mean of 119.6 ounces corresponds to roughly the 84th percentile (50 + 68/2 = 50 + 34 = 84). There is therefore approximately a 16% probability that a newborn weighs more than 1 standard deviation above the mean.
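The same upper-tail probability can be checked in R. Because the question is phrased in standard deviations, the answer does not depend on the particular mean and standard deviation; the sketch below computes it once on the standard normal scale and once on the ounce scale using the rounded Question 10 parameters (variable names are my own).

```r
# Upper tail beyond z = 1 on the standard normal scale
p_above_1sd <- pnorm(1, lower.tail = FALSE)

# Same tail on the ounce scale: P(weight > mean + 1 sd) under N(119.6, 18.24)
p_check <- pnorm(119.6 + 18.24, mean = 119.6, sd = 18.24, lower.tail = FALSE)

print(round(p_above_1sd, 3))
print(round(p_check, 3))
```

Both values should agree and sit just under 0.16, in line with the 68-95-99.7 estimate.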