Lab report

Load data:

download.file("http://www.openintro.org/stat/data/bdims.RData", destfile = "bdims.RData")
load("bdims.RData")

Set a seed:
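
The seed chunk is empty in this copy of the report. A minimal sketch of what such a chunk usually looks like is given here; the value 1234 is an arbitrary assumption, not the seed used for the results below.

```r
# Hypothetical seed value; any fixed integer makes rnorm() draws repeatable.
set.seed(1234)
first_draw <- rnorm(3)

# Resetting the same seed reproduces exactly the same draws.
set.seed(1234)
second_draw <- rnorm(3)
identical(first_draw, second_draw)
```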

Exercises:

Exercise 1:

# sex is coded 1 for males and 0 for females
mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)
hist(fdims$hgt, main = "Distribution of female height")

hist(mdims$hgt, main = "Distribution of male height")

The two histograms show the distributions of female and male height. Female height looks roughly normal, with a slight bell-shaped curve. However, more of the heights fall at or under 165 cm, which makes the histogram slightly right-skewed. Male height looks very close to normal, with a clear bell-shaped curve. This histogram has a very slight skew to the left, since more of the heights fall over 180 cm.

Exercise 2:

fhgtmean <- mean(fdims$hgt)
fhgtsd <- sd(fdims$hgt)
# ylim is passed to hist() so the top of the normal curve is not cut off
hist(fdims$hgt, probability = TRUE, ylim = c(0, 0.06))
x <- 140:190
y <- dnorm(x = x, mean = fhgtmean, sd = fhgtsd)
lines(x = x, y = y, col = "blue")

The data appear to follow an approximately normal distribution: the histogram has a clear bell shape, and the bars stay close to the overlaid normal curve, with no jumps or abrupt departures from it.

Exercise 3:

sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
qqnorm(sim_norm)
# reference line for the Q-Q plot (the density curve x, y belongs to the histogram)
qqline(sim_norm, col = "blue")

The points on this plot don’t follow a straight line exactly, but they are close, with hardly any deviations. The largest deviations are in the two tails of the distribution; for the most part, the points are linear.

Exercise 4:

qqnorm(fdims$hgt)

This plot shows more deviations than the simulated one, since the points rise and fall more often across the distribution. Overall the points still follow a roughly straight line, but the real data look noticeably less normal than the simulated data.
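
One way to judge whether these deviations are larger than chance alone would produce is to draw several simulated normal samples of the same size and compare their Q-Q plots with the one for the data. The sketch below does this by hand. `qq_compare` and its arguments are hypothetical names introduced here, and `hgt_example` is simulated placeholder data (assumed mean and sd) so that the block runs without the data file; with the lab data, `fdims$hgt` would be passed instead.

```r
# Hypothetical helper: Q-Q plot of the data next to Q-Q plots of
# simulated normal samples with the same size, mean, and sd.
qq_compare <- function(x, n_sims = 8) {
  old_par <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
  on.exit(par(old_par))  # restore the plotting layout afterwards

  qqnorm(x, main = "Data")
  qqline(x, col = "blue")

  for (i in seq_len(n_sims)) {
    sim <- rnorm(n = length(x), mean = mean(x), sd = sd(x))
    qqnorm(sim, main = paste("Simulation", i))
    qqline(sim, col = "blue")
  }
  invisible(n_sims + 1)  # number of panels drawn
}

# Placeholder data (assumed values, not the lab data).
set.seed(1)
hgt_example <- rnorm(260, mean = 165, sd = 6.5)
panels <- qq_compare(hgt_example)
```

If the panel for the data does not stand out from the simulated panels, the deviations are plausibly just sampling noise.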

Exercise 5:

qqnorm(fdims$wgt)

The plot for female weight follows a straight line for most of the data, so the distribution looks relatively normal. Toward the upper end of the distribution, however, the points deviate from the line and curve upward.

Exercise 6:

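fwgtmean <- mean(fdims$wgt)
fwgtsd <- sd(fdims$wgt)
# P(height < 152.4 cm): pnorm() already returns the lower-tail probability
pnorm(q = 152.4, mean = fhgtmean, sd = fhgtsd)
## [1] 0.028342
sum(fdims$hgt < 152.4) / length(fdims$hgt)
## [1] 0.02692308
# P(weight > 200): upper tail, so subtract from 1
1 - pnorm(q = 200, mean = fwgtmean, sd = fwgtsd)
## [1] 0
sum(fdims$wgt > 200) / length(fdims$wgt)
## [1] 0

What is the probability that a random female is shorter than 5 ft (152.4 cm)?
Theoretical distribution: 0.0283
Empirical distribution: 0.0269

What is the probability that a random female weighs more than 200 lbs?
Theoretical distribution: 0
Empirical distribution: 0

These weight values compare 200 directly with wgt. Since wgt in bdims is recorded in kilograms, they describe weights over 200 kg; 200 lbs is about 90.7 kg, so the answer for 200 lbs needs that threshold instead.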
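
The weight comparison depends on units. The sketch below shows one way to compute the theoretical and empirical upper-tail probabilities after converting 200 lbs to kilograms, assuming `wgt` is recorded in kilograms as described in the bdims documentation. `upper_tail` is a hypothetical helper, and `wgt_example` is simulated placeholder data (assumed mean and sd) so the block runs on its own. No output is shown because it depends on the data used.

```r
# Hypothetical helper: theoretical (normal model) and empirical
# probability that a value exceeds `threshold`.
upper_tail <- function(x, threshold) {
  c(
    theoretical = 1 - pnorm(q = threshold, mean = mean(x), sd = sd(x)),
    empirical   = mean(x > threshold)
  )
}

# 1 lb is defined as 0.45359237 kg, so 200 lbs is about 90.7 kg.
threshold_kg <- 200 * 0.45359237

# Placeholder data (assumed values); in the lab, use fdims$wgt instead.
set.seed(1)
wgt_example <- rnorm(260, mean = 60, sd = 10)
tail_probs <- upper_tail(wgt_example, threshold_kg)
```

With the lab data, the call would be `upper_tail(fdims$wgt, threshold_kg)`.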


On your own:

1:

1.1 The histogram for female biiliac (pelvic) diameter (bii.di) belongs to normal probability plot letter B.
1.2 The histogram for female elbow diameter (elb.di) belongs to normal probability plot letter C.
1.3 The histogram for general age (age) belongs to normal probability plot letter D.
1.4 The histogram for female chest depth (che.de) belongs to normal probability plot letter A.

2:

This step-wise pattern is most likely caused by the scale on which the data were measured. Since age is usually given in whole numbers, every recorded value is an integer, so many respondents share the same value and the points line up in horizontal steps along the y-axis.
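
The effect of whole-number recording can be illustrated with simulated values. The sketch below is only an illustration under assumed parameters (the mean, sd, and sample size are made up, not taken from the lab data): it draws continuous values, rounds them to whole years, and plots both versions.

```r
# Simulated ages (assumed mean and sd, for illustration only).
set.seed(1)
age_continuous <- rnorm(200, mean = 30, sd = 9)
age_whole <- round(age_continuous)  # ages recorded as whole years

old_par <- par(mfrow = c(1, 2))
qqnorm(age_continuous, main = "Continuous values")
qqnorm(age_whole, main = "Rounded to whole years")
par(old_par)

# Rounding collapses many distinct values into a few repeated ones,
# and each repeated value shows up as a horizontal step.
length(unique(age_continuous))
length(unique(age_whole))
```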

3:

qqnorm(fdims$kne.di)

hist(fdims$kne.di)

Female knee diameter does not appear to be normally distributed: the plot has strong deviations in the right tail, suggesting that it is right-skewed. A histogram confirms this, showing the right skew more clearly.
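
A numeric check can complement the two plots. The sketch below computes a moment-based sample skewness, which is positive when the right tail is longer. `sample_skewness` is a hypothetical helper written for this illustration, and `skewed_example` is simulated placeholder data rather than the knee diameters.

```r
# Hypothetical helper: moment-based sample skewness
# (positive for a longer right tail, negative for a longer left tail).
sample_skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Placeholder right-skewed data; in the lab, use fdims$kne.di instead.
set.seed(1)
skewed_example <- rlnorm(260, meanlog = 3, sdlog = 0.4)
skew_value <- sample_skewness(skewed_example)
```

With the lab data, the call would be `sample_skewness(fdims$kne.di)`.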