library(DATA606)
##
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics
## This package is designed to support this course. The text book used
## is OpenIntro Statistics, 3rd Edition. You can read this by typing
## vignette('os3') or visit www.OpenIntro.org.
##
## The getLabs() function will return a list of the labs available.
##
## The demo(package='DATA606') will list the demos that are available.
##
## Attaching package: 'DATA606'
## The following object is masked from 'package:utils':
##
## demo
library(ggplot2)
load("more/bdims.RData")
mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)
# sex is coded 0/1, so factor() gives each panel a discrete fill colour
ggplot(bdims) + geom_histogram(mapping = aes(x = hgt, fill = factor(sex)), color = "grey", binwidth = 3) + facet_wrap(~sex)
The histograms are approximately normal. The female plot appears to be a touch more symmetric. The male plot seems to show a slight left skew, with a somewhat longer tail extending to the left of the mode. The two distributions are centered around different means, with the males' mean being higher. The males also appear to have a larger variance, since their range is wider.
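The visual comparison of centers and spreads can be backed up with numbers. This is a minimal sketch in base R; `summarize_by_group()` is a hypothetical helper (not part of DATA606 or the lab) that reports the mean, standard deviation, and range width of a variable within each group. The toy vectors only stand in for `bdims$hgt` and `bdims$sex`.

```r
# Hypothetical helper (not from DATA606): per-group mean, SD, and range width
summarize_by_group <- function(values, groups) {
  stats <- function(v) c(mean = mean(v), sd = sd(v), range = diff(range(v)))
  # split() gives one vector per group; sapply() builds a stats-by-group
  # matrix, and t() turns it into one row per group
  t(sapply(split(values, groups), stats))
}

# Toy data standing in for bdims$hgt and bdims$sex
toy_values <- c(1, 2, 3, 10, 20, 30)
toy_groups <- c(0, 0, 0, 1, 1, 1)
summarize_by_group(toy_values, toy_groups)
```

With the lab data loaded, `summarize_by_group(bdims$hgt, bdims$sex)` would return one row per sex, putting numbers on the claims about means and spread.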
The histogram of female heights, overlaid with the fitted normal density curve, looks mostly normal with a slight right skew: the peak sits a little left of center and the right tail is somewhat heavy. I would say it's approximately normal because the normal curve passes close to the top of each bar, meaning no bin is far off from what a normal distribution would predict.
# Sample mean and SD of female height and weight, used to fit normal curves
fhgtmean <- mean(fdims$hgt)
fhgtsd <- sd(fdims$hgt)
fwgtmean <- mean(fdims$wgt)
fwgtsd <- sd(fdims$wgt)
hist(fdims$hgt, probability = TRUE)
x <- 140:190
y <- dnorm(x = x, mean = fhgtmean, sd = fhgtsd)
lines(x = x, y = y, col = "blue")
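The claim that each bin is close to what the normal curve predicts can also be checked numerically. A minimal sketch, assuming base R only: `bin_vs_normal()` is a hypothetical helper that compares the observed share of values in each bin with the share a normal distribution (using the sample mean and SD) would put there. The simulated heights are stand-in data, not `bdims`.

```r
# Hypothetical helper: observed share of values in each histogram bin versus
# the share a normal distribution with the sample mean and SD would predict
bin_vs_normal <- function(v, breaks) {
  # cut() uses right-closed bins (a, b]; include.lowest closes the first bin
  counts <- table(cut(v, breaks = breaks, include.lowest = TRUE))
  observed <- as.vector(counts) / length(v)
  # P(a < X <= b) for each bin under the fitted normal
  expected <- diff(pnorm(breaks, mean = mean(v), sd = sd(v)))
  data.frame(lower = head(breaks, -1), upper = tail(breaks, -1),
             observed = observed, expected = expected)
}

set.seed(606)
sim_heights <- rnorm(260, mean = 165, sd = 7)  # simulated stand-in data
bin_vs_normal(sim_heights, breaks = seq(130, 200, by = 5))
```

With the lab data, passing `hist(fdims$hgt, plot = FALSE)$breaks` as `breaks` would reuse the same bins as the plotted histogram, since `hist()` also uses right-closed bins with the lowest one closed.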
# Simulate a normal sample with the same size, mean, and SD as female height
sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
qqnorm(sim_norm)
qqline(sim_norm)
The QQ plot of the simulated data also shows points straying from the line, with larger deviations toward the tails. Since even data drawn from a true normal distribution deviate in the tails, tail behavior alone is weak evidence against normality unless the sample size is very large.
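The point about tails can be illustrated with a rough simulation. This sketch assumes standard normal samples and measures each point's distance from the exact y = x line (a simplification: `qqline()` instead draws a line through the sample quartiles). `qq_deviation_by_region()` is a hypothetical helper that averages those distances separately for the outer 10% and the middle 50% of plotting positions.

```r
# Average distance between sorted standard normal draws and the theoretical
# quantiles qqnorm() plots against, split into tails and middle
qq_deviation_by_region <- function(n, reps = 200) {
  p <- ppoints(n)                      # plotting positions used by qqnorm()
  theo <- qnorm(p)
  in_tails <- p < 0.05 | p > 0.95
  in_middle <- p > 0.25 & p < 0.75
  devs <- replicate(reps, {
    # distance from the y = x line a perfect standard normal sample follows
    d <- abs(sort(rnorm(n)) - theo)
    c(tails = mean(d[in_tails]), middle = mean(d[in_middle]))
  })
  rowMeans(devs)                       # average over all simulated samples
}

set.seed(606)
qq_deviation_by_region(260)              # same size as fdims
qq_deviation_by_region(5000, reps = 50)  # a much larger sample
```

Tail deviations should come out larger than middle deviations at n = 260 and shrink as n grows, which is consistent with not reading too much into the tails of a moderate sample.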
qqnormsim(fdims$hgt)
A couple of the plots of simulated normal data are virtually indistinguishable from the plot of the actual data. I think we can safely conclude that female heights are approximately normal, or, more accurately, that we cannot say definitively that they are not normal.
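`qqnormsim()` comes from the DATA606 package. The sketch below is a hypothetical base-R stand-in that follows the same idea, one QQ plot of the data next to QQ plots of simulated normal samples of the same size, mean, and SD; it is not the package's source, and the real function's layout and details may differ.

```r
# Hypothetical stand-in for DATA606::qqnormsim(), not its actual source:
# one QQ plot of the data plus nsim QQ plots of normal samples with the
# same size, mean, and SD, for side-by-side visual comparison
qqnormsim_sketch <- function(v, nsim = 8) {
  old_par <- par(mfrow = c(3, 3))
  on.exit(par(old_par))                # restore the plotting layout afterwards
  qqnorm(v, main = "Observed data")
  qqline(v)
  for (i in seq_len(nsim)) {
    s <- rnorm(length(v), mean = mean(v), sd = sd(v))
    qqnorm(s, main = paste("Simulated", i))
    qqline(s)
  }
  invisible(NULL)
}

set.seed(606)
qqnormsim_sketch(rnorm(260, mean = 165, sd = 7))  # simulated stand-in data
```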
qqnormsim(fdims$wgt)
Weight appears to exhibit more right skew than the simulated data, with much larger deviations from the line in the upper tail, past the second quantile. This variable might be better modeled as lognormal to account for the skew.
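If a variable is lognormal, its logarithm is normal, so a log transform should remove the right skew. This is a minimal sketch on simulated lognormal values (assumed parameters, not fitted to `bdims`), using the gap between mean and median as a quick skew indicator.

```r
# Sketch: if a variable is lognormal, its logarithm is normal, so the
# right skew should disappear after a log transform
set.seed(606)
sim_wgt <- rlnorm(5000, meanlog = log(60), sdlog = 0.25)  # simulated, not bdims

# Right skew pulls the mean above the median on the original scale...
c(mean = mean(sim_wgt), median = median(sim_wgt))
# ...but on the log scale the two nearly coincide
c(mean = mean(log(sim_wgt)), median = median(log(sim_wgt)))
```

For the real data, the same idea could be checked visually with `qqnormsim(log(fdims$wgt))`.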
What is the probability that female height falls between approximately 5'6" and 6' (about 167 and 182 cm), inclusive?
pnorm(182, mean = fhgtmean, sd = fhgtsd) - pnorm(167, mean = fhgtmean, sd = fhgtsd)
## [1] 0.3681159
sum(fdims$hgt <= 182 & fdims$hgt >= 167)/length(fdims$hgt)
## [1] 0.3923077
What is the probability that female weight falls between 60 and 85 kilos, inclusive?
pnorm(85, mean = fwgtmean, sd = fwgtsd) - pnorm(60, mean = fwgtmean, sd = fwgtsd)
## [1] 0.5193102
sum(fdims$wgt <= 85 & fdims$wgt >= 60)/length(fdims$wgt)
## [1] 0.4384615
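By this measure, female height appears much closer to the theoretical normal: the normal model's estimate for height (0.368) is within about 0.024 of the observed proportion (0.392), while for weight (0.519 versus 0.438) it is off by about 0.081. This coincides with what we concluded from the previous exercise.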
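The two calculations above follow the same pattern, so they can be wrapped in one function. `compare_range_prob()` is a hypothetical helper; it uses `mean()` on a logical vector, which gives the same proportion as `sum(...)/length(...)`. The toy vector is only for illustration.

```r
# Hypothetical helper: P(lower <= X <= upper) under a normal model fitted with
# the sample mean and SD, next to the observed proportion of the sample
compare_range_prob <- function(v, lower, upper) {
  m <- mean(v)
  s <- sd(v)
  normal <- pnorm(upper, mean = m, sd = s) - pnorm(lower, mean = m, sd = s)
  empirical <- mean(v >= lower & v <= upper)  # share of TRUEs = proportion
  c(normal = normal, empirical = empirical, difference = normal - empirical)
}

compare_range_prob(c(1, 2, 3, 4, 5), lower = 2, upper = 4)
```

With the lab data loaded, `compare_range_prob(fdims$hgt, 167, 182)` and `compare_range_prob(fdims$wgt, 60, 85)` should reproduce the pairs of numbers shown above, plus their difference.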
a) Plot B: slight left skew
b) Plot C: mostly normal
c) Plot D: heavy right skew
d) Plot A: slight right skew
The data were likely rounded to the nearest whole number, so many observations share exactly the same value.
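The effect of rounding can be reproduced on simulated data. In this sketch the measurements are assumed normal (parameters chosen only for illustration, not taken from `bdims`); rounding to whole numbers collapses them onto a few distinct values, and those ties plot as horizontal steps in a QQ plot.

```r
# Sketch: rounding continuous normal data to whole numbers creates ties,
# and tied values plot as flat runs (steps) in a QQ plot
set.seed(606)
raw <- rnorm(260, mean = 35, sd = 3)   # simulated measurements, not bdims
rounded <- round(raw)

length(unique(raw))       # all distinct
length(unique(rounded))   # far fewer distinct values
max(abs(rounded - raw))   # rounding moves each value by at most 0.5

qqnorm(rounded)           # the repeated values appear as horizontal steps
qqline(rounded)
```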
qqnorm(fdims$kne.di)
qqline(fdims$kne.di)
ggplot(fdims) + geom_histogram(mapping = aes(x = kne.di), fill = "blue", color = "gray", binwidth = 0.5)
mean(fdims$kne.di)
## [1] 18.09692
median(fdims$kne.di)
## [1] 18
Based on the QQ plot, knee diameter appears right skewed: the points form a U shape, bending above the line at both ends. The histogram confirms this with its heavy right tail, as does the fact that the mean (18.10) is slightly higher than the median (18).
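The mean-versus-median comparison can be complemented with a single skewness number. `sample_skewness()` is a hypothetical helper implementing the moment-based sample skewness (third central moment divided by the population SD cubed); positive values indicate right skew and negative values left skew.

```r
# Hypothetical helper: moment-based sample skewness,
# mean((x - mean)^3) / (population SD)^3. Positive values indicate right skew.
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

sample_skewness(c(1, 2, 3, 4, 5))      # symmetric: 0
sample_skewness(c(1, 1, 1, 2, 10))     # long right tail: positive
```

Given the right skew described above, `sample_skewness(fdims$kne.di)` would be expected to come out positive.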