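Both height distributions are unimodal and roughly symmetric. Men's heights are centered higher (mean 177.75 cm vs. 164.87 cm) and are slightly more spread out (sd 7.18 vs. 6.54).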
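library(psych)      # provides describe()
library(openintro)  # assumption: bdims is loaded from the openintro package
mdims <- subset(bdims, sex == 1)  # men
fdims <- subset(bdims, sex == 0)  # women
hist(mdims$hgt)
hist(fdims$hgt)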
describe(fdims$hgt)
## vars n mean sd median trimmed mad min max range skew
## X1 1 260 164.87 6.54 164.5 164.84 6.67 147.2 182.9 35.7 0.07
## kurtosis se
## X1 -0.32 0.41
describe(mdims$hgt)
## vars n mean sd median trimmed mad min max range skew
## X1 1 247 177.75 7.18 177.8 177.67 7.41 157.2 198.1 40.9 0.1
## kurtosis se
## X1 -0.16 0.46
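The skew values near 0 in the describe() output support the "symmetric" reading. Below is a minimal sketch of the plain moment-based sample skewness, g1 = m3 / m2^(3/2). The helper name `sample_skew` is hypothetical. psych::describe() may apply a small-sample adjustment, so its skew column need not match this value exactly.

```r
# Sample skewness g1 = m3 / m2^(3/2), where m_k is the k-th central moment.
# Hypothetical helper; psych::describe() may use an adjusted formula.
sample_skew <- function(v) {
  v <- v[!is.na(v)]   # drop missing values
  d <- v - mean(v)    # deviations from the mean
  m2 <- mean(d^2)     # second central moment
  m3 <- mean(d^3)     # third central moment
  m3 / m2^(3/2)
}

sample_skew(c(1, 2, 3, 4, 5))  # symmetric: 0
sample_skew(c(1, 1, 1, 10))    # long right tail: positive
```

Values close to 0, such as the 0.07 and 0.1 reported above, indicate little asymmetry. Positive values indicate a longer right tail.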
fhgtmean <- mean(fdims$hgt)
fhgtsd <- sd(fdims$hgt)
hist(fdims$hgt, probability = TRUE)
x <- 140:190
y <- dnorm(x = x, mean = fhgtmean, sd = fhgtsd)
lines(x = x, y = y, col = "blue")
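The histogram-plus-curve steps above are repeated later for weight and knee diameter, each time with a hand-picked `x` range. Here is a minimal sketch of a reusable version. The name `overlay_normal` is hypothetical and not part of the lab. It builds the grid from the data range and sets `ylim` so that neither the bars nor the curve gets clipped.

```r
# Hypothetical helper (not part of the lab): density-scaled histogram of v
# with a normal curve that uses the same mean and sd as v.
overlay_normal <- function(v, col = "blue", ...) {
  v <- v[!is.na(v)]                                  # drop missing values
  m <- mean(v)
  s <- sd(v)
  x <- seq(min(v) - s, max(v) + s, length.out = 200) # grid: data range plus one sd each side
  y <- dnorm(x, mean = m, sd = s)                    # normal density on the grid
  h <- hist(v, plot = FALSE)                         # compute bins without drawing
  ymax <- max(h$density, y)                          # keep bars and curve in frame
  plot(h, freq = FALSE, ylim = c(0, ymax), ...)      # density scale: bar areas sum to 1
  lines(x, y, col = col)
  invisible(data.frame(x = x, y = y))
}

# Usage (assumes fdims from the code above):
# overlay_normal(fdims$hgt, main = "Female height", xlab = "Height (cm)")
```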
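Yes. The histogram bars follow the blue normal curve closely, so female heights appear nearly normal. The more closely the bars track the curve, the closer the distribution is to normal.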
The result varies every time I run the simulated normal probability plot, because rnorm() draws a new random sample on each run. However, the points always fall very close to the line, and the plots look similar to the one for the real data.
qqnorm(fdims$hgt)
qqline(fdims$hgt)
sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
qqnorm(sim_norm)
qqline(sim_norm)
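Because rnorm() draws a new pseudo-random sample on every call, the simulated plot changes from run to run. Here is a minimal sketch of making a run reproducible with set.seed(). The seed value is arbitrary. The mean and sd are the rounded values from the describe() output above.

```r
# Same seed -> same pseudo-random draws, so a simulated QQ plot can be reproduced.
set.seed(2024)                                 # arbitrary seed value
a <- rnorm(n = 5, mean = 164.87, sd = 6.54)
set.seed(2024)                                 # reset to the same seed
b <- rnorm(n = 5, mean = 164.87, sd = 6.54)
identical(a, b)                                # TRUE
```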
qqnormsim(fdims$hgt)  # helper supplied with the lab, not part of base R
Yes. The normal probability plot for fdims$hgt looks very similar to the simulated plots, which is evidence that female heights are nearly normally distributed.
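The comparison above relies on qqnormsim(). The following is a hypothetical re-implementation for illustration only, assuming the helper draws several normal samples with the data's mean and sd and plots each next to the real data. The lab's actual function may differ in layout and details. The name `qqnormsim_sketch` and the default of 8 simulations are assumptions.

```r
# Hypothetical sketch of a qqnormsim()-style helper; the lab's version may differ.
qqnormsim_sketch <- function(v, nsim = 8) {
  v <- v[!is.na(v)]
  m <- mean(v)
  s <- sd(v)
  # One column per simulated normal sample, each the same size as v
  sims <- replicate(nsim, rnorm(length(v), mean = m, sd = s))
  old <- par(mfrow = c(3, 3))  # 3 x 3 grid: real data plus up to 8 simulations
  on.exit(par(old))            # restore the previous layout afterwards
  qqnorm(v, main = "Data")
  qqline(v)
  for (i in seq_len(nsim)) {
    qqnorm(sims[, i], main = paste("Simulation", i))
    qqline(sims[, i])
  }
  invisible(sims)
}

# Usage (assumes fdims from the code above):
# qqnormsim_sketch(fdims$hgt)
```

If the real-data panel is indistinguishable from the simulated panels, the data are consistent with a normal model.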
For the most part, women's weight appears to follow a normal distribution. However, in the upper tail the points bend away from the line, suggesting a longer right tail (right skew) than a normal distribution would have.
fwgtmean <- mean(fdims$wgt)
fwgtsd <- sd(fdims$wgt)
hist(fdims$wgt, probability = TRUE)
x <- 40:100
y <- dnorm(x = x, mean = fwgtmean, sd = fwgtsd)
lines(x = x, y = y, col = "blue")
qqnorm(fdims$wgt)
qqline(fdims$wgt)
sim_norm <- rnorm(n = length(fdims$wgt), mean = fwgtmean, sd = fwgtsd)
qqnorm(sim_norm)
qqline(sim_norm)
qqnormsim(fdims$wgt)
1 - pnorm(q = 182, mean = fhgtmean, sd = fhgtsd)
## [1] 0.004434387
sum(fdims$hgt > 182) / length(fdims$hgt)
## [1] 0.003846154
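The theoretical probability above is an upper-tail area. Here is a minimal sketch showing that `1 - pnorm()` on the original scale equals the standard-normal upper tail of the z-score. It uses the rounded mean and sd reported by describe() rather than the exact `fhgtmean` and `fhgtsd`.

```r
# P(height > 182) two ways, using the rounded mean and sd from describe()
m <- 164.87
s <- 6.54
z <- (182 - m) / s                          # about 2.62 sd above the mean
p1 <- 1 - pnorm(q = 182, mean = m, sd = s)  # upper tail on the cm scale
p2 <- pnorm(z, lower.tail = FALSE)          # same tail on the standard normal scale
all.equal(p1, p2)                           # TRUE
```

`lower.tail = FALSE` is also numerically safer than `1 - pnorm()` when the tail probability is very small.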
Women's height showed closer agreement between the theoretical (normal model) and empirical probabilities than women's weight.
Question 1: What is the probability that a randomly chosen young adult female is shorter than 5 feet (about 152 cm)?
pnorm(q = 152, mean = fhgtmean, sd = fhgtsd)    # theoretical probability
## [1] 0.02459975
sum(fdims$hgt < 152) / length(fdims$hgt)        # empirical probability
## [1] 0.01923077
Question 2: What is the probability that a randomly chosen young adult female weighs more than 60 kg?
1 - pnorm(q = 60, mean = fwgtmean, sd = fwgtsd)  # theoretical probability
## [1] 0.524893
sum(fdims$wgt > 60) / length(fdims$wgt)          # empirical probability
## [1] 0.4384615
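Both questions follow the same pattern: a normal-model probability next to an observed proportion. Here is a minimal sketch that wraps the pattern in one function. The name `compare_probs` and its `tail` argument are hypothetical. `mean(v < q)` is simply a shorter form of `sum(v < q) / length(v)`.

```r
# Hypothetical helper: normal-model vs. observed probability for X < q or X > q.
compare_probs <- function(v, q, tail = c("below", "above")) {
  tail <- match.arg(tail)
  v <- v[!is.na(v)]
  m <- mean(v)
  s <- sd(v)
  # Theoretical probability from a normal model with the sample mean and sd
  theoretical <- pnorm(q, mean = m, sd = s, lower.tail = (tail == "below"))
  # Empirical probability: share of observations on that side of q
  empirical <- if (tail == "below") mean(v < q) else mean(v > q)
  c(theoretical = theoretical, empirical = empirical,
    difference = theoretical - empirical)
}

# Usage (assumes fdims from the code above):
# compare_probs(fdims$hgt, 152, "below")
# compare_probs(fdims$wgt, 60, "above")
```

A large `difference`, as in the weight question, is a sign that the normal model fits that variable less well.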
This happens when the distribution is not normal: outliers skew the results and distort the beginning and end of the QQ plot.
Female knee diameter (kne.di) is right skewed, as the histogram and normal probability plots below show.
femKneeMean <- mean(fdims$kne.di)
femKneeSd <- sd(fdims$kne.di)
hist(fdims$kne.di, probability = TRUE)
x <- 10:30
y <- dnorm(x = x, mean = femKneeMean, sd = femKneeSd)
lines(x = x, y = y, col = "red")
qqnorm(fdims$kne.di)
qqline(fdims$kne.di)
sim_norm <- rnorm(n = length(fdims$kne.di), mean = femKneeMean, sd = femKneeSd)
qqnorm(sim_norm)
qqline(sim_norm)
qqnormsim(fdims$kne.di)
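To see why a right-skewed variable bends in a normal probability plot, here is a minimal deterministic sketch. Exact exponential quantiles stand in for a right-skewed sample, so no random numbers are involved. The exponential distribution is an illustrative assumption, not a model for knee diameter.

```r
# Exact exponential quantiles act as a right-skewed "sample"
n <- 100
p <- ppoints(n)                                   # plotting positions used by qqnorm()
skewed <- qexp(p)                                 # right-skewed values (rate 1)
z_sample <- (skewed - mean(skewed)) / sd(skewed)  # standardized sample
z_theory <- qnorm(p)                              # standard normal quantiles

# On the standardized scale, both ends lie above the line y = x: the low tail
# is too short and the high tail too long for a normal distribution.
z_sample[1] > z_theory[1]  # TRUE
z_sample[n] > z_theory[n]  # TRUE

qqnorm(skewed)
qqline(skewed)
```

The resulting convex (upward-curving) pattern is the signature of right skew in a QQ plot.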