library(dplyr)
library(knitr)
load("more/bdims.RData")
mdims <- subset(bdims, sex == 1) %>% dplyr::select(wgt, hgt)
fdims <- subset(bdims, sex == 0) %>% dplyr::select(wgt, hgt)
- Male height: mean is 177.74 cm, standard deviation is 7.18 cm.
- Female height: mean is 164.87 cm, standard deviation is 6.54 cm.
- Both the male and female histograms resemble a normal (bell-shaped) distribution. Both look symmetric and unimodal.
hist(mdims$hgt, xlab="Men's Height")
hist(fdims$hgt, xlab="Women's Height")
mhgtmean <- mean(mdims$hgt) #177.7453
mhgtsd <- sd(mdims$hgt) #7.183629
fhgtmean <- mean(fdims$hgt) #164.8723
fhgtsd <- sd(fdims$hgt) #6.544602
Yes. Based on a comparison of the frequency histogram and the density histogram, the data appear to follow a nearly normal distribution.
hist(fdims$hgt, probability = TRUE, ylim = c(0, 0.06))
x <- 140:190
y <- dnorm(x = x, mean = fhgtmean, sd = fhgtsd)
lines(x = x, y = y, col = "blue")
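The overlay works because probability = TRUE rescales the bars so that their total area is 1, which puts the histogram on the same scale as the dnorm() curve. A minimal sketch of that check follows, using simulated heights rather than the lab data; the sample size of 260 and the seed are assumptions made only for illustration.

```r
# Simulated stand-in for fdims$hgt (assumed n = 260); mean and sd are the
# values reported above for female height.
set.seed(123)
sim_hgt <- rnorm(n = 260, mean = 164.8723, sd = 6.544602)

# Compute the histogram bins without drawing anything.
h <- hist(sim_hgt, plot = FALSE)

# Area under a density histogram: bar height times bar width, summed.
density_area <- sum(h$density * diff(h$breaks))

# Area under a frequency histogram: much larger than 1, so a dnorm()
# curve drawn on top of it would look flat.
count_area <- sum(h$counts * diff(h$breaks))

density_area
count_area
```

This is why the normal curve is drawn on the density version of the histogram rather than on the frequency version.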
Make a normal probability plot of sim_norm. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data?
The points in the normal probability plot of the simulated data do not all fall on the line. Points toward the tails deviate more from the line than points near the center. As illustrated below, the normal probability plot of the actual data looks somewhat similar to the simulated one, although the actual plot shows more deviation toward the middle.
sim_norm <- rnorm(n = length(fdims$hgt), mean = fhgtmean, sd = fhgtsd)
qqnorm(sim_norm)
qqline(sim_norm)
qqnorm(fdims$hgt)
qqline(fdims$hgt)
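To make the comparison of the two plots more concrete, it may help to see what qqnorm() actually plots: the sorted sample values against standard normal quantiles at the plotting positions given by ppoints(). The sketch below rebuilds those coordinates by hand on simulated data; the sample size of 260 and the seed are assumptions, not values taken from the lab data.

```r
# Simulated stand-in for the height data (assumed n = 260).
set.seed(2024)
sim_hgt <- rnorm(n = 260, mean = 164.8723, sd = 6.544602)

n <- length(sim_hgt)

# x-axis: standard normal quantiles at the plotting positions.
theoretical_q <- qnorm(ppoints(n))

# y-axis: the observed values, sorted from smallest to largest.
sample_q <- sort(sim_hgt)

# Compare with the coordinates qqnorm() computes (no plot is drawn).
qq <- qqnorm(sim_hgt, plot.it = FALSE)
same_x <- isTRUE(all.equal(sort(qq$x), theoretical_q))
same_y <- isTRUE(all.equal(sort(qq$y), sample_q))

# For nearly normal data, the two sets of quantiles are almost perfectly
# linearly related, which is why the points hug the line.
qq_cor <- cor(theoretical_q, sample_q)
qq_cor
```

Deviations in the tails are expected even for truly normal samples, because the most extreme order statistics vary the most from sample to sample.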
Does the normal probability plot for fdims$hgt look similar to the plots created for the simulated data? That is, do plots provide evidence that the female heights are nearly normal?
Below are several normal probability plots based on 9 simulations. Variability toward the tails seems to be a common feature shared by the simulations. In this particular run, the plot in row 2, column 3 shows variability toward the middle that is somewhat similar to the normal probability plot of the actual data. I’m not sure if this is “evidence” that the heights of females in general follow a normal distribution, but I can tell that for this particular data set the distribution is close to normal.
set.seed(41)
qqnormsim(fdims$hgt)
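The qqnormsim function comes with the lab materials and its implementation is not shown in this write-up. As a rough idea of what such a helper might do, here is a hypothetical stand-in named qq_sim_grid (not the lab's function). It assumes a 3 by 3 grid with the data in the first panel and eight simulated normal samples in the rest, which may differ from what qqnormsim actually draws.

```r
# Hypothetical stand-in for the lab's qqnormsim(); layout is an assumption.
qq_sim_grid <- function(dat, n_sims = 8) {
  old_par <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
  on.exit(par(old_par))  # restore plotting settings when done

  # Panel 1: the data themselves.
  qqnorm(dat, main = "Data")
  qqline(dat)

  # Remaining panels: normal samples with the same n, mean and sd.
  sims <- replicate(n_sims,
                    rnorm(length(dat), mean = mean(dat), sd = sd(dat)),
                    simplify = FALSE)
  for (i in seq_along(sims)) {
    qqnorm(sims[[i]], main = paste("Sim", i))
    qqline(sims[[i]])
  }
  invisible(sims)
}

# Usage on simulated heights (assumed n = 260, not the lab data).
set.seed(41)
demo_hgt <- rnorm(260, mean = 164.8723, sd = 6.544602)
sims <- qq_sim_grid(demo_hgt)
```

Because every simulated panel is drawn from an exactly normal distribution, the spread seen across the panels shows how much tail wobble is normal for a sample of this size.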
Based on the analysis below, I would say that female weights appear to come from a roughly normal distribution, although the distribution is skewed to the right. The data set may contain some outliers, with points above 90 kg. None of the simulations show points beyond the 90 kg mark, which suggests that a few of the data points are outliers.
The normal curve overlaid on the density histogram does cover the highest point of the histogram. I’m not sure what the significance of this is. Other than that, the blue line fits the histogram nicely.
fwgtmean <- mean(fdims$wgt)
fwgtsd <- sd(fdims$wgt)
hist(fdims$wgt, probability = TRUE, ylim = c(0, 0.06))
x <- 40:110
y <- dnorm(x = x, mean = fwgtmean, sd = fwgtsd)
lines(x = x, y = y, col = "blue")
qqnorm(fdims$wgt)
qqline(fdims$wgt)
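One way to put a number on "skewed to the right" is the sample skewness: roughly 0 for symmetric data and positive when the right tail is longer. The sketch below is an illustration only. The helper name sample_skewness is made up here, and the data are simulated (assumed n = 260, with arbitrary parameters), not the lab's weights.

```r
# Hypothetical helper: moment-based sample skewness.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(10)
# Symmetric example: normal data.
symmetric <- rnorm(260, mean = 60, sd = 10)
# Right-skewed example: log-normal data (parameters chosen arbitrarily).
right_skewed <- rlnorm(260, meanlog = log(60), sdlog = 0.4)

sample_skewness(symmetric)     # expected to be near 0
sample_skewness(right_skewed)  # expected to be clearly positive
```

Applied to the weight column, a clearly positive value would agree with the right skew seen in the histogram and Q-Q plot.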
I ran the Q-Q plot simulation a few times, setting a different seed each time. In this particular run, the simulation in row 1, column 2 looks somewhat similar to the normal probability plot of the actual data (female weight). None of the simulations I’ve seen so far generated points above the 100 kg mark; the cutoff has been around 90 kg.
#sim_norm <- rnorm(n = length(fdims$wgt), mean = fwgtmean, sd = fwgtsd)
#qqnorm(sim_norm)
#qqline(sim_norm)
set.seed(91)
qqnormsim(fdims$wgt)
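The observation that simulated samples rarely go past about 90 kg can be checked against the normal model itself. The sketch below recovers the fitted mean and standard deviation from the two pnorm() values reported in this write-up for 68.0 kg and 72.6 kg, then computes how many observations above 90 kg the model would expect. The sample size of 260 is an assumption; it is not stated in this write-up.

```r
# Cumulative probabilities reported in this write-up for the weight model.
p_68  <- 0.7792121   # pnorm(68.0, fwgtmean, fwgtsd)
p_726 <- 0.8939697   # pnorm(72.6, fwgtmean, fwgtsd)

# Convert each probability to a z-score, then solve
#   68.0 = mean + z_68 * sd   and   72.6 = mean + z_726 * sd.
z_68  <- qnorm(p_68)
z_726 <- qnorm(p_726)
wgt_sd   <- (72.6 - 68.0) / (z_726 - z_68)
wgt_mean <- 68.0 - z_68 * wgt_sd

# Probability of a weight above 90 kg under this normal model.
p_above_90 <- pnorm(90, mean = wgt_mean, sd = wgt_sd, lower.tail = FALSE)

# Expected number of such observations in a sample of the assumed size.
n_assumed <- 260
expected_above_90 <- n_assumed * p_above_90

c(mean = wgt_mean, sd = wgt_sd, expected_above_90 = expected_above_90)
```

An expected count well below one would be consistent with the simulations almost never producing points past 90 kg, so several real observations above that mark point to a heavier right tail than the normal model allows.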
Assuming a normal distribution, the theoretical probability that a female is shorter than 5 ft (152.4 cm) is 2.8%. In the data set, the proportion is 2.7%, so the two are very close.
pnorm(q=152.4, mean=fhgtmean, sd=fhgtsd) #0.028342
## [1] 0.028342
length(fdims$hgt[fdims$hgt < 152.4])/length(fdims$hgt) #0.02692308
## [1] 0.02692308
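The theoretical value can also be reached by hand through a z-score, and the empirical proportion can be written more compactly as the mean of a logical vector. The sketch below uses the mean and standard deviation reported above; the empirical part runs on simulated heights (assumed n = 260), so its result will not match the lab's 2.7%.

```r
fhgtmean <- 164.8723   # reported above
fhgtsd   <- 6.544602   # reported above

# 5 ft = 60 in, and 1 in = 2.54 cm.
cutoff_cm <- 5 * 12 * 2.54

# Standardize the cutoff, then look up the standard normal probability.
z <- (cutoff_cm - fhgtmean) / fhgtsd
p_theoretical <- pnorm(z)   # same as pnorm(cutoff_cm, fhgtmean, fhgtsd)

# Empirical proportion: the mean of TRUE/FALSE values is the share of TRUEs.
set.seed(5)
sim_hgt <- rnorm(260, mean = fhgtmean, sd = fhgtsd)   # simulated stand-in
p_empirical <- mean(sim_hgt < cutoff_cm)

p_theoretical
p_empirical
```

mean(x < cutoff) gives the same number as dividing the length of the filtered vector by the total length, as long as there are no missing values.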
Assuming a normal distribution, the theoretical probability that a female’s weight is between 150 lbs and 160 lbs (about 68.0 kg to 72.6 kg) is about 11.5%. In this data set, however, only about 7.7% of females fall within this range, a difference of about 3.8 percentage points. I’m not sure how significant this difference is.
x <- pnorm(q=68.0, mean=fwgtmean, sd=fwgtsd) #0.7792121
y <- pnorm(q=72.6, mean=fwgtmean, sd=fwgtsd) #0.8939697
theoretical <- y - x #0.1147576 (theoretical)
actual <- length(fdims$wgt[fdims$wgt < 72.6 & fdims$wgt > 68.0])/length(fdims$wgt) #0.07692308
theoretical - actual
## [1] 0.03783453
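One rough way to gauge the size of this gap is to ask how much the observed proportion would vary from sample to sample if the normal model were correct. The sketch below treats the count in the range as binomial and expresses the gap in standard errors. It uses the probabilities reported above; the sample size of 260 is an assumption (it is consistent with 0.07692308 being 20 out of 260, but the sample size is not stated in this write-up).

```r
# Values reported above.
theoretical <- 0.8939697 - 0.7792121   # P(68.0 < weight < 72.6) under the model
actual      <- 0.07692308              # observed proportion in the data

# Assumed sample size (not stated in this write-up).
n_assumed <- 260

# Standard error of a sample proportion if the model probability were true.
se <- sqrt(theoretical * (1 - theoretical) / n_assumed)

# Gap between model and data, measured in standard errors.
z_gap <- (theoretical - actual) / se

# Implied number of observations in the range under the assumed n.
count_in_range <- actual * n_assumed

c(se = se, z_gap = z_gap, count_in_range = count_in_range)
```

Under the assumed sample size, the gap comes out a little under two standard errors, so it is noticeable but not far outside ordinary sampling variation. This is only a back-of-the-envelope check, not a formal test.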
I cannot generate histograms or Q-Q plots when I knit; I get an error (see the demo below). However, I am able to run the functions without problems directly in the RStudio environment. I researched the issue but was unable to find a fix. As a workaround, I saved the graphics and attached them to this file. Demo of the error: https://screencast-o-matic.com/watch/cFef36D62X
The histogram for bii.di belongs to normal probability plot letter ____.
Answer: Normal Q-Q Plot B
NOTE:
# I had to comment out the code that generates the histogram and Q-Q plots because these functions
# generate an error when knitting. However, I am able to run the functions fine directly in RStudio.
#hist(fdims$bii.di)
#qqnorm(fdims$bii.di)
#qqline(fdims$bii.di)
include_graphics("./fdims_bii.di_histogram.png")
include_graphics("./fdims_bii.di_QQplot.png")
The histogram for elb.di belongs to normal probability plot letter ____.
Answer: Normal Q-Q Plot C
#hist(fdims$elb.di)
#qqnorm(fdims$elb.di)
#qqline(fdims$elb.di)
include_graphics("./fdims_elb.di_histogram.png")
include_graphics("./fdims_elb.di_QQplot.png")
The histogram for age belongs to normal probability plot letter ____.
Answer: Normal Q-Q Plot D
#hist(fdims$age)
#qqnorm(fdims$age)
#qqline(fdims$age)
include_graphics("./fdims_age_histogram.png")
include_graphics("./fdims_age_QQplot.png")
The histogram for che.de belongs to normal probability plot letter ____.
Answer: Normal Q-Q Plot A
#hist(fdims$che.de)
#qqnorm(fdims$che.de)
#qqline(fdims$che.de)
include_graphics("./fdims_che.de_histogram.png")
include_graphics("./fdims_che.de_QQplot.png")
I searched for the reason why a Q-Q plot would have a stepwise pattern. I found a page that explains it: a stepwise pattern occurs when a variable is discrete. Plot D is the Q-Q plot for age, and the stepwise pattern is prominent in this plot.
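The stepwise effect is easy to reproduce: rounding a continuous variable to whole numbers creates many tied values, and each group of ties shows up as a flat step in the Q-Q plot. The sketch below uses simulated ages; the mean of 30, standard deviation of 8, and sample size of 260 are made-up values for illustration, not taken from the lab data.

```r
set.seed(99)

# Continuous "ages" (made-up parameters) and the same values in whole years.
age_cont  <- rnorm(260, mean = 30, sd = 8)
age_years <- round(age_cont)

# Rounding collapses many distinct values into a few repeated ones.
n_distinct_cont  <- length(unique(age_cont))
n_distinct_years <- length(unique(age_years))

# Largest group of tied values; each such group forms one horizontal step.
largest_tie <- max(table(age_years))

c(n_distinct_cont, n_distinct_years, largest_tie)

# Q-Q plot of the rounded values shows the staircase pattern.
qqnorm(age_years, main = "Rounded (discrete) values")
qqline(age_years)
```

The underlying distribution can still be close to normal; the steps only reflect how coarsely the variable was recorded.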
Make a normal probability plot for kne.di. Based on this normal probability plot, is this variable left skewed, symmetric, or right skewed? Use a histogram to confirm your findings.
The variable kne.di is right skewed. This is confirmed by the histogram.
#hist(fdims$kne.di)
#qqnorm(fdims$kne.di)
#qqline(fdims$kne.di)
include_graphics("./fdims_kne.di_histogram.png")
include_graphics("./fdims_kne.di_QQplot.png")
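For reference, right skew has a recognizable signature in a normal probability plot: the points curve upward and sit above the reference line at both ends. The sketch below checks this on simulated right-skewed data (a log-normal sample with arbitrary parameters and an assumed n of 260), not on kne.di itself. It rebuilds the line that qqline() draws through the first and third quartiles.

```r
set.seed(7)
# Simulated right-skewed sample (parameters chosen arbitrarily).
skewed <- rlnorm(260, meanlog = 0, sdlog = 0.5)

# Q-Q coordinates without drawing the plot.
qq <- qqnorm(skewed, plot.it = FALSE)

# qqline() passes through the first and third quartiles of both axes.
y_q <- unname(quantile(skewed, c(0.25, 0.75)))
x_q <- qnorm(c(0.25, 0.75))
slope     <- diff(y_q) / diff(x_q)
intercept <- y_q[1] - slope * x_q[1]

# Vertical distance of each point from the line.
resid <- qq$y - (intercept + slope * qq$x)

# Distances at the far left and far right of the plot; positive values
# mean the points lie above the line, the pattern expected for right skew.
tail_resid <- resid[c(which.min(qq$x), which.max(qq$x))]
tail_resid
```

A left-skewed variable would show the opposite pattern, with both ends falling below the line.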