GETTING STARTED
download.file("http://www.openintro.org/stat/data/bdims.RData", destfile = "bdims.RData")
load("bdims.RData")
Create a data frame mdims containing the dimensions of the males. Create a data frame fdims containing the dimensions of the females.
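One possible way to build the two data frames is sketched below. It assumes that bdims has a column named sex coded 1 for males and 0 for females; that coding is not stated above, so check it (for example with table(bdims$sex)) before relying on it.

```r
# Download once, then load the bdims data frame (same URL as above).
if (!file.exists("bdims.RData")) {
  download.file("http://www.openintro.org/stat/data/bdims.RData",
                destfile = "bdims.RData")
}
load("bdims.RData")

# Assumption: sex is coded 1 for males and 0 for females.
mdims <- subset(bdims, sex == 1)
fdims <- subset(bdims, sex == 0)

# Quick check that every row landed in exactly one of the two subsets.
nrow(mdims) + nrow(fdims) == nrow(bdims)
```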
EXERCISE 1.
Make a histogram of males’ heights and a histogram of females’ heights.
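A minimal sketch of the two histograms follows. The column name hgt, the centimeter unit in the axis labels, and the 1/0 coding of sex are assumptions about bdims. The shared x-axis and the side-by-side layout are optional choices that make the comparison easier, not requirements of the exercise.

```r
# Setup from GETTING STARTED, repeated so this block runs on its own.
if (!file.exists("bdims.RData")) {
  download.file("http://www.openintro.org/stat/data/bdims.RData",
                destfile = "bdims.RData")
}
load("bdims.RData")
mdims <- subset(bdims, sex == 1)  # assumes 1 = male
fdims <- subset(bdims, sex == 0)  # assumes 0 = female

# A common x-axis makes the two histograms directly comparable.
hgt_range <- range(bdims$hgt)

par(mfrow = c(1, 2))  # two plots side by side
h_m <- hist(mdims$hgt, xlim = hgt_range,
            main = "Male heights", xlab = "Height (cm)")
h_f <- hist(fdims$hgt, xlim = hgt_range,
            main = "Female heights", xlab = "Height (cm)")
par(mfrow = c(1, 1))  # restore the default layout
```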
How would you describe and compare the two distributions?
The women’s histogram is more irregular than the men’s, but both distributions are unimodal and bell-shaped.
Find the mean of females’ heights and assign it to the variable fhgtmean and display it. Find the standard deviation of females’ heights and assign it to the variable fhgtsd and display it. Also, find the range of the heights.
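These summaries could be computed as in the sketch below, again assuming the heights are stored in a column named hgt and that females are coded sex == 0. Typing a variable name on its own line displays its value.

```r
# Setup from GETTING STARTED, repeated so this block runs on its own.
if (!file.exists("bdims.RData")) {
  download.file("http://www.openintro.org/stat/data/bdims.RData",
                destfile = "bdims.RData")
}
load("bdims.RData")
fdims <- subset(bdims, sex == 0)  # assumes 0 = female

fhgtmean <- mean(fdims$hgt)  # sample mean of female heights
fhgtmean

fhgtsd <- sd(fdims$hgt)      # sample standard deviation
fhgtsd

range(fdims$hgt)             # smallest and largest height
diff(range(fdims$hgt))       # width of that interval
```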
Plot a density histogram of the females’ heights.
Overlay a normal curve with the same mean and standard deviation as our data on the density histogram of our data.
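The density histogram and the overlay can be drawn roughly as follows. The grid of 200 x values and the ylim adjustment (which keeps the top of the curve from being cut off) are choices made for this sketch; the column name hgt and the coding sex == 0 for females are assumptions.

```r
# Setup from GETTING STARTED, repeated so this block runs on its own.
if (!file.exists("bdims.RData")) {
  download.file("http://www.openintro.org/stat/data/bdims.RData",
                destfile = "bdims.RData")
}
load("bdims.RData")
fdims <- subset(bdims, sex == 0)  # assumes 0 = female
fhgtmean <- mean(fdims$hgt)
fhgtsd <- sd(fdims$hgt)

# Compute the histogram without drawing it, to learn the bar heights.
h <- hist(fdims$hgt, plot = FALSE)

# Normal density with the same mean and sd, evaluated on a fine grid.
x <- seq(min(fdims$hgt), max(fdims$hgt), length.out = 200)
y <- dnorm(x, mean = fhgtmean, sd = fhgtsd)

# Density histogram (bar areas sum to 1) with the normal curve on top.
hist(fdims$hgt, probability = TRUE, ylim = c(0, max(h$density, y)),
     main = "Female heights", xlab = "Height (cm)")
lines(x, y, col = "blue")
```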
Based on the graph with the overlay, would you say that the data is generally a normal distribution? Briefly explain your answer in at least one sentence.
Based on the graph with the overlay, I would say that the data is NOT generally a normal distribution. The histogram bars rise above the blue curve at several points, which suggests that female heights are not normally distributed.
EXERCISE 2.
Use a Q-Q plot of the female heights to help assess whether they are normally distributed.
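A Q-Q plot of this kind can be produced with the base R functions qqnorm and qqline, as in the sketch below (column name hgt and the coding sex == 0 for females are assumed).

```r
# Setup from GETTING STARTED, repeated so this block runs on its own.
if (!file.exists("bdims.RData")) {
  download.file("http://www.openintro.org/stat/data/bdims.RData",
                destfile = "bdims.RData")
}
load("bdims.RData")
fdims <- subset(bdims, sex == 0)  # assumes 0 = female

# Sample quantiles (y) against theoretical normal quantiles (x).
qq <- qqnorm(fdims$hgt, main = "Normal Q-Q plot: female heights")

# Reference line through the first and third quartiles.
qqline(fdims$hgt)
```

Points that stay close to the reference line are consistent with a normal distribution; systematic bending away from it, especially at the ends, indicates departures in the tails.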
Based on the plot, would you say that the female heights are normally distributed? Do all of the points follow the line, or do some points deviate from the line? What points are those?
Based on the plot, the female heights are NOT normally distributed. Not all of the points fall on the line; some deviate from it. The deviating points lie in the upper tail (top right) and the lower tail (bottom left).
EXERCISE 3.
Create a simulated data set, sim_norm, which comes from a normal distribution and has the same length, mean, and standard deviation as the female heights.
Make a density histogram of the data set sim_norm.
Make a normal probability plot of sim_norm.
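The three steps above might look like the sketch below. The call to set.seed and the seed value are additions for reproducibility only and are not part of the exercise; the column name hgt and the coding sex == 0 for females are assumptions.

```r
# Setup from GETTING STARTED, repeated so this block runs on its own.
if (!file.exists("bdims.RData")) {
  download.file("http://www.openintro.org/stat/data/bdims.RData",
                destfile = "bdims.RData")
}
load("bdims.RData")
fdims <- subset(bdims, sex == 0)  # assumes 0 = female

set.seed(1)  # hypothetical seed, only so the draw can be repeated

# Normal draws matching the female heights in length, mean, and sd.
sim_norm <- rnorm(n = length(fdims$hgt),
                  mean = mean(fdims$hgt),
                  sd = sd(fdims$hgt))

# Density histogram of the simulated data.
hist(sim_norm, probability = TRUE, main = "Simulated normal data")

# Normal probability plot of the simulated data.
qqnorm(sim_norm)
qqline(sim_norm)
```

Because sim_norm is random, rerunning without the seed gives a slightly different plot each time, which is a useful way to see how much a truly normal sample of this size can wander from the line.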
Do all of the points fall on the line? How does this plot compare to the probability plot for the real data?
No, not all of the points fall on the line; they scatter randomly about it, particularly in the tails. Compared with the probability plot for the real data, this plot is similar, and the differences between the two do not mean that the data are not normally distributed.
EXERCISE 4.
Plot a density histogram for female weights with an overlay of a normal probability density function.
Construct a normal probability plot for female weights.
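The same two plots for weight can be sketched as follows, assuming the weights are stored in a column named wgt (in kilograms) and that females are coded sex == 0.

```r
# Setup from GETTING STARTED, repeated so this block runs on its own.
if (!file.exists("bdims.RData")) {
  download.file("http://www.openintro.org/stat/data/bdims.RData",
                destfile = "bdims.RData")
}
load("bdims.RData")
fdims <- subset(bdims, sex == 0)  # assumes 0 = female

fwgtmean <- mean(fdims$wgt)
fwgtsd <- sd(fdims$wgt)

# Normal density with the same mean and sd as the female weights.
xw <- seq(min(fdims$wgt), max(fdims$wgt), length.out = 200)
yw <- dnorm(xw, mean = fwgtmean, sd = fwgtsd)

# Density histogram with the normal curve overlaid.
hw <- hist(fdims$wgt, plot = FALSE)
hist(fdims$wgt, probability = TRUE, ylim = c(0, max(hw$density, yw)),
     main = "Female weights", xlab = "Weight (kg)")
lines(xw, yw, col = "blue")

# Normal probability plot with its reference line.
qqw <- qqnorm(fdims$wgt, main = "Normal Q-Q plot: female weights")
qqline(fdims$wgt)
```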
Would you say that generally female weights are normally distributed based on the graphs? Briefly explain.
Based on the graphs, no, I would not say that female weights are normally distributed. The distribution is skewed to the right, and the points that curve away from the line correspond to the heaviest weights.
EXERCISE 5.
Write out a probability question that you would like to answer regarding female heights. Calculate the probability using 1) the theoretical normal distribution and 2) the empirical method.
Write out a probability question that you would like to answer regarding female weights. Calculate the probability using 1) the theoretical normal distribution and 2) the empirical method.
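As an illustration only, the sketch below compares the two methods for questions of the form "what is the probability that a randomly chosen woman is above a given cutoff?". The cutoffs (182 cm and 70 kg), the helper name compare_methods, the column names hgt and wgt, and the coding sex == 0 for females are all assumptions made for this example; substitute your own questions.

```r
# Setup from GETTING STARTED, repeated so this block runs on its own.
if (!file.exists("bdims.RData")) {
  download.file("http://www.openintro.org/stat/data/bdims.RData",
                destfile = "bdims.RData")
}
load("bdims.RData")
fdims <- subset(bdims, sex == 0)  # assumes 0 = female

# Hypothetical helper: P(X > cutoff) computed two ways.
compare_methods <- function(x, cutoff) {
  c(
    # 1) Theoretical: upper-tail area of a normal with the sample mean and sd.
    theoretical = 1 - pnorm(cutoff, mean = mean(x), sd = sd(x)),
    # 2) Empirical: proportion of observed values above the cutoff.
    empirical = mean(x > cutoff)
  )
}

p_hgt <- compare_methods(fdims$hgt, cutoff = 182)  # hypothetical cutoff, cm
p_wgt <- compare_methods(fdims$wgt, cutoff = 70)   # hypothetical cutoff, kg
p_hgt
p_wgt

# Absolute gap between the two methods for each variable.
unname(abs(diff(p_hgt)))
unname(abs(diff(p_wgt)))
```

A smaller gap means the normal model and the observed data agree more closely for that question; the size of the gap depends on the cutoff chosen, so it is worth trying more than one.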
Which variable, height or weight, had a closer agreement between the two methods?
Of the two variables, height had the closer agreement between the two methods.