As mentioned previously, each time you open a new RStudio session, you need to run the following three commands.
require(mosaic)
require(openintro)
require(MASS)
We’re going to look at the males data set, a sample of 250 adult men’s recorded height in centimeters. This is a subset of the data included with the NHANES package.
First, we read in the dataset.
males<-read.csv("http://www.math.usu.edu/cfairbourn/Stat2300/RStudioFiles/data/males.csv")
We’ll use the hist()
function to construct a frequency histogram of the heights
hist(males$height,
main = "Heights of Adult Men", #Include a chart title
xlab = "Height in centimeters", #Change the label for the x-axis
col = "skyblue", #Change the fill color of the histogram
breaks = seq(140, 210, 2.5),
xaxt = 'n') #This leaves the x-axis blank so we can specify the tick marks ourselves.
axis(side = 1, at = seq(145, 210, 2.5)) #This sets the tick marks every 2.5, starting at 140 and ending at 210.
abline(v = 174, col = "red", lwd = 3) #Add in a thick vertical line at x=174
Use the tally()
function to count the men that have a height less than or equal to 170 cm.
tally(~height <= 174, data = males)
## height <= 174
## TRUE FALSE
## 98 152
You can use this information to calculate the proportion of men in the data set that have a height less than or equal to 170 cm.
Construct a density histogram of the heights.
hist(males$height,
prob = TRUE, #Specify that you want a density histogram
main = "Heights of Adult Men", #Include a chart title
xlab = "Height in centimeters", #Change the label for the x-axis
col = "skyblue", #Change the fill color of the histogram
breaks = seq(140, 210, 2.5),
xaxt = 'n') #This leaves the x-axis blank so we can specify the tick marks ourselves.
axis(side = 1, at = seq(145, 210, 2.5)) #This sets the tick marks every 2.5, starting at 140 and ending at 210.
abline(v = 174, col = "red", lwd = 3) #Add in a thick vertical line at x=174
#Add in the normal curve overlay
points(seq(140, 210, length.out=500),
dnorm(seq(140, 210, length.out=500),
mean(males$height), sd(males$height)), type="l", col="darkblue", lwd=2)
Calculate the area under the normal curve less than 174.
pnorm(174, mean=mean(males$height), sd=sd(males$height))
## [1] 0.3852858