Section 9.7

1. Define variables containing the heights of males and females as shown below. How many measurements do we have for each? There are 238 measurements for females and 812 for males.

library(dslabs)
data(heights)
male <- heights$height[heights$sex == "Male"]
female <- heights$height[heights$sex == "Female"]
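The counts quoted in the answer can be checked directly. The following is a minimal sketch, assuming the dslabs package is installed; the names counts, n_male and n_female are introduced here for illustration only.

```r
library(dslabs)
data(heights)

# Tabulate the sex column to get both counts at once
counts <- table(heights$sex)
counts

# The same counts from the subset vectors defined in the exercise
male <- heights$height[heights$sex == "Male"]
female <- heights$height[heights$sex == "Female"]
n_male <- length(male)
n_female <- length(female)
c(male = n_male, female = n_female)
```

Either route should agree with the 812 and 238 reported above.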

2. Suppose we can’t make a plot and want to compare the distributions side by side. We can’t just list all the numbers. Instead, we will look at the percentiles. Compute the 10th, 30th, 50th, 70th, and 90th percentiles for each sex and store them in male_percentiles and female_percentiles. Then create a data frame with these two as columns, giving a five-row table.

male_percentiles <- quantile(male, probs = c(0.1, 0.3, 0.5, 0.7, 0.9))
female_percentiles <- quantile(female, probs = c(0.1, 0.3, 0.5, 0.7, 0.9))
data.frame(male_percentiles, female_percentiles)
##     male_percentiles female_percentiles
## 10%         65.00000           61.00000
## 30%         68.00000           63.00000
## 50%         69.00000           64.98031
## 70%         71.00000           66.46417
## 90%         73.22751           69.00000
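One way to make the side-by-side comparison easier to read is to add a column with the gap between the two sexes at each percentile. This is only a sketch, assuming dslabs is installed; the names probs, percentiles and difference are not part of the exercise.

```r
library(dslabs)
data(heights)
male <- heights$height[heights$sex == "Male"]
female <- heights$height[heights$sex == "Female"]

# Same five percentiles as in the exercise
probs <- c(0.1, 0.3, 0.5, 0.7, 0.9)
percentiles <- data.frame(
  male = quantile(male, probs = probs),
  female = quantile(female, probs = probs)
)

# Gap in inches between male and female heights at each percentile
percentiles$difference <- percentiles$male - percentiles$female
percentiles
```

In the table above, the male value is larger at every listed percentile, so each entry of the difference column should be positive.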

3. Study the following boxplots showing population sizes by country: Which continent has the country with the biggest population size? Asia.

4. What continent has the largest median population size? Africa.

5. What is the median population size for Africa to the nearest million? 11 million.

6. What proportion of countries in Europe have populations below 14 million? b. 0.75

7. If we use a log transformation, which continent shown above has the largest interquartile range? Americas
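The answers to questions 3 through 7 are read off the boxplots, but the same summaries can also be computed numerically. The sketch below assumes the boxplots were drawn from the gapminder dataset in dslabs for a single year; the year 2010 is a guess, not something stated here, so the numbers it produces may not match the figure exactly. All variable names are illustrative.

```r
library(dslabs)
data(gapminder)

# ASSUMPTION: the year is a guess; the text does not say which year was plotted
year_assumed <- 2010
pop <- gapminder[gapminder$year == year_assumed & !is.na(gapminder$population), ]
pop$millions <- pop$population / 10^6

# Q3: largest single-country population per continent
largest <- tapply(pop$millions, pop$continent, max)

# Q4 and Q5: median population per continent, in millions
medians <- tapply(pop$millions, pop$continent, median)

# Q6: proportion of European countries below 14 million
prop_europe_below_14 <- mean(pop$millions[pop$continent == "Europe"] < 14)

# Q7: interquartile range per continent on the log scale
log_iqr <- tapply(log10(pop$millions), pop$continent, IQR)

names(which.max(largest))
round(medians)
prop_europe_below_14
names(which.max(log_iqr))
```

The base of the logarithm does not change which continent has the largest interquartile range, since switching bases only rescales every value by the same constant.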

8. Load the height data set and create a vector x with just the male heights. What proportion of the data is between 69 and 72 inches (taller than 69, but shorter than or equal to 72)? Hint: use a logical operator and mean.

library(dslabs)
data(heights)
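x <- heights$height[heights$sex == "Male"]
# Proportion of the data in (69, 72]: the mean of a logical vector
mean(x > 69 & x <= 72)
## [1] 0.3337438
# For comparison, the normal approximation built from the sample mean and SD
meanx <- mean(x)
sdx <- sd(x)
pnorm(72, meanx, sdx) - pnorm(69, meanx, sdx)
## [1] 0.3061779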
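The hint in question 8 relies on the fact that R treats TRUE as 1 and FALSE as 0, so the mean of a logical vector is the proportion of TRUE values. Here is a minimal sketch on a small made-up vector; the values in toy are hypothetical and not taken from the heights data.

```r
# Hypothetical toy heights in inches, chosen only to illustrate the idea
toy <- c(68, 69, 70, 71, 72, 73)

# TRUE where the value is taller than 69 and shorter than or equal to 72
in_range <- toy > 69 & toy <= 72
in_range
# FALSE FALSE TRUE TRUE TRUE FALSE

# Three of the six values qualify, so the proportion is 0.5
mean(in_range)
```

Note that 69 itself is excluded by the strict `>` while 72 is included by `<=`, matching the wording of the question.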