Section 9.7
1. Define variables containing the heights of males and females like this:
library(dslabs)
data(heights)
male <- heights$height[heights$sex == "Male"]
female <- heights$height[heights$sex == "Female"]
How many measurements do we have for each? 812 for males and 238 for females.
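The counts can be checked directly rather than read off by eye. A minimal sketch, assuming the dslabs package is installed; the name counts is only illustrative:

```r
library(dslabs)
data(heights)

# Count the rows for each sex in one step
counts <- table(heights$sex)
counts

# Equivalent check on the vectors themselves
male <- heights$height[heights$sex == "Male"]
female <- heights$height[heights$sex == "Female"]
length(male)
length(female)
```

Either approach works; table() is convenient when there are more than two groups.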
2. Suppose we can’t make a plot and want to compare the distributions
side by side. We can’t just list all the numbers. Instead, we will look
at the percentiles. Create a five-row table showing female_percentiles
and male_percentiles with the 10th, 30th, 50th, 70th, and 90th
percentiles for each sex. Then create a data frame with these two as
columns.
male_percentiles <- quantile(male, probs = c(0.1, 0.3, 0.5, 0.7, 0.9))
female_percentiles <- quantile(female, probs = c(0.1, 0.3, 0.5, 0.7, 0.9))
data.frame(male_percentiles, female_percentiles)
##     male_percentiles female_percentiles
## 10%         65.00000           61.00000
## 30%         68.00000           63.00000
## 50%         69.00000           64.98031
## 70%         71.00000           66.46417
## 90%         73.22751           69.00000
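One way to read the table is to look at the gap between the two sexes at each percentile. A small sketch, assuming the dslabs package is installed; the names probs and gap are illustrative and not part of the exercise:

```r
library(dslabs)
data(heights)
male <- heights$height[heights$sex == "Male"]
female <- heights$height[heights$sex == "Female"]

# Same percentiles as in the exercise
probs <- c(0.1, 0.3, 0.5, 0.7, 0.9)

# Gap in inches between male and female heights at each percentile
gap <- quantile(male, probs) - quantile(female, probs)
round(gap, 2)
```

In the table above the gap stays between roughly 4 and 5 inches at every percentile, which suggests the male distribution is mostly a shifted version of the female one.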
3. Study the following boxplots showing population sizes by country: Which continent has the country with the biggest population size?
Asia
4. What continent has the largest median population size? Africa.
5. What is the median population size for Africa to the nearest million? 11 million.
6. What proportion of countries in Europe have populations below 14 million? b. 0.75
7. If we use a log transformation, which continent shown above has the largest interquartile range? Americas
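To see why the ranking by interquartile range can change after a log transformation, here is a rough sketch with made-up numbers. The groups A and B and their values are hypothetical toy data, not the populations behind the boxplots:

```r
# Hypothetical population sizes in millions (toy values for illustration only)
pop <- data.frame(
  continent = rep(c("A", "B"), each = 5),
  population = c(1, 2, 4, 8, 16, 100, 110, 120, 130, 140)
)

# IQR on the original scale: measures spread as a difference
iqr_raw <- sapply(split(pop$population, pop$continent), IQR)

# IQR on the log10 scale: measures spread as a ratio
iqr_log <- sapply(split(log10(pop$population), pop$continent), IQR)

iqr_raw
iqr_log
```

Group B has the larger IQR on the original scale, but group A has the larger IQR on the log scale, because its quartiles differ by a factor of 4 while those of group B differ by a factor of less than 1.2.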
8. Load the height data set and create a vector x with
just the male heights. What proportion of the data is between 69 and 72
inches (taller than 69, but shorter or equal to 72)? Hint: use a logical
operator and mean.
library(dslabs)
data(heights)
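x <- heights$height[heights$sex == "Male"]
# Proportion of heights taller than 69 and at most 72 inches
mean(x > 69 & x <= 72)
## [1] 0.3337438
# For comparison, the normal approximation gives a similar value
pnorm(72, mean(x), sd(x)) - pnorm(69, mean(x), sd(x))
## [1] 0.3061779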
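The proportion of data in an interval can also be written with the empirical cumulative distribution function. A minimal sketch, assuming the dslabs package is installed; the name cdf is illustrative:

```r
library(dslabs)
data(heights)
x <- heights$height[heights$sex == "Male"]

# ecdf() returns a function; cdf(a) is the proportion of values <= a
cdf <- ecdf(x)

# Proportion in the interval (69, 72]
from_ecdf <- cdf(72) - cdf(69)

# Same quantity computed with a logical vector and mean
from_mean <- mean(x > 69 & x <= 72)

c(from_ecdf, from_mean)
```

The two values agree because cdf(72) counts everything at or below 72 and subtracting cdf(69) removes everything at or below 69, which leaves exactly the heights above 69 and at most 72.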