Exercise 1. Vector lengths

When analyzing data, it’s often important to know the number of measurements you have for each category.

Define a variable male that contains the male heights. Define a variable female that contains the female heights. Report the length of each variable.

library(dslabs)
data(heights)
# Keep only the heights whose sex is "Male" (or "Female")
male <- heights$height[heights$sex == "Male"]
female <- heights$height[heights$sex == "Female"]
length(male)
## [1] 812
length(female)
## [1] 238
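The same counts can also be obtained in one step with base R's table() function, which tallies how many observations fall into each level of a categorical variable. This is a minimal sketch, assuming the dslabs package is installed; the name counts is our own choice.

```r
library(dslabs)
data(heights)

# Count how many heights are recorded for each level of sex
counts <- table(heights$sex)
counts

# Extract a single count by its level name
counts["Male"]
counts["Female"]
```

The counts should agree with the lengths reported above (812 male, 238 female), since each row of heights contributes exactly one height to one sex.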

Exercise 2. Percentiles

Suppose we can’t make a plot and want to compare the distributions side by side. If the number of data points is large, listing all the numbers is impractical. A more practical approach is to look at the percentiles, which we can obtain with the quantile function like this:

library(dslabs)
data(heights)
# Percentiles from 1% to 99% in steps of 1%
quantile(heights$height, seq(0.01, 0.99, 0.01))
##       1%       2%       3%       4%       5%       6%       7%       8% 
## 59.00000 60.00000 60.00000 61.00000 62.00000 62.00000 62.28869 63.00000 
##       9%      10%      11%      12%      13%      14%      15%      16% 
## 63.00000 63.00000 63.77953 64.00000 64.00000 64.00000 64.17321 64.96000 
##      17%      18%      19%      20%      21%      22%      23%      24% 
## 65.00000 65.00000 65.00000 65.00000 65.00000 65.94457 66.00000 66.00000 
##      25%      26%      27%      28%      29%      30%      31%      32% 
## 66.00000 66.00000 66.00000 66.00000 66.50000 66.92000 66.92913 67.00000 
##      33%      34%      35%      36%      37%      38%      39%      40% 
## 67.00000 67.00000 67.00000 67.00000 67.00000 67.00000 67.32200 67.71650 
##      41%      42%      43%      44%      45%      46%      47%      48% 
## 67.72540 68.00000 68.00000 68.00000 68.00000 68.00000 68.00000 68.00000 
##      49%      50%      51%      52%      53%      54%      55%      56% 
## 68.11024 68.50000 68.89000 68.89764 69.00000 69.00000 69.00000 69.00000 
##      57%      58%      59%      60%      61%      62%      63%      64% 
## 69.00000 69.00000 69.00000 69.00000 69.60000 70.00000 70.00000 70.00000 
##      65%      66%      67%      68%      69%      70%      71%      72% 
## 70.00000 70.00000 70.00000 70.00000 70.00000 70.07874 70.73700 70.86614 
##      73%      74%      75%      76%      77%      78%      79%      80% 
## 71.00000 71.00000 71.00000 71.00000 71.00000 71.11000 72.00000 72.00000 
##      81%      82%      83%      84%      85%      86%      87%      88% 
## 72.00000 72.00000 72.00000 72.00000 72.00000 72.00000 72.00000 72.44011 
##      89%      90%      91%      92%      93%      94%      95%      96% 
## 72.83465 73.00000 73.00000 74.00000 74.00000 74.00000 74.80315 75.00000 
##      97%      98%      99% 
## 75.00000 76.00000 78.00000
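Some of the percentiles above are not whole numbers (for example 62.28869 at 7%) even though most heights were reported as integers. By default, quantile() uses its type = 7 method, which linearly interpolates between neighboring sorted values. The sketch below illustrates this on a small toy vector chosen only for illustration (it is not part of the heights data); manual_quantile is a hypothetical helper that reproduces the type 7 rule by hand.

```r
# Toy vector used only for illustration
x <- c(1, 2, 3, 4)

# Default quantile(): type 7 linear interpolation
quantile(x, c(0.25, 0.5, 0.75))
# 25%: 1.75, 50%: 2.50, 75%: 3.25

# Hypothetical helper implementing the type 7 rule:
# position h = (n - 1) * p + 1 in the sorted data, then interpolate
manual_quantile <- function(x, p) {
  x <- sort(x)
  h <- (length(x) - 1) * p + 1
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])
}
manual_quantile(x, c(0.25, 0.5, 0.75))
# 1.75 2.50 3.25
```

Other definitions are available through the type argument of quantile(), so percentiles computed by other software may differ slightly.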

Create two five-element vectors, called female_percentiles and male_percentiles, containing the 10th, 30th, 50th, 70th, and 90th percentiles of the heights for each sex.

Then create a data frame called df with these two vectors as columns. The columns should be named female and male and should appear in that order. For example, to create a data frame with columns named names and grades, in that order, you would write:

df <- data.frame(names = c("Jose", "Mary"), grades = c("B", "A"))

Print df and take a look at it. This will give you some information on how male and female heights differ.

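library(dslabs)
data(heights)
# Split the heights by sex
male <- heights$height[heights$sex == "Male"]
female <- heights$height[heights$sex == "Female"]
# 10th, 30th, 50th, 70th, and 90th percentiles for each sex
male_percentiles <- quantile(male, seq(0.1, 0.9, 0.2))
female_percentiles <- quantile(female, seq(0.1, 0.9, 0.2))
# Female column first, then male
df <- data.frame(female = female_percentiles, male = male_percentiles)
# Print the data frame to compare the two distributions
df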
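Because quantile() returns a named vector, data.frame() uses the percentile labels ("10%", "30%", ...) as the row names of df, so each printed row reads directly as a percentile. The sketch below is a small extension beyond the exercise, assuming the dslabs package is installed: it adds a column with the male minus female height (in inches) at each percentile. The names probs and difference are our own choices.

```r
library(dslabs)
data(heights)

male <- heights$height[heights$sex == "Male"]
female <- heights$height[heights$sex == "Female"]
probs <- seq(0.1, 0.9, 0.2)

# quantile() returns a named vector; data.frame() takes the row names
# from those names
df <- data.frame(female = quantile(female, probs),
                 male = quantile(male, probs))
rownames(df)

# Extra column (our own addition): male minus female height at each percentile
df$difference <- df$male - df$female
df
```

The difference column summarizes, one percentile at a time, how far the male distribution sits above the female one.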