Question 1: Write code

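The flights dataset in the nycflights13 package contains on-time data for all flights that departed from New York City's three major airports (JFK, LGA, EWR) in 2013, 336,776 records in total.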
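
As a quick sanity check of that description, the dataset can be inspected directly. This is only a sketch and assumes the nycflights13 and dplyr packages are installed; it is not part of the original answer.

```r
library(dplyr)
library(nycflights13)

# Number of records in the dataset
nrow(flights)

# Number of flights per departure airport (the origin column)
flights %>%
  count(origin)
```

The row count and the distinct values of origin should agree with the figures quoted above.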

Question 2: Explain code

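  1. Meaning: tibble(iris) converts the iris data frame to a tibble, and the pipe operator %>% passes it to arrange() as its first argument. The remaining arguments give the sort columns and their order: rows are sorted by Species first (ascending), then by every column whose name starts with "Sepal" (Sepal.Length, then Sepal.Width) in descending order.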

    tibble(iris) %>% 
      arrange(Species,across(starts_with("Sepal"), desc))
    ## # A tibble: 150 × 5
    ##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    ##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
    ##  1          5.8         4            1.2         0.2 setosa 
    ##  2          5.7         4.4          1.5         0.4 setosa 
    ##  3          5.7         3.8          1.7         0.3 setosa 
    ##  4          5.5         4.2          1.4         0.2 setosa 
    ##  5          5.5         3.5          1.3         0.2 setosa 
    ##  6          5.4         3.9          1.7         0.4 setosa 
    ##  7          5.4         3.9          1.3         0.4 setosa 
    ##  8          5.4         3.7          1.5         0.2 setosa 
    ##  9          5.4         3.4          1.7         0.2 setosa 
    ## 10          5.4         3.4          1.5         0.4 setosa 
    ## # ℹ 140 more rows
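  2. Meaning: group the starwars dataset by the gender column, compute the mean of mass within each gender group (ignoring missing values), and keep only the characters whose mass is greater than their group's mean. Rows with a missing mass are dropped, because the comparison evaluates to NA.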

    starwars %>% 
      group_by(gender) %>% 
      filter(mass > mean(mass, na.rm = TRUE))
    ## # A tibble: 15 × 14
    ## # Groups:   gender [3]
    ##    name    height   mass hair_color skin_color eye_color birth_year sex   gender
    ##    <chr>    <int>  <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
    ##  1 Darth …    202  136   none       white      yellow          41.9 male  mascu…
    ##  2 Owen L…    178  120   brown, gr… light      blue            52   male  mascu…
    ##  3 Beru W…    165   75   brown      light      blue            47   fema… femin…
    ##  4 Chewba…    228  112   brown      unknown    blue           200   male  mascu…
    ##  5 Jabba …    175 1358   <NA>       green-tan… orange         600   herm… mascu…
    ##  6 Jek To…    180  110   brown      fair       blue            NA   <NA>  <NA>  
    ##  7 IG-88      200  140   none       metal      red             15   none  mascu…
    ##  8 Bossk      190  113   none       green      red             53   male  mascu…
    ##  9 Ayla S…    178   55   none       blue       hazel           48   fema… femin…
    ## 10 Gregar…    185   85   black      dark       brown           NA   <NA>  <NA>  
    ## 11 Lumina…    170   56.2 black      yellow     blue            58   fema… femin…
    ## 12 Zam We…    168   55   blonde     fair, gre… yellow          NA   fema… femin…
    ## 13 Shaak …    178   57   none       red, blue… black           NA   fema… femin…
    ## 14 Grievo…    216  159   none       brown, wh… green, y…       NA   male  mascu…
    ## 15 Tarfful    234  136   brown      brown      blue            NA   male  mascu…
    ## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
    ## #   vehicles <list>, starships <list>

    Meaning: select the three columns name, homeworld, and species from the starwars dataset, then convert every column except name (that is, homeworld and species) to a factor.

    starwars %>%
      select(name, homeworld, species) %>%
      mutate(across(!name, as.factor))
    ## # A tibble: 87 × 3
    ##    name               homeworld species
    ##    <chr>              <fct>     <fct>  
    ##  1 Luke Skywalker     Tatooine  Human  
    ##  2 C-3PO              Tatooine  Droid  
    ##  3 R2-D2              Naboo     Droid  
    ##  4 Darth Vader        Tatooine  Human  
    ##  5 Leia Organa        Alderaan  Human  
    ##  6 Owen Lars          Tatooine  Human  
    ##  7 Beru Whitesun Lars Tatooine  Human  
    ##  8 R5-D4              Tatooine  Droid  
    ##  9 Biggs Darklighter  Tatooine  Human  
    ## 10 Obi-Wan Kenobi     Stewjon   Human  
    ## # ℹ 77 more rows
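  3. Meaning: convert the mtcars dataset to a tibble and group it by the vs column (engine shape: 0 = V-shaped, 1 = straight). Within each vs group, cut the hp column (horsepower) into 3 equal-width intervals and store the result in a new column hp_cut. Because the breaks are computed separately for each group, hp_cut has 6 distinct intervals in total. Finally, regroup the data by hp_cut.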

    tibble(mtcars) %>%
      group_by(vs) %>%
      mutate(hp_cut = cut(hp, 3)) %>%
      group_by(hp_cut)
    ## # A tibble: 32 × 12
    ## # Groups:   hp_cut [6]
    ##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb hp_cut     
    ##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>      
    ##  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4 (90.8,172] 
    ##  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4 (90.8,172] 
    ##  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1 (75.7,99.3]
    ##  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1 (99.3,123] 
    ##  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2 (172,254]  
    ##  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1 (99.3,123] 
    ##  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4 (172,254]  
    ##  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2 (51.9,75.7]
    ##  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2 (75.7,99.3]
    ## 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4 (99.3,123] 
    ## # ℹ 22 more rows
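
The grouped examples above rely on the fact that, after group_by(), functions such as mean() and cut() are evaluated separately within each group. The sketch below makes that visible; the object names group_means, grouped_cut, and global_cut are introduced here for illustration only and do not appear in the original answers.

```r
library(dplyr)

# Per-group threshold used by the grouped filter(): one mean per gender group.
group_means <- starwars %>%
  group_by(gender) %>%
  summarise(mean_mass = mean(mass, na.rm = TRUE))

# cut() inside each vs group: breaks come from that group's own hp range.
grouped_cut <- tibble(mtcars) %>%
  group_by(vs) %>%
  mutate(hp_cut = cut(hp, 3)) %>%
  ungroup()

# cut() on the whole column: a single set of three intervals.
global_cut <- tibble(mtcars) %>%
  mutate(hp_cut = cut(hp, 3))

group_means
n_distinct(grouped_cut$hp_cut)  # distinct intervals with per-group breaks
n_distinct(global_cut$hp_cut)   # distinct intervals with global breaks
```

With per-group breaks the number of distinct intervals should match the `Groups: hp_cut [6]` header in the output above, while the ungrouped version should produce only three.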

Question 3: Use the help documentation to understand functions

Read https://dplyr.tidyverse.org/reference/mutate-joins.html and explain what the four dataset join functions do. For each one, give a practical example and explain its output.

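  1. inner_join(): keeps only the rows whose key appears in both tables. Here only ids 2 and 3 occur in both students and scores, so Alice (id 1) and the score for id 4 are dropped.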

    library(dplyr)
    
    students <- tibble(
      id = 1:3,
      name = c("Alice", "Bob", "Charlie")
    )
    
    scores <- tibble(
      id = c(2, 3, 4),
      score = c(90, 85, 70)
    )
    
    inner_join(students, scores, by = "id")
    ## # A tibble: 2 × 3
    ##      id name    score
    ##   <dbl> <chr>   <dbl>
    ## 1     2 Bob        90
    ## 2     3 Charlie    85
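  2. left_join(): keeps every row of the left table (students) and adds the matching columns from the right table. Charlie (id 3) has no match in scores, so his score is NA.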

    library(dplyr)
    
    students <- tibble(
      id = c(1, 2, 3),
      name = c("Alice", "Bob", "Charlie")
    )
    
    scores <- tibble(
      id = c(1, 2, 4),
      score = c(90, 85, 88)
    )
    
    result <- left_join(students, scores, by = "id")
    print(result)
    ## # A tibble: 3 × 3
    ##      id name    score
    ##   <dbl> <chr>   <dbl>
    ## 1     1 Alice      90
    ## 2     2 Bob        85
    ## 3     3 Charlie    NA
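  3. right_join(): keeps every row of the right table (scores), using the same tables as the left_join() example. id 4 has no match in students, so its name is NA, and Charlie (id 3) is dropped.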

    result <- right_join(students, scores, by = "id")
    print(result)
    ## # A tibble: 3 × 3
    ##      id name  score
    ##   <dbl> <chr> <dbl>
    ## 1     1 Alice    90
    ## 2     2 Bob      85
    ## 3     4 <NA>     88
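  4. full_join(): keeps all rows from both tables, again using the same tables, and fills in NA where there is no match. Charlie (id 3) has no score and id 4 has no name.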

    result <- full_join(students, scores, by = "id")
    print(result)
    ## # A tibble: 4 × 3
    ##      id name    score
    ##   <dbl> <chr>   <dbl>
    ## 1     1 Alice      90
    ## 2     2 Bob        85
    ## 3     3 Charlie    NA
    ## 4     4 <NA>       88
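
To compare the four joins side by side, the sketch below runs all of them on the second pair of tables (ids 1, 2, 3 in students and 1, 2, 4 in scores) and lists which ids each one keeps. The joins list is a helper introduced here for illustration, not part of the original answer.

```r
library(dplyr)

students <- tibble(
  id = c(1, 2, 3),
  name = c("Alice", "Bob", "Charlie")
)

scores <- tibble(
  id = c(1, 2, 4),
  score = c(90, 85, 88)
)

# Run each mutating join on the same inputs
joins <- list(
  inner = inner_join(students, scores, by = "id"),
  left  = left_join(students, scores, by = "id"),
  right = right_join(students, scores, by = "id"),
  full  = full_join(students, scores, by = "id")
)

# ids kept by each join
lapply(joins, function(df) df$id)

# Number of rows returned by each join
sapply(joins, nrow)
```

The inner join keeps only the shared ids, the left and right joins keep all ids of their respective table, and the full join keeps the union of both.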
