221527130

Question 1: Writing code

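The flights dataset from the nycflights13 package contains on-time data for all flights that departed from New York City's three major airports (JFK, LGA, EWR) in 2013, 336,776 records in total.

Compute the number of departing flights and the mean departure delay in 2013 for each of the three New York airports (group_by and summarise may be used).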

library(dplyr)
library(nycflights13)

flights

## # A tibble: 336,776 × 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1      517            515         2      830            819
##  2  2013     1     1      533            529         4      850            830
##  3  2013     1     1      542            540         2      923            850
##  4  2013     1     1      544            545        -1     1004           1022
##  5  2013     1     1      554            600        -6      812            837
##  6  2013     1     1      554            558        -4      740            728
##  7  2013     1     1      555            600        -5      913            854
##  8  2013     1     1      557            600        -3      709            723
##  9  2013     1     1      557            600        -3      838            846
## 10  2013     1     1      558            600        -2      753            745
## # ℹ 336,766 more rows
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## #   hour <dbl>, minute <dbl>, time_hour <dttm>

flights %>%
  group_by(origin) %>%
  summarise(n = n(), depm = mean(dep_delay, na.rm = TRUE))

## # A tibble: 3 × 3
##   origin      n  depm
##   <chr>   <int> <dbl>
## 1 EWR    120835  15.1
## 2 JFK    111279  12.1
## 3 LGA    104662  10.3
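The summary above passes na.rm = TRUE because dep_delay contains missing values. The following is a minimal sketch of what that argument changes, using a small made-up tibble (`toy` and the extra columns `n_obs` and `depm_raw` are illustrative names, not part of the assignment):

```r
library(dplyr)

# Hypothetical toy data standing in for flights (values are made up)
toy <- tibble(
  origin    = c("A", "A", "A", "B", "B"),
  dep_delay = c(2, 4, NA, 10, 20)
)

by_origin <- toy %>%
  group_by(origin) %>%
  summarise(
    n        = n(),                            # all rows, including the NA row
    n_obs    = sum(!is.na(dep_delay)),         # rows with a recorded delay
    depm     = mean(dep_delay, na.rm = TRUE),  # mean over recorded delays only
    depm_raw = mean(dep_delay)                 # NA when any value is missing
  )

by_origin
```

Note that n() counts every row in the group, so the flight counts include flights whose delay is missing, while the mean is taken over recorded delays only.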

Compute the number of flights departing from New York in 2013 and the mean departure delay for each airline.

flights %>%
  group_by(carrier) %>%
  summarise(n = n(), depm = mean(dep_delay, na.rm = TRUE)) %>%
  arrange(desc(n))

## # A tibble: 16 × 3
##    carrier     n  depm
##    <chr>   <int> <dbl>
##  1 UA      58665 12.1 
##  2 B6      54635 13.0 
##  3 EV      54173 20.0 
##  4 DL      48110  9.26
##  5 AA      32729  8.59
##  6 MQ      26397 10.6 
##  7 US      20536  3.78
##  8 9E      18460 16.7 
##  9 WN      12275 17.7 
## 10 VX       5162 12.9 
## 11 FL       3260 18.7 
## 12 AS        714  5.80
## 13 F9        685 20.2 
## 14 YV        601 19.0 
## 15 HA        342  4.90
## 16 OO         32 12.6

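Find the top three destinations for each of the three New York airports, together with the mean flight distance (group_by, summarise, arrange, and slice_max may be used).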

flights %>%
  group_by(origin, dest) %>%
  summarise(n = n(), distm = mean(distance)) %>%
  slice_max(n, n = 3)

## `summarise()` has grouped output by 'origin'. You can override using the
## `.groups` argument.

## # A tibble: 9 × 4
## # Groups:   origin [3]
##   origin dest      n distm
##   <chr>  <chr> <int> <dbl>
## 1 EWR    ORD    6100   719
## 2 EWR    BOS    5327   200
## 3 EWR    SFO    5127  2565
## 4 JFK    LAX   11262  2475
## 5 JFK    SFO    8204  2586
## 6 JFK    BOS    5898   187
## 7 LGA    ATL   10263   762
## 8 LGA    ORD    8857   733
## 9 LGA    CLT    6168   544
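The summarise() message above only reports that the result is still grouped by origin, which is exactly what slice_max needs to pick the top rows per airport. As a sketch, assuming a small made-up tibble (`toy` and its values are illustrative), the same grouping can be requested explicitly with `.groups = "drop_last"`, which also silences the message:

```r
library(dplyr)

# Hypothetical toy data standing in for flights (values are made up)
toy <- tibble(
  origin   = c("A", "A", "A", "A", "A", "A", "B", "B", "B"),
  dest     = c("X", "X", "X", "Y", "Y", "Z", "X", "Z", "Z"),
  distance = c(100, 100, 100, 250, 250, 400, 120, 380, 380)
)

top_dest <- toy %>%
  group_by(origin, dest) %>%
  # "drop_last" keeps the origin grouping and suppresses the message
  summarise(n = n(), distm = mean(distance), .groups = "drop_last") %>%
  # top 2 destinations within each origin
  slice_max(n, n = 2) %>%
  # remove the leftover grouping once it is no longer needed
  ungroup()

top_dest
```

slice_max keeps ties by default (with_ties = TRUE), so a group can return more than the requested number of rows when several destinations share the same count.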

Question 2: Explaining code

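What the code does: tibble(iris) converts the iris data frame to a tibble, and the pipe operator %>% passes it to arrange() as its first argument (the data). The remaining arguments set the sort columns and their order: rows are sorted by Species first, in ascending order, and then by each column whose name starts with "Sepal" (Sepal.Length, then Sepal.Width), in descending order.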

tibble(iris) %>% 
  arrange(Species,across(starts_with("Sepal"), desc))

## # A tibble: 150 × 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
##  1          5.8         4            1.2         0.2 setosa 
##  2          5.7         4.4          1.5         0.4 setosa 
##  3          5.7         3.8          1.7         0.3 setosa 
##  4          5.5         4.2          1.4         0.2 setosa 
##  5          5.5         3.5          1.3         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          5.4         3.9          1.3         0.4 setosa 
##  8          5.4         3.7          1.5         0.2 setosa 
##  9          5.4         3.4          1.7         0.2 setosa 
## 10          5.4         3.4          1.5         0.4 setosa 
## # ℹ 140 more rows
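To make the role of across() concrete, here is a small sketch comparing it with the same sort written out column by column. The object names `with_across` and `explicit` are illustrative only:

```r
library(dplyr)

# across() applies desc to every column selected by starts_with("Sepal")
with_across <- tibble(iris) %>%
  arrange(Species, across(starts_with("Sepal"), desc))

# The same ordering spelled out one column at a time
explicit <- tibble(iris) %>%
  arrange(Species, desc(Sepal.Length), desc(Sepal.Width))

# Compare the two results
all.equal(with_across, explicit)
```

The across() form is mainly a convenience: it keeps working unchanged if more columns matching the selection are added to the data.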

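What the code does: groups the starwars dataset by the gender column. Within each gender group, it computes the mean of the mass column (ignoring missing values) and keeps only the characters whose mass is greater than their own group's mean. Characters with a missing mass are dropped, because filter() keeps only rows where the condition is TRUE. Characters with a missing gender form a group of their own, which is why the output reports three groups.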

starwars %>% 
  group_by(gender) %>% 
  filter(mass > mean(mass, na.rm = TRUE))

## # A tibble: 15 × 14
## # Groups:   gender [3]
##    name    height   mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>    <int>  <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Darth …    202  136   none       white      yellow          41.9 male  mascu…
##  2 Owen L…    178  120   brown, gr… light      blue            52   male  mascu…
##  3 Beru W…    165   75   brown      light      blue            47   fema… femin…
##  4 Chewba…    228  112   brown      unknown    blue           200   male  mascu…
##  5 Jabba …    175 1358   <NA>       green-tan… orange         600   herm… mascu…
##  6 Jek To…    180  110   brown      fair       blue            NA   <NA>  <NA>  
##  7 IG-88      200  140   none       metal      red             15   none  mascu…
##  8 Bossk      190  113   none       green      red             53   male  mascu…
##  9 Ayla S…    178   55   none       blue       hazel           48   fema… femin…
## 10 Gregar…    185   85   black      dark       brown           NA   <NA>  <NA>  
## 11 Lumina…    170   56.2 black      yellow     blue            58   fema… femin…
## 12 Zam We…    168   55   blonde     fair, gre… yellow          NA   fema… femin…
## 13 Shaak …    178   57   none       red, blue… black           NA   fema… femin…
## 14 Grievo…    216  159   none       brown, wh… green, y…       NA   male  mascu…
## 15 Tarfful    234  136   brown      brown      blue            NA   male  mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
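The grouped filter can be easier to follow when the group mean is made visible first. The following sketch uses a small made-up tibble (`toy`, `group_mean`, and all values are illustrative assumptions) and splits the operation into two steps:

```r
library(dplyr)

# Hypothetical toy data (names and values are made up for illustration)
toy <- tibble(
  name   = c("a", "b", "c", "d", "e"),
  gender = c("f", "f", "m", "m", "m"),
  mass   = c(50, 70, 80, 100, NA)
)

# Step 1: attach each group's mean as a column, to show what filter() compares against
with_mean <- toy %>%
  group_by(gender) %>%
  mutate(group_mean = mean(mass, na.rm = TRUE)) %>%
  ungroup()

# Step 2: keep rows above their own group's mean; the row with NA mass is dropped
above <- with_mean %>%
  filter(mass > group_mean)

above
```

Writing filter(mass > mean(mass, na.rm = TRUE)) on grouped data performs both steps at once, without storing the intermediate column.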

What the code does: selects the name, homeworld, and species columns from the starwars dataset, then converts every column except name (that is, homeworld and species) to a factor.

starwars %>%
  select(name, homeworld, species) %>%
  mutate(across(!name, as.factor))

## # A tibble: 87 × 3
##    name               homeworld species
##    <chr>              <fct>     <fct>  
##  1 Luke Skywalker     Tatooine  Human  
##  2 C-3PO              Tatooine  Droid  
##  3 R2-D2              Naboo     Droid  
##  4 Darth Vader        Tatooine  Human  
##  5 Leia Organa        Alderaan  Human  
##  6 Owen Lars          Tatooine  Human  
##  7 Beru Whitesun Lars Tatooine  Human  
##  8 R5-D4              Tatooine  Droid  
##  9 Biggs Darklighter  Tatooine  Human  
## 10 Obi-Wan Kenobi     Stewjon   Human  
## # ℹ 77 more rows
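As a quick check of the conversion, the sketch below stores the result (in an illustrative object named `sw`) and inspects the column classes and factor levels:

```r
library(dplyr)

sw <- starwars %>%
  select(name, homeworld, species) %>%
  mutate(across(!name, as.factor))

# Column classes after the conversion: name stays character
sapply(sw, class)

# A factor stores its distinct values as levels
nlevels(sw$homeworld)
head(levels(sw$species))
```

The selection !name means "every column except name", so the same call would also convert any additional columns kept by select().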

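What the code does: converts the mtcars dataset to a tibble and groups it by the vs column (engine type). Within each vs group, cut(hp, 3) splits the hp column (horsepower) into three equal-width intervals, and the result is stored in a new column, hp_cut. Because the intervals are computed separately for each vs group, the two groups have different breakpoints, giving six distinct intervals in total. Finally, the data is regrouped by hp_cut, which replaces the earlier grouping by vs.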

tibble(mtcars) %>%
  group_by(vs) %>%
  mutate(hp_cut = cut(hp, 3)) %>%
  group_by(hp_cut)

## # A tibble: 32 × 12
## # Groups:   hp_cut [6]
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb hp_cut     
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>      
##  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4 (90.8,172] 
##  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4 (90.8,172] 
##  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1 (75.7,99.3]
##  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1 (99.3,123] 
##  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2 (172,254]  
##  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1 (99.3,123] 
##  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4 (172,254]  
##  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2 (51.9,75.7]
##  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2 (75.7,99.3]
## 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4 (99.3,123] 
## # ℹ 22 more rows
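The effect of grouping before cut() can be seen by running the binning with and without group_by(vs). This is a sketch; the object names `ungrouped` and `grouped` are illustrative:

```r
library(dplyr)

# Ungrouped: cut() sees all hp values at once, so one set of 3 intervals
ungrouped <- tibble(mtcars) %>%
  mutate(hp_cut = cut(hp, 3))

# Grouped by vs: cut() runs once per group, each with its own breakpoints
grouped <- tibble(mtcars) %>%
  group_by(vs) %>%
  mutate(hp_cut = cut(hp, 3)) %>%
  ungroup()

# Number of distinct intervals in each case
n_distinct(ungrouped$hp_cut)
n_distinct(grouped$hp_cut)
```

Intervals built per group are not directly comparable across groups, which is worth keeping in mind before regrouping by hp_cut.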

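Question 3: Looking up help to understand functions

Read https://dplyr.tidyverse.org/reference/mutate-joins.html and explain what the four dataset-joining functions do. For each one, give a practical example and explain its output.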

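inner_join(): keeps only the rows whose key appears in both tables. In the example, only ids 2 and 3 occur in both students and scores, so the result has two rows; Alice (id 1) and the score for id 4 are dropped.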

library(dplyr)

students <- tibble(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie")
)

scores <- tibble(
  id = c(2, 3, 4),
  score = c(90, 85, 70)
)

inner_join(students, scores, by = "id")

## # A tibble: 2 × 3
##      id name    score
##   <dbl> <chr>   <dbl>
## 1     2 Bob        90
## 2     3 Charlie    85

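left_join(): keeps every row of the left table (students) and adds the matching columns from the right table (scores). Left rows without a match get NA in the new columns. In the example, Charlie (id 3) has no score, so his score is NA, and the score for id 4 is dropped.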

library(dplyr)

students <- tibble(
  id = c(1, 2, 3),
  name = c("Alice", "Bob", "Charlie")
)

scores <- tibble(
  id = c(1, 2, 4),
  score = c(90, 85, 88)
)

result <- left_join(students, scores, by = "id")
print(result)

## # A tibble: 3 × 3
##      id name    score
##   <dbl> <chr>   <dbl>
## 1     1 Alice      90
## 2     2 Bob        85
## 3     3 Charlie    NA

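right_join(): keeps every row of the right table (scores) and adds the matching columns from the left table (students). Using the same students and scores as in the left_join() example, id 4 has no student, so its name is NA, and Charlie (id 3) is dropped.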

result <- right_join(students, scores, by = "id")
print(result)

## # A tibble: 3 × 3
##      id name  score
##   <dbl> <chr> <dbl>
## 1     1 Alice    90
## 2     2 Bob      85
## 3     4 <NA>     88

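full_join(): keeps all rows from both tables and fills in NA wherever one side has no match. Using the same students and scores, the result has four rows: Charlie (id 3) has no score, and id 4 has no name.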

result <- full_join(students, scores, by = "id")
print(result)

## # A tibble: 4 × 3
##      id name    score
##   <dbl> <chr>   <dbl>
## 1     1 Alice      90
## 2     2 Bob        85
## 3     3 Charlie    NA
## 4     4 <NA>       88
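The four joins can also be compared side by side by counting the rows each one returns. The sketch below reuses the students and scores tables from the left_join() example; the object name `counts` is illustrative:

```r
library(dplyr)

students <- tibble(id = c(1, 2, 3), name = c("Alice", "Bob", "Charlie"))
scores   <- tibble(id = c(1, 2, 4), score = c(90, 85, 88))

# Number of rows returned by each join type
counts <- sapply(
  list(
    inner = inner_join(students, scores, by = "id"),
    left  = left_join(students, scores, by = "id"),
    right = right_join(students, scores, by = "id"),
    full  = full_join(students, scores, by = "id")
  ),
  nrow
)

counts
```

With these tables, the inner join returns the two shared ids, the left and right joins each return three rows (one table kept in full), and the full join returns four.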



221527130

Guobao

2025-04-02
