library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(readr)
library(tidyr)
library(nycflights23)
library(fivethirtyeight)
## Some larger datasets need to be installed separately, like senators and
## house_district_forecast. To install these, we recommend you install the
## fivethirtyeightdata package by running:
## install.packages('fivethirtyeightdata', repos =
## 'https://fivethirtyeightdata.github.io/drat/', type = 'source')
4.1 In a tidy data frame, each variable forms a column, each observation forms a row, and each type of observational unit forms a table.
4.2 Tidy data frames are useful for organizing data because every variable has its own column, which makes it easier to plot or visualize the data by selected variables. It also makes sorting easier, since each variable is a column that can be sorted on directly.
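As a rough illustration of the point in 4.2, the sketch below uses a small made-up tibble (`toy_tidy`, with hypothetical airlines "A" and "B", not taken from any package) that is already tidy. Under that assumption, sorting is a single `arrange()` call and each plot aesthetic maps to exactly one column.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data, already tidy: one row per airline-period observation.
toy_tidy <- tibble::tibble(
  airline = c("A", "A", "B", "B"),
  period  = c("85_99", "00_14", "85_99", "00_14"),
  count   = c(0L, 0L, 128L, 88L)
)

# Sorting by a variable is one arrange() call on that variable's column.
toy_sorted <- toy_tidy |>
  arrange(desc(count))

# Each aesthetic (x, y, fill) maps to a single column of the tidy data.
toy_plot <- ggplot(toy_tidy, aes(x = airline, y = count, fill = period)) +
  geom_col(position = "dodge")
```

If the two periods were stored as separate columns instead, there would be no single `count` column to sort on or to map to `y`.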
4.3
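# Keep the airline name and the fatalities columns
airline_safety_smaller <- airline_safety |>
  select(airline, starts_with("fatalities"))
# Reshape from wide to long: one row per airline and time period
airline_safety_smaller |>
  pivot_longer(names_to = "fatalities_years",
               values_to = "count",
               cols = -airline)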
## # A tibble: 112 × 3
## airline fatalities_years count
## <chr> <chr> <int>
## 1 Aer Lingus fatalities_85_99 0
## 2 Aer Lingus fatalities_00_14 0
## 3 Aeroflot fatalities_85_99 128
## 4 Aeroflot fatalities_00_14 88
## 5 Aerolineas Argentinas fatalities_85_99 0
## 6 Aerolineas Argentinas fatalities_00_14 0
## 7 Aeromexico fatalities_85_99 64
## 8 Aeromexico fatalities_00_14 0
## 9 Air Canada fatalities_85_99 0
## 10 Air Canada fatalities_00_14 0
## # ℹ 102 more rows
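One possible follow-up, sketched here on a small stand-in tibble rather than the full data: `toy_wide` is a hypothetical name, and its values are copied from the first few airlines in the printed output above. The `names_prefix` argument of `pivot_longer()` strips the shared `fatalities_` prefix, so the new column holds only the period label. The column name `years` is also an illustrative choice.

```r
library(dplyr)
library(tidyr)

# Stand-in for airline_safety_smaller, using values from the printed output.
toy_wide <- tibble::tibble(
  airline          = c("Aer Lingus", "Aeroflot", "Aeromexico"),
  fatalities_85_99 = c(0L, 128L, 64L),
  fatalities_00_14 = c(0L, 88L, 0L)
)

toy_long <- toy_wide |>
  pivot_longer(
    cols = -airline,
    names_to = "years",
    names_prefix = "fatalities_",  # drop the shared prefix from the old column names
    values_to = "count"
  )

# With one row per airline-period pair, group-wise summaries are direct.
totals <- toy_long |>
  group_by(years) |>
  summarize(total = sum(count))
```

The same `pivot_longer()` call should work on `airline_safety_smaller` itself, assuming its fatalities columns all share the `fatalities_` prefix as the output suggests.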