Assignment 5

Load With Clean Names

names that can be referred to when subsetting dataframes( no whitespaces between names or no invalid characters) and filling company column with appropriate values

untid <- read_csv("untidy.csv") |>
          janitor::clean_names() |> 
          rename(company = 1, ood = 2) |> 
          drop_na(ood) |> mutate(company = ifelse(is.na(company), lag(company), company))

## New names:
## Rows: 5 Columns: 7
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (2): ...1, ...2 dbl (3): Los Angeles, San Diego, San Francisco num (2):
## Phoenix, Seattle
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
## • `` -> `...2`

kable(untid)

company	ood	los_angeles	phoenix	san_diego	san_francisco	seattle
ALASKA	on time	497	221	212	503	1841
ALASKA	delayed	62	12	20	102	305
AM WEST	on time	694	4840	383	320	201
AM WEST	delayed	117	415	65	129	61

Lengthen Data

States are listed as columns holding the amount of delays and on time flights they had however this not a appropriate way of handling analysis in vectorized programming. We would to have them as observations instead of variables.

untid <- untid |>
          pivot_longer(cols = los_angeles:seattle, names_to = "territory", values_to = "arrival")
kable(untid)

company	ood	territory	arrival
ALASKA	on time	los_angeles	497
ALASKA	on time	phoenix	221
ALASKA	on time	san_diego	212
ALASKA	on time	san_francisco	503
ALASKA	on time	seattle	1841
ALASKA	delayed	los_angeles	62
ALASKA	delayed	phoenix	12
ALASKA	delayed	san_diego	20
ALASKA	delayed	san_francisco	102
ALASKA	delayed	seattle	305
AM WEST	on time	los_angeles	694
AM WEST	on time	phoenix	4840
AM WEST	on time	san_diego	383
AM WEST	on time	san_francisco	320
AM WEST	on time	seattle	201
AM WEST	delayed	los_angeles	117
AM WEST	delayed	phoenix	415
AM WEST	delayed	san_diego	65
AM WEST	delayed	san_francisco	129
AM WEST	delayed	seattle	61

Widen Data

On time and delayed are two observations that are common in almost every row, and are directly linked to how often it occured so we would like to make them variables and making their number occurences an observation.

untid <- untid |> 
         pivot_wider( names_from = ood, values_from = arrival) |>
         mutate(territory  = str_replace(territory, "_", " ")) |> 
         mutate(territory = str_to_title(territory))  
kable(untid)

company	territory	on time	delayed
ALASKA	Los Angeles	497	62
ALASKA	Phoenix	221	12
ALASKA	San Diego	212	20
ALASKA	San Francisco	503	102
ALASKA	Seattle	1841	305
AM WEST	Los Angeles	694	117
AM WEST	Phoenix	4840	415
AM WEST	San Diego	383	65
AM WEST	San Francisco	320	129
AM WEST	Seattle	201	61

Assignment 5

Jonathan Cruz

2024-02-24

Load With Clean Names

Lengthen Data

Widen Data