For this homework assignment, only two different packages were needed: dplyr and tidyverse.
## Load required packages ##
library(dplyr) ## Manipulating data
library(tidyverse) ## Cleaning data
library(ggplot2) ## Graphing data
Import the bomber_wide.rds file, which lists the flying hours for each aircraft by year. Convert this data to a tibble and tidy it by changing it from a wide format to a long format so that you have the following columns: Type, MD, Year, & FH.
## Problem 1 ##
bomber_wide <- as_tibble(readRDS("/Users/MatthewMurphy/Dropbox (Personal)/School/5th Year/Data Wrangling with R/R Files/Data Wrangling with R (BANA 8090)/data/bomber_wide.rds"))
bomber_wide %>%
gather("FY", "FH", 3:21)
## # A tibble: 57 × 4
## Type MD FY FH
## <chr> <chr> <chr> <int>
## 1 Bomber B-1 1996 26914
## 2 Bomber B-2 1996 2364
## 3 Bomber B-52 1996 28511
## 4 Bomber B-1 1997 25219
## 5 Bomber B-2 1997 2776
## 6 Bomber B-52 1997 26034
## 7 Bomber B-1 1998 24205
## 8 Bomber B-2 1998 2166
## 9 Bomber B-52 1998 25639
## 10 Bomber B-1 1999 23306
## # ... with 47 more rows
Import the bomber_long.rds data, which provides the value for three different outputs for each aircraft by year. The output measures include cost, flying hours, and gallons of gas consumed but these variables are “stacked” in the Output variable. Change this data to a tibble and convert to a wider format so that you have the following columns: Type, MD, FY, Cost, FH, & Gallons.
## Problem 2 ##
bomber_long <- as_tibble(readRDS("/Users/MatthewMurphy/Dropbox (Personal)/School/5th Year/Data Wrangling with R/R Files/Data Wrangling with R (BANA 8090)/data/bomber_long.rds"))
bomber_long %>%
spread(key = Output, value = Value)
## # A tibble: 57 × 6
## Type MD FY Cost FH Gallons
## * <chr> <chr> <int> <int> <int> <int>
## 1 Bomber B-1 1996 72753781 26914 88594449
## 2 Bomber B-1 1997 71297263 25219 85484074
## 3 Bomber B-1 1998 84026805 24205 85259038
## 4 Bomber B-1 1999 71848336 23306 79323816
## 5 Bomber B-1 2000 58439777 25013 86230284
## 6 Bomber B-1 2001 94946077 25059 86892432
## 7 Bomber B-1 2002 96458536 26581 89198262
## 8 Bomber B-1 2003 68650070 21491 74485788
## 9 Bomber B-1 2004 101895634 28118 101397707
## 10 Bomber B-1 2005 124816690 21859 78410415
## # ... with 47 more rows
Import the bomber_combined.rds file. Note that the first variable in this data (AC) combines the aircraft type (Bomber) and aircraft designator (i.e. B-1). This variable should be split into two. Take this data and convert it to a tibble and separate the AC variable into “Type” and “MD”.
## Problem 3 ##
bomber_combined <- as_tibble(readRDS("/Users/MatthewMurphy/Dropbox (Personal)/School/5th Year/Data Wrangling with R/R Files/Data Wrangling with R (BANA 8090)/data/bomber_combined.rds"))
bomber_combined %>%
separate(AC, into = c("Type", "MD"), sep = " ")
## # A tibble: 57 × 6
## Type MD FY Cost FH Gallons
## * <chr> <chr> <int> <int> <int> <int>
## 1 Bomber B-1 1996 72753781 26914 88594449
## 2 Bomber B-1 1997 71297263 25219 85484074
## 3 Bomber B-1 1998 84026805 24205 85259038
## 4 Bomber B-1 1999 71848336 23306 79323816
## 5 Bomber B-1 2000 58439777 25013 86230284
## 6 Bomber B-1 2001 94946077 25059 86892432
## 7 Bomber B-1 2002 96458536 26581 89198262
## 8 Bomber B-1 2003 68650070 21491 74485788
## 9 Bomber B-1 2004 101895634 28118 101397707
## 10 Bomber B-1 2005 124816690 21859 78410415
## # ... with 47 more rows
Import the bomber_prefix.rds data. Take this data and convert it to a tibble and unite the prefix and number variables into an “MD” variable so that the data matches the tidy data sets you produced in problems #2 and #3.
bomber_prefix <- as_tibble(readRDS("/Users/MatthewMurphy/Dropbox (Personal)/School/5th Year/Data Wrangling with R/R Files/Data Wrangling with R (BANA 8090)/data/bomber_prefix.rds"))
bomber_prefix %>%
unite(MD, prefix, number, sep = "-") %>%
spread(key = Output, value = Value)
## # A tibble: 57 × 6
## Type MD FY Cost FH Gallons
## * <chr> <chr> <int> <int> <int> <int>
## 1 Bomber B-1 1996 72753781 26914 88594449
## 2 Bomber B-1 1997 71297263 25219 85484074
## 3 Bomber B-1 1998 84026805 24205 85259038
## 4 Bomber B-1 1999 71848336 23306 79323816
## 5 Bomber B-1 2000 58439777 25013 86230284
## 6 Bomber B-1 2001 94946077 25059 86892432
## 7 Bomber B-1 2002 96458536 26581 89198262
## 8 Bomber B-1 2003 68650070 21491 74485788
## 9 Bomber B-1 2004 101895634 28118 101397707
## 10 Bomber B-1 2005 124816690 21859 78410415
## # ... with 47 more rows
Import the bomber_mess.rds file so that it is a tibble. Clean this data up by making it contain the following variables:
Perform the required actions by stringing together the necessary functions with the pipe operator (%>%).
## Problem 5 ##
bomber_mess <- as_tibble(readRDS("/Users/MatthewMurphy/Dropbox (Personal)/School/5th Year/Data Wrangling with R/R Files/Data Wrangling with R (BANA 8090)/data/bomber_mess.rds"))
bomber_mess %>%
unite(MD, prefix, number, sep = "-") %>%
separate(Metric, into = c("FY", "Output"), sep = "_") %>%
spread(key = Output, value = Value)
## # A tibble: 57 × 6
## Type MD FY Cost FH Gallons
## * <chr> <chr> <chr> <int> <int> <int>
## 1 Bomber B-1 1996 72753781 26914 88594449
## 2 Bomber B-1 1997 71297263 25219 85484074
## 3 Bomber B-1 1998 84026805 24205 85259038
## 4 Bomber B-1 1999 71848336 23306 79323816
## 5 Bomber B-1 2000 58439777 25013 86230284
## 6 Bomber B-1 2001 94946077 25059 86892432
## 7 Bomber B-1 2002 96458536 26581 89198262
## 8 Bomber B-1 2003 68650070 21491 74485788
## 9 Bomber B-1 2004 101895634 28118 101397707
## 10 Bomber B-1 2005 124816690 21859 78410415
## # ... with 47 more rows
Once you’ve created the above tidy data, plot the historical trends of this data in ggplot2 with a line chart such that the plot is facetted by the Cost, FH, and Gallons variables and each facet compares the different MDs (“B-1”, “B-2”, “B-52”).
## Problem 5 ##
bomber_mess %>%
unite(MD, prefix, number, sep = "-") %>%
separate(Metric, into = c("FY", "Output"), sep = "_") %>%
ggplot (aes(x = FY, y = Value, color = MD, group = MD)) +
geom_point() + geom_line() +
facet_grid(Output~.,scales = "free")
Import the ws_programmatics.rds & ws_categorization.rds data so that they are tibbles and perform the following steps in sequence using the pipe operator (%>%).
## Problem 6 ##
ws_programmatics <- as_tibble(readRDS("/Users/MatthewMurphy/Dropbox (Personal)/School/5th Year/Data Wrangling with R/R Files/Data Wrangling with R (BANA 8090)/data/ws_programmatics.rds"))
ws_categorizations <- as_tibble(readRDS("/Users/MatthewMurphy/Dropbox (Personal)/School/5th Year/Data Wrangling with R/R Files/Data Wrangling with R (BANA 8090)/data/ws_categorizations.rds"))
ws_combined <- inner_join(ws_programmatics, ws_categorizations) %>%
filter(FY == 2014) %>%
filter(Base == "MINOT AFB (ND)") %>%
filter(System == "AIRCRAFT" | System == "MISSILES") %>%
group_by(System) %>%
summarise(Total_O.S. = sum(Total_O.S, na.rm = TRUE), Total_End_Strength = sum(End_Strength, na.rm = TRUE)) %>%
print()
## # A tibble: 2 × 3
## System Total_O.S. Total_End_Strength
## <chr> <dbl> <dbl>
## 1 AIRCRAFT 297398235 2023
## 2 MISSILES 112224909 1951
Once again, join the ws_programmatics.rds & ws_categorization.rds data; however, this time identify which Base had the largest cost per flying hour (defined as CPFH = Total_O.S. / FH which requires you to create a new variable) in 2014. Using a bar chart in ggplot2, plot these values for the top 10 bases with the largest cost per flying hour.
## Problem 7 ##
ws_programmatics <- as_tibble(readRDS("/Users/MatthewMurphy/Dropbox (Personal)/School/5th Year/Data Wrangling with R/R Files/Data Wrangling with R (BANA 8090)/data/ws_programmatics.rds"))
ws_categorizations <- as_tibble(readRDS("/Users/MatthewMurphy/Dropbox (Personal)/School/5th Year/Data Wrangling with R/R Files/Data Wrangling with R (BANA 8090)/data/ws_categorizations.rds"))
ws_combined <- inner_join(ws_programmatics, ws_categorizations) %>%
filter (FY == 2014) %>%
select (Base, Total_O.S, FH) %>%
na.omit() %>%
mutate (CPFH = Total_O.S / FH) %>%
arrange (desc(CPFH)) %>%
group_by (Base) %>%
top_n (n = 1) %>%
head (10) %>%
print()
## Source: local data frame [10 x 4]
## Groups: Base [10]
##
## Base Total_O.S FH CPFH
## <chr> <dbl> <dbl> <dbl>
## 1 WRIGHT-PATTERSON AFB (OH) 6275048 18 348613.78
## 2 TINKER AFB (OK) 1042151365 3783 275482.78
## 3 HICKAM AFB (HI) 340623396 3174 107316.76
## 4 BEALE AFB (CA) 111061067 1225 90662.10
## 5 LANGLEY AFB (VA) 659408989 8389 78604.00
## 6 OFFUTT AFB (NE) 90182794 1577 57186.30
## 7 EDWARDS AFB (CA) 13575421 241 56329.55
## 8 WHITEMAN AFB (MO) 315071356 5813 54201.16
## 9 MINOT AFB (ND) 287851847 6079 47351.84
## 10 BARKSDALE AFB (LA) 532348902 12037 44226.04
ggplot (ws_combined, aes (x = reorder(Base, -CPFH), y = CPFH)) +
geom_bar (stat = "identity")
Using scatter plots in ggplot2, assess the relationship between the end strength (End_Strength) variable and total costs (Total_O.S). Provide three scatter plots that visually assesses this replationship from different angles (by FY, System, etc).
## Scatterplot 1 ##
ws_programmatics <- as_tibble(readRDS("/Users/MatthewMurphy/Dropbox (Personal)/School/5th Year/Data Wrangling with R/R Files/Data Wrangling with R (BANA 8090)/data/ws_programmatics.rds"))
ws_categorizations <- as_tibble(readRDS("/Users/MatthewMurphy/Dropbox (Personal)/School/5th Year/Data Wrangling with R/R Files/Data Wrangling with R (BANA 8090)/data/ws_categorizations.rds"))
ws_combined <- left_join(ws_programmatics, ws_categorizations, by = c("Base", "MD")) %>%
select(Base, MD, FY, Total_O.S, End_Strength, FH, System) %>%
na.omit()
ggplot(data = ws_combined, aes(x = End_Strength, y = Total_O.S)) +
geom_point(shape = 19, size = 2) +
geom_smooth(method = lm)
## Scatterplot 2 ##
ws_programmatics <- as_tibble(readRDS("/Users/MatthewMurphy/Dropbox (Personal)/School/5th Year/Data Wrangling with R/R Files/Data Wrangling with R (BANA 8090)/data/ws_programmatics.rds"))
ws_categorizations <- as_tibble(readRDS("/Users/MatthewMurphy/Dropbox (Personal)/School/5th Year/Data Wrangling with R/R Files/Data Wrangling with R (BANA 8090)/data/ws_categorizations.rds"))
ws_combined <- left_join(ws_programmatics, ws_categorizations, by = c("Base", "MD")) %>%
select(Base, MD, FY, Total_O.S, End_Strength, FH, System) %>%
na.omit()
ggplot(data = ws_combined, aes(x = End_Strength, y = Total_O.S, color = FY)) +
geom_point(shape = 19, size = 2)
## Scatterplot 3 ##
ws_programmatics <- as_tibble(readRDS("/Users/MatthewMurphy/Dropbox (Personal)/School/5th Year/Data Wrangling with R/R Files/Data Wrangling with R (BANA 8090)/data/ws_programmatics.rds"))
ws_categorizations <- as_tibble(readRDS("/Users/MatthewMurphy/Dropbox (Personal)/School/5th Year/Data Wrangling with R/R Files/Data Wrangling with R (BANA 8090)/data/ws_categorizations.rds"))
ws_programmatics %>%
inner_join(ws_categorizations, by = c("Base","MD")) %>%
group_by(System) %>%
summarise(Total_End_Strength = sum(End_Strength,na.rm = TRUE),Total_OS = sum(Total_O.S,na.rm = TRUE)) %>%
ggplot(aes(x = Total_End_Strength,y = Total_OS, color = System)) +
geom_point(shape=19,size=2)