Read in the data

Read the CSV and reshape it to long format with shape n x 4 (columns: airline, status, location, count). The dataset is available at the URL used in the code below.

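library(tidyverse)  # dplyr, tidyr, ggplot2 and the %>% pipe

df <- read.csv('https://raw.githubusercontent.com/ksooklall/CUNY-SPS-Masters-DS/main/DATA_607/homework/homework5/flightdata.csv', sep=',')
colnames(df) <- c('airline', 'status', 'los_angeles', 'phoenix', 'san_diego', 'san_francisco', 'seattle')
# Turn empty strings into NA (character columns only), then carry each airline name down
df <- df %>% mutate(across(where(is.character), ~na_if(., ''))) %>% fill(airline)
# One row per airline / status / city
df <- df %>% pivot_longer(!c('airline', 'status'), names_to='location', values_to='count')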
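To make the reshaping step easier to follow, here is a minimal sketch of the same fill-then-pivot pattern on a tiny made-up table. The `wide` data, the city names `city_x` and `city_y`, and the counts are all hypothetical. The layout (airline name present only on the first row of each pair) is an assumption inferred from the use of `fill(airline)` above, not a copy of the real file.

```r
library(dplyr)
library(tidyr)

# Made-up wide table: airline name only on the first row of each pair,
# one column per city (all values are illustrative, not from the dataset)
wide <- tibble(
  airline = c('A', '', 'B', ''),
  status  = c('on time', 'delayed', 'on time', 'delayed'),
  city_x  = c(10, 1, 20, 2),
  city_y  = c(30, 3, 40, 4)
)

long <- wide %>%
  mutate(across(where(is.character), ~na_if(., ''))) %>%  # '' becomes NA
  fill(airline) %>%                                        # carry airline name down
  pivot_longer(!c('airline', 'status'), names_to = 'location', values_to = 'count')

dim(long)  # 8 4: 4 rows x 2 city columns become 8 rows, 4 columns
```

Each original row contributes one output row per city column, which is where the n x 4 shape comes from.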

Which airline is more dependable, i.e., which one was on time more often?

df %>%
  group_by(airline, status) %>%
  summarise(total=sum(count), .groups='drop') %>%
  ggplot(aes(x=airline, y=total, color=status)) +
  geom_point(aes(size=total))

By raw count, AM WEST has more on-time flights.
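Raw totals favor whichever airline flies more, so the share of flights that were on time may be a fairer measure of dependability. The sketch below is one way to compute it. The helper `on_time_rate` is hypothetical, the label `'on time'` is an assumption about how the status column is coded, and the `toy` numbers are made up purely to show the shape of the result.

```r
library(dplyr)

# Hypothetical helper: share of flights that were on time, per airline.
# Assumes long-format columns airline, status, count, and that on-time rows
# carry the label given in on_time_label (an assumption about the CSV values).
on_time_rate <- function(data, on_time_label = 'on time') {
  data %>%
    filter(!is.na(status), !is.na(count)) %>%  # skip blank rows, if any
    group_by(airline) %>%
    summarise(
      flights = sum(count),
      on_time = sum(count[status == on_time_label]),
      rate    = on_time / flights,
      .groups = 'drop'
    )
}

# Made-up numbers, not taken from the dataset
toy <- tibble(
  airline = c('A', 'A', 'B', 'B'),
  status  = c('on time', 'delayed', 'on time', 'delayed'),
  count   = c(90, 10, 150, 50)
)
rates <- on_time_rate(toy)
```

In the toy table, B has more on-time flights (150 vs 90) but the lower rate (0.75 vs 0.90). Calling `on_time_rate(df)` on the reshaped data would give the corresponding rates for the two airlines.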

How do the distributions of flight counts compare between the two airlines?

df %>% ggplot(aes(x=airline, y=count)) + geom_boxplot()

Both airlines have similar medians; however, AM WEST has a large outlier, while ALASKA has a larger IQR.
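The boxplot reading can be checked numerically. The sketch below computes the median, IQR, and maximum of `count` per airline. The helper `count_summary` is hypothetical, and the `toy` values are made up to illustrate the output, not drawn from the dataset.

```r
library(dplyr)

# Hypothetical helper: per-airline median, IQR and maximum of the counts
count_summary <- function(data) {
  data %>%
    filter(!is.na(count)) %>%  # skip blank rows, if any
    group_by(airline) %>%
    summarise(
      med       = median(count),
      iqr       = IQR(count),
      max_count = max(count),
      .groups   = 'drop'
    )
}

# Made-up numbers: B has one very large value, similar to an outlier in a boxplot
toy <- tibble(
  airline = rep(c('A', 'B'), each = 4),
  count   = c(10, 20, 30, 40, 5, 15, 25, 500)
)
stats <- count_summary(toy)
```

Calling `count_summary(df)` on the reshaped data gives the figures behind the boxplot, so the median and IQR comparison does not have to be read off the plot.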