Overview

Air rage has taken off during the pandemic and shows no sign of slowing down. Even before the pandemic, passengers disagreed about activities such as reclining seats, controlling the window shade, and bringing a baby onboard. Because attitudes toward flying differ, this raises the question of whether specific subsets of passengers hold particular preferences. This analysis is exploratory and looks only at basic correlations between demographics and responses. We focus on attitudes toward bringing children on flights.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(ggthemes)
url.data <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/flying-etiquette-survey/flying-etiquette.csv"

dat <- read.csv(url(url.data))
nrow(dat)
## [1] 1040
head(dat)

Data Cleaning

Here, we clean the column names and subset the data to our population of interest: respondents who answered the question about babies on planes.

dat = janitor::clean_names(dat)
dat = subset(dat, in_general_is_itrude_to_bring_a_baby_on_a_plane != "")

Next, we check how many respondents remain after filtering:

nrow(dat)
## [1] 849
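
Before dropping rows, it can help to count how many blank answers there are, so the reduction in sample size is accounted for. The following is a minimal sketch on made-up data, not the survey itself; the `toy` data frame, its `rude_baby` column, and the response strings are assumptions for illustration.

```r
# Toy stand-in for the survey; `toy` and its values are made up for illustration.
toy <- data.frame(
  respondent_id = 1:5,
  rude_baby = c("No, not at all rude", "", "Yes, very rude", "", "Yes, somewhat rude"),
  stringsAsFactors = FALSE
)

# Count blank answers before filtering, so the size of the drop is known.
n.blank <- sum(toy$rude_baby == "")

# Same pattern as the filter in this section: keep rows with a non-empty answer.
toy.answered <- subset(toy, rude_baby != "")

# The number of rows removed should equal the number of blanks counted.
n.removed <- nrow(toy) - nrow(toy.answered)
n.blank
n.removed
```

Running the same count on the real column before the `subset()` call would show whether every dropped row was a blank answer.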

Here are the potential predictors:

names(dat)
##  [1] "respondent_id"                                                                                                                         
##  [2] "how_often_do_you_travel_by_plane"                                                                                                      
##  [3] "do_you_ever_recline_your_seat_when_you_fly"                                                                                            
##  [4] "how_tall_are_you"                                                                                                                      
##  [5] "do_you_have_any_children_under_18"                                                                                                     
##  [6] "in_a_row_of_three_seats_who_should_get_to_use_the_two_arm_rests"                                                                       
##  [7] "in_a_row_of_two_seats_who_should_get_to_use_the_middle_arm_rest"                                                                       
##  [8] "who_should_have_control_over_the_window_shade"                                                                                         
##  [9] "is_itrude_to_move_to_an_unsold_seat_on_a_plane"                                                                                        
## [10] "generally_speaking_is_it_rude_to_say_more_than_a_few_words_tothe_stranger_sitting_next_to_you_on_a_plane"                              
## [11] "on_a_6_hour_flight_from_nyc_to_la_how_many_times_is_it_acceptable_to_get_up_if_you_re_not_in_an_aisle_seat"                            
## [12] "under_normal_circumstances_does_a_person_who_reclines_their_seat_during_a_flight_have_any_obligation_to_the_person_sitting_behind_them"
## [13] "is_itrude_to_recline_your_seat_on_a_plane"                                                                                             
## [14] "given_the_opportunity_would_you_eliminate_the_possibility_of_reclining_seats_on_planes_entirely"                                       
## [15] "is_it_rude_to_ask_someone_to_switch_seats_with_you_in_order_to_be_closer_to_friends"                                                   
## [16] "is_itrude_to_ask_someone_to_switch_seats_with_you_in_order_to_be_closer_to_family"                                                     
## [17] "is_it_rude_to_wake_a_passenger_up_if_you_are_trying_to_go_to_the_bathroom"                                                             
## [18] "is_itrude_to_wake_a_passenger_up_if_you_are_trying_to_walk_around"                                                                     
## [19] "in_general_is_itrude_to_bring_a_baby_on_a_plane"                                                                                       
## [20] "in_general_is_it_rude_to_knowingly_bring_unruly_children_on_a_plane"                                                                   
## [21] "have_you_ever_used_personal_electronics_during_take_off_or_landing_in_violation_of_a_flight_attendant_s_direction"                     
## [22] "have_you_ever_smoked_a_cigarette_in_an_airplane_bathroom_when_it_was_against_the_rules"                                                
## [23] "gender"                                                                                                                                
## [24] "age"                                                                                                                                   
## [25] "household_income"                                                                                                                      
## [26] "education"                                                                                                                             
## [27] "location_census_region"

Based on this, we'll subset for demographic factors and questions about kids. We'll also preserve the respondent ID so we'll have a primary key.

vars.identifiers = c("respondent_id", "gender", "age", "household_income", "education", "location_census_region")
vars.questions = c("do_you_have_any_children_under_18", "in_general_is_itrude_to_bring_a_baby_on_a_plane", "in_general_is_it_rude_to_knowingly_bring_unruly_children_on_a_plane")

dat = subset(dat, select = c(vars.identifiers, vars.questions))

Some of these columns have long names, so we'll shorten them. We'll also collapse the long response text into short labels:

dat = dat %>% rename(children.under.18 = do_you_have_any_children_under_18,
                     rude.baby = in_general_is_itrude_to_bring_a_baby_on_a_plane,
                     rude.child.unruly = in_general_is_it_rude_to_knowingly_bring_unruly_children_on_a_plane)

# Map each response to a short label. "somewhat" is tested before "Yes" so that
# a response matching both patterns is labelled "Somewhat"; anything else becomes NA.
shorten.response = function(x) {
  case_when(grepl("not at all", x) ~ "No",
            grepl("somewhat", x) ~ "Somewhat",
            grepl("Yes", x) ~ "Yes",
            TRUE ~ NA_character_)
}

dat$rude.baby = shorten.response(dat$rude.baby)
dat$rude.child.unruly = shorten.response(dat$rude.child.unruly)
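
Because the recoding relies on pattern matching, the order of the checks matters: a response that contains both "Yes" and "somewhat" must reach the "somewhat" test first. The sketch below illustrates this in base R on made-up strings; the exact survey wording is an assumption, and `recode.rude` is a hypothetical helper name.

```r
# Made-up response strings for illustration; the real survey wording may differ.
responses <- c("No, not at all rude", "Yes, somewhat rude", "Yes, very rude", "")

# Same pattern order as the recoding step: "somewhat" is tested before "Yes".
recode.rude <- function(x) {
  ifelse(grepl("not at all", x), "No",
         ifelse(grepl("somewhat", x), "Somewhat",
                ifelse(grepl("Yes", x), "Yes", NA)))
}

recoded <- recode.rude(responses)

# Cross-tabulate original against recoded values; useNA keeps unmatched rows visible.
table(original = responses, recoded = recoded, useNA = "ifany")
```

A cross-tabulation like this on the real columns is a quick way to confirm that no response text was silently mapped to `NA`.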

Exploratory Data Correlations

Examining Responses to Questions of Interest

# geom_bar() counts the rows in each category by default, so no explicit y mapping is needed.
p.rude.baby = ggplot(data = dat, aes(x = rude.baby)) + 
      geom_bar() + theme_minimal() + labs(x = "Rude to Have a Baby?", y = "Count")

p.rude.child = ggplot(data = dat, aes(x = rude.child.unruly)) + 
      geom_bar() + theme_minimal() + labs(x = "Rude to Have an Unruly Child?", y = "Count")

plot(p.rude.baby)

plot(p.rude.child)
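
Bar charts of raw counts can be complemented with a table of shares, which is easier to quote in text. The following is a minimal sketch using base R's `table()` and `prop.table()`; the `rude.baby` vector here is made up for illustration and does not reflect the survey results.

```r
# Made-up recoded answers for illustration only; not survey results.
rude.baby <- c("No", "No", "Somewhat", "Yes", "No", "Somewhat", NA)

# Count each answer, keeping missing values as their own category.
counts <- table(rude.baby, useNA = "ifany")

# Convert counts to shares of all responses (the shares sum to 1).
shares <- prop.table(counts)

counts
round(shares, 2)
```

Applied to `dat$rude.baby` and `dat$rude.child.unruly`, the same two calls would give the proportions behind each bar.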

Examining Responses by Demographic Group

Rude to have a baby onboard?

dat$age = factor(dat$age, levels = c("18-29", "30-44", "45-60", "> 60", ""))

# One panel per demographic group; geom_bar() counts rows per answer by default.
p.rude.baby.gender = ggplot(data = dat, aes(x = rude.baby)) + 
      geom_bar() + theme_minimal() + labs(x = "Rude to Have a Baby? (by Gender)", y = "Count") + facet_wrap(~gender, nrow = 2)

p.rude.baby.age = ggplot(data = dat, aes(x = rude.baby)) + 
      geom_bar() + theme_minimal() + labs(x = "Rude to Have a Baby? (by Age)", y = "Count") + facet_wrap(~age, nrow = 2)

p.rude.baby.income = ggplot(data = dat, aes(x = rude.baby)) + 
      geom_bar() + theme_minimal() + labs(x = "Rude to Have a Baby? (by Income)", y = "Count") + facet_wrap(~household_income, nrow = 2)

plot(p.rude.baby.gender)

plot(p.rude.baby.age)

plot(p.rude.baby.income)

Rude to have an unruly child onboard?

# Same faceted layout as the baby question, applied to the unruly-child question.
p.rude.child.unruly.gender = ggplot(data = dat, aes(x = rude.child.unruly)) + 
      geom_bar() + theme_minimal() + labs(x = "Rude to Have an Unruly Child? (by Gender)", y = "Count") + facet_wrap(~gender)

p.rude.child.unruly.age = ggplot(data = dat, aes(x = rude.child.unruly)) + 
      geom_bar() + theme_minimal() + labs(x = "Rude to Have an Unruly Child? (by Age)", y = "Count") + facet_wrap(~age)

p.rude.child.unruly.income = ggplot(data = dat, aes(x = rude.child.unruly)) + 
      geom_bar() + theme_minimal() + labs(x = "Rude to Have an Unruly Child? (by Income)", y = "Count") + facet_wrap(~household_income)

plot(p.rude.child.unruly.gender)

plot(p.rude.child.unruly.age)

plot(p.rude.child.unruly.income)
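
Faceted counts are hard to compare when the groups differ in size, so within-group shares and a simple test of association can be a useful complement. The sketch below is one possible approach, not part of the original analysis; the `toy` data frame and its counts are made up, and `stats::chisq.test()` is called with its namespace because janitor masks `chisq.test()`.

```r
# Made-up data for illustration; these are not survey results.
toy <- data.frame(
  gender = rep(c("Female", "Male"), times = c(60, 40)),
  rude.baby = c(rep(c("No", "Somewhat", "Yes"), times = c(40, 15, 5)),
                rep(c("No", "Somewhat", "Yes"), times = c(20, 12, 8)))
)

# Cross-tabulate group against answer.
tab <- table(toy$gender, toy$rude.baby)

# Row proportions: the share of each answer within each gender (each row sums to 1).
row.shares <- prop.table(tab, margin = 1)
round(row.shares, 2)

# Chi-squared test of independence between gender and answer.
# stats:: is spelled out because janitor masks chisq.test().
chisq.result <- stats::chisq.test(tab)
chisq.result$p.value
```

On the real data, the same steps could be repeated with `age` or `household_income` in place of `gender`, keeping in mind that sparse categories weaken the chi-squared approximation.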

Conclusions

Based on the visualizations, preliminary results suggest that attitudes toward traveling with minors vary across demographic groups, although bar charts alone cannot show that demographics cause these differences. Future steps include an unsupervised analysis, which may reveal specific subpopulations. Such subpopulations could inform ticket pricing, for example by letting passengers with strong preferences select seats that suit their needs.