Part 1 - Introduction:

Airlines are continuously finding new ways to charge their customers, and if you don’t pay for extra legroom, you’ll be stuck with cramped seats. If you’re one of the unlucky fliers sitting behind a passenger who reclines their seat, you’ll have even less room to eat and maneuver your laptop. But who are the passengers most likely to recline their seats? Are they tall? This project studies whether a passenger's inclination to recline their seat is associated with their height.

Research Question: Are passengers who recline their seats more often (Always or Usually) taller on average than passengers who recline less often?

H0: Within each gender, there is no difference in average height between passengers who are likely and less likely to recline their seats: μ(likely) - μ(less likely) = 0

HA: Within each gender, there is some difference in average height between passengers who are likely and less likely to recline their seats: μ(likely) - μ(less likely) ≠ 0
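The tests and intervals in Part 4 compare the two recline groups within each gender. They use the difference in sample means, its unpooled standard error, and a normal model; the inference() output reports a Z statistic, and its standard errors match the unpooled formula.

The formulas are sketched below. Group 1 is passengers less likely to recline and group 2 is those likely to recline, which is the order inference() uses. Reversing the order flips the sign of Z and of the interval endpoints but leaves the p-value unchanged.

$$
SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad
Z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{SE}, \qquad
p = 2\,P(Z > |z|), \qquad
\text{90\% CI} = (\bar{x}_1 - \bar{x}_2) \pm 1.645 \cdot SE
$$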

Part 2 - Data:

Data Collection: A SurveyMonkey Audience survey conducted Aug. 29 and 30, 2014. The survey was commissioned by FiveThirtyEight for the story "41 Percent of Fliers Say It’s Rude To Recline Your Airplane Seat."

Cases: 1,040 survey respondents, of whom 874 had flown; 843 answered all three questions.

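#Sample demographics:

#Packages used in this report (their loading messages are omitted)
library(dplyr)    # group_by(), tally(), summarize(), filter()
library(ggplot2)  # ggplot() histograms
library(plotrix)  # std.error()
library(IS606)    # inference()

#Gender and age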
tidyrider %>%
  group_by(gender, age) %>%
  tally(sort = TRUE)
## Source: local data frame [8 x 3]
## Groups: gender [2]
## 
##   gender    age     n
##   (fctr) (fctr) (int)
## 1 Female  45-60   122
## 2 Female   > 60   116
## 3 Female  30-44   111
## 4 Female  18-29    93
## 5   Male  45-60   112
## 6   Male  30-44   111
## 7   Male   > 60    99
## 8   Male  18-29    79
#US region
tidyrider %>%
  group_by(location) %>%
  tally(sort = TRUE)
## Source: local data frame [10 x 2]
## 
##              location     n
##                (fctr) (int)
## 1             Pacific   185
## 2      South Atlantic   138
## 3  East North Central   122
## 4     Middle Atlantic   111
## 5  West South Central    81
## 6  West North Central    66
## 7         New England    55
## 8            Mountain    54
## 9  East South Central    25
## 10                        6

Variables:

Response variable: Reclinicity, a categorical variable defined as a passenger's inclination to recline their airplane seat. Passengers with higher reclinicity are those who say they always or usually recline their seats. The recliner variable codes these passengers as "likely" (132 Always + 175 Usually = 307) and everyone else as "less likely" (536).

Explanatory variable: Height, a discrete numerical variable defined as total height in inches (totalinches).
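The recoding of the five answers into the two recliner groups can be sketched as follows. The script that built tidyrider is not shown, so this is an assumed reconstruction. It rebuilds a hypothetical recline vector from the counts reported by summary(tidyrider), then collapses Always and Usually into "likely". The names recline_levels, recline, and recliner are illustrative.

```r
# Answer levels, in the order summary(tidyrider) lists them
recline_levels <- c("Always", "Usually", "About half the time",
                    "Once in a while", "Never")

# Assumption: rebuild one value per respondent from the summary counts
recline <- factor(rep(recline_levels, times = c(132, 175, 116, 254, 166)),
                  levels = recline_levels)

# "likely" = Always or Usually; every other answer = "less likely"
recliner <- factor(ifelse(recline %in% c("Always", "Usually"),
                          "likely", "less likely"),
                   levels = c("less likely", "likely"))

table(recliner)
```

The table shows 536 "less likely" and 307 "likely" respondents, matching the recliner counts in summary(tidyrider).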

Type of Study: This is an observational study that surveyed past airplane passengers to draw inferences about the full population of fliers. The explanatory variable is not under the control of the researcher.

Scope of inference - generalizability: The population of interest is American airplane passengers. The findings generalize to that population only to the extent that the SurveyMonkey Audience respondents represent it. The sample is reasonably large and spread across gender, age, and U.S. region, but respondents came from an online survey panel rather than a simple random sample of fliers.

The most likely source of bias in the survey instrument is that some respondents may be reluctant to admit to ostensibly rude behavior.

Scope of inference - causality: Since this is an observational study, we can only identify associations between the variables.

Part 3 - Exploratory data analysis:

#Summary statistics
summary(tidyrider)
##        X                        recline           recliner      gender   
##  Min.   :  1.0   Always             :132   less likely:536   Female:442  
##  1st Qu.:211.5   Usually            :175   likely     :307   Male  :401  
##  Median :422.0   About half the time:116                                 
##  Mean   :422.0   Once in a while    :254                                 
##  3rd Qu.:632.5   Never              :166                                 
##  Max.   :843.0                                                           
##                                                                          
##   totalinches       age                    location  
##  Min.   :59.00   > 60 :215   Pacific           :185  
##  1st Qu.:64.00   18-29:172   South Atlantic    :138  
##  Median :67.00   30-44:222   East North Central:122  
##  Mean   :67.43   45-60:234   Middle Atlantic   :111  
##  3rd Qu.:70.00               West South Central: 81  
##  Max.   :79.00               West North Central: 66  
##                              (Other)           :140  
##                             education             gendheight 
##                                  :  6   Not Tall Female:269  
##  Bachelor degree                 :279   Not Tall Male  :214  
##  Graduate degree                 :265   Tall Female    :173  
##  High school degree              : 61   Tall Male      :187  
##  Less than high school degree    :  6                        
##  Some college or Associate degree:226                        
## 
boxplot(totalinches~recline, data = tidyrider)
#"Once in a while" has the highest median height and "About half the time" the lowest; "Always" has the largest IQR.

mosaicplot(recline~gendheight, data = tidyrider)
#When we split each gender by relative tallness (women 5'6" or taller and men 5'11" or taller, against sample averages of about 5'5" for women and 5'10" for men), taller men appear slightly more likely to recline their seats.
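The tallness split used in gendheight can be sketched as a small labeling function. The cutoffs are an assumption inferred from this report: in Part 4, 173 women are over 65 inches and 187 men are over 70 inches, which match the Tall Female and Tall Male counts in summary(tidyrider). classify_height() is a hypothetical helper, not code from the original analysis.

```r
# Hypothetical helper: "Tall" means over 65 inches for women
# and over 70 inches for men, the cutoffs used in this report
classify_height <- function(gender, inches) {
  cutoff <- ifelse(gender == "Female", 65, 70)
  tall   <- ifelse(inches > cutoff, "Tall", "Not Tall")
  paste(tall, gender)
}

classify_height(c("Female", "Female", "Male", "Male"), c(65, 66, 70, 71))
# "Not Tall Female" "Tall Female" "Not Tall Male" "Tall Male"
```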

#Men are taller on average; within each gender, height looks roughly normal with no obvious skew
ggplot(tidyrider, aes(totalinches)) + geom_histogram(binwidth = 1) + facet_wrap(~ gender)

#No obvious skew within either recline group
ggplot(tidyrider, aes(totalinches)) + geom_histogram(binwidth = 1) + facet_wrap(~ recliner)

#Average height by gender and recline likelihood; the two men's groups differ slightly
tidyrider %>%
  group_by(gender, recliner) %>%
  summarize(avgHeight = mean(totalinches))
## Source: local data frame [4 x 3]
## Groups: gender [?]
## 
##   gender    recliner avgHeight
##   (fctr)      (fctr)     (dbl)
## 1 Female less likely  64.85409
## 2 Female      likely  64.83851
## 3   Male less likely  70.16471
## 4   Male      likely  70.47260
mosaicplot(recliner~gendheight, data = tidyrider)
#Tall men appear slightly more likely than tall women to always or usually recline their seats.

#Average total height by gender
tidyrider %>%
  group_by(gender) %>%
  summarize(avgHeight = mean(totalinches))
## Source: local data frame [2 x 2]
## 
##   gender avgHeight
##   (fctr)     (dbl)
## 1 Female  64.84842
## 2   Male  70.27681
#Standard error of mean height by gender
tidyrider %>%
  group_by(gender) %>%
  summarize(std.error(totalinches))
## Source: local data frame [2 x 2]
## 
##   gender std.error(totalinches)
##   (fctr)                  (dbl)
## 1 Female              0.1330168
## 2   Male              0.1506485
#Standard error of mean height by gender and recline likelihood
tidyrider %>%
  group_by(gender, recliner) %>%
  summarize(std.error(totalinches))
## Source: local data frame [4 x 3]
## Groups: gender [?]
## 
##   gender    recliner std.error(totalinches)
##   (fctr)      (fctr)                  (dbl)
## 1 Female less likely              0.1651907
## 2 Female      likely              0.2248016
## 3   Male less likely              0.1918955
## 4   Male      likely              0.2426424
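The standard errors above are each group's sample standard deviation divided by the square root of its size, SE = s/√n, which is what plotrix's std.error() reports. As a check, the group sizes and standard deviations printed by inference() in Part 4 reproduce this table. The groups data frame below is just an illustrative container for those printed values.

```r
# Group sizes and standard deviations copied from the inference() output
groups <- data.frame(
  gender   = c("Female", "Female", "Male", "Male"),
  recliner = c("less likely", "likely", "less likely", "likely"),
  n        = c(281, 161, 255, 146),
  sd       = c(2.7691, 2.8524, 3.0643, 2.9319)
)

# Standard error of a sample mean: s / sqrt(n)
groups$se <- groups$sd / sqrt(groups$n)
groups
```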

Part 4 - Inference:

#Check conditions
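#The sample (n=843) is far less than 10% of all airline passengers, and if we treat the SurveyMonkey Audience respondents as a random sample, the observations are independent.

#Each group has more than 30 observations. The recline groups compared below have n = 255 and 146 for men and n = 281 and 161 for women; the tall and not-tall counts are: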

#Men taller than 5'10" (over 70 inches) versus 5'10" or shorter
tidyrider %>%
  filter(gender == "Male") %>%
  group_by(totalinches > 70) %>%
  tally(sort = TRUE)
## Source: local data frame [2 x 2]
## 
##   totalinches > 70     n
##                (lgl) (int)
## 1              FALSE   214
## 2               TRUE   187
#Women taller than 5'5" (over 65 inches) versus 5'5" or shorter
tidyrider %>%
  filter(gender == "Female") %>%
  group_by(totalinches > 65) %>%
  tally(sort = TRUE)
## Source: local data frame [2 x 2]
## 
##   totalinches > 65     n
##                (lgl) (int)
## 1              FALSE   269
## 2               TRUE   173
#Height distributions appear roughly normal with no obvious skew:
#Males
ggplot(tidyrider[tidyrider$gender == "Male",], aes(totalinches)) + geom_histogram(binwidth = 1) + facet_wrap(~ recliner)

#Females
ggplot(tidyrider[tidyrider$gender == "Female",], aes(totalinches)) + geom_histogram(binwidth = 1) + facet_wrap(~ recliner)

#Inference

Hypothesis test at the 0.10 significance level, theoretical method: men

inference(y = tidyrider$totalinches[tidyrider$gender == "Male"], x = tidyrider$recliner[tidyrider$gender == "Male"], est = "mean", type = "ht", null = 0, alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_less likely = 255, mean_less likely = 70.1647, sd_less likely = 3.0643
## n_likely = 146, mean_likely = 70.4726, sd_likely = 2.9319
## Observed difference between means (less likely-likely) = -0.3079
## 
## H0: mu_less likely - mu_likely = 0 
## HA: mu_less likely - mu_likely != 0 
## Standard error = 0.309 
## Test statistic: Z =  -0.995 
## p-value =  0.3196

Male passengers likely to recline have a mean height of 70.47 inches (SD 2.93). That is 0.31 inches more than the 70.16-inch mean (SD 3.06) of male passengers less likely to recline. At the 0.10 significance level, the two-sided p-value of 0.3196 is larger than 0.10, so we fail to reject the null hypothesis. There is insufficient evidence of a difference in average height between male passengers who are likely to recline their seats and those who are less likely to.

Hypothesis test at the 0.10 significance level, theoretical method: women

inference(y = tidyrider$totalinches[tidyrider$gender == "Female"], x = tidyrider$recliner[tidyrider$gender == "Female"], est = "mean", type = "ht", null = 0, alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_less likely = 281, mean_less likely = 64.8541, sd_less likely = 2.7691
## n_likely = 161, mean_likely = 64.8385, sd_likely = 2.8524
## Observed difference between means (less likely-likely) = 0.0156
## 
## H0: mu_less likely - mu_likely = 0 
## HA: mu_less likely - mu_likely != 0 
## Standard error = 0.279 
## Test statistic: Z =  0.056 
## p-value =  0.9554

Female passengers likely to recline have a mean height of 64.84 inches (SD 2.85). That is 0.016 inches less than the 64.85-inch mean (SD 2.77) of female passengers less likely to recline. At the 0.10 significance level, the two-sided p-value of 0.9554 is much larger than 0.10, so we fail to reject the null hypothesis. There is insufficient evidence of a difference in average height between female passengers who are likely to recline their seats and those who are less likely to.
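As a check on the two tests above, the standard errors, Z statistics, and p-values can be recomputed by hand from the summary statistics that inference() prints. This minimal sketch assumes inference() uses the unpooled standard error and the normal distribution; its reported Z statistics and standard errors are consistent with that. z_test_diff() is a hypothetical helper, not part of the IS606 package.

```r
# Hypothetical helper: two-sample z test for a difference in means,
# using the unpooled standard error and a null difference of 0
z_test_diff <- function(n1, m1, s1, n2, m2, s2) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)  # SE of (mean1 - mean2)
  z  <- (m1 - m2) / se               # test statistic
  p  <- 2 * pnorm(-abs(z))           # two-sided p-value
  c(se = se, z = z, p = p)
}

# Group 1 = less likely to recline, group 2 = likely (inference() order);
# sizes, means, and SDs are copied from the inference() output above
men   <- z_test_diff(255, 70.1647, 3.0643, 146, 70.4726, 2.9319)
women <- z_test_diff(281, 64.8541, 2.7691, 161, 64.8385, 2.8524)
results <- rbind(men, women)
round(results, 4)

# Reject H0 only if p < 0.10; both are FALSE, so we fail to reject
results[, "p"] < 0.10
```

The recomputed values should agree with the printed output up to rounding of the summary statistics.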

Confidence interval: men

inference(y=tidyrider$totalinches[tidyrider$gender == "Male"], x = tidyrider$recliner[tidyrider$gender == "Male"], est = "mean", type = "ci", null = 0, alternative = "twosided", method = "theoretical", conflevel = 0.90)
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_less likely = 255, mean_less likely = 70.1647, sd_less likely = 3.0643
## n_likely = 146, mean_likely = 70.4726, sd_likely = 2.9319
## Observed difference between means (less likely-likely) = -0.3079
## 
## Standard error = 0.3094 
## 90 % Confidence interval = ( -0.8167 , 0.2009 )

Using a normal model for the difference of the two sample means, we can be 90% confident that male passengers likely to recline are, on average, between 0.20 inches shorter and 0.82 inches taller than male passengers less likely to recline. Because the interval contains 0, it agrees with the hypothesis test.

Confidence interval: women

inference(y = tidyrider$totalinches[tidyrider$gender == "Female"], x = tidyrider$recliner[tidyrider$gender == "Female"], est = "mean", type = "ci", null = 0, alternative = "twosided", method = "theoretical", conflevel = 0.90)
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_less likely = 281, mean_less likely = 64.8541, sd_less likely = 2.7691
## n_likely = 161, mean_likely = 64.8385, sd_likely = 2.8524
## Observed difference between means (less likely-likely) = 0.0156
## 
## Standard error = 0.279 
## 90 % Confidence interval = ( -0.4433 , 0.4744 )

Using a normal model for the difference of the two sample means, we can be 90% confident that female passengers likely to recline are, on average, between 0.47 inches shorter and 0.44 inches taller than female passengers less likely to recline. This interval also contains 0.
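The two confidence intervals above can be recomputed by hand. Each is the observed difference plus or minus z* times the standard error, with z* = qnorm(0.95), about 1.645, for 90% confidence. This minimal sketch assumes the unpooled standard error and normal critical value, which the printed intervals are consistent with. diff_ci() and flip() are hypothetical helpers. The last step negates and swaps the endpoints to express each interval as likely minus less likely, the direction used in the text.

```r
# Hypothetical helper: confidence interval for (mean1 - mean2)
# using the unpooled standard error and the normal critical value
diff_ci <- function(n1, m1, s1, n2, m2, s2, conf = 0.90) {
  se     <- sqrt(s1^2 / n1 + s2^2 / n2)
  z_star <- qnorm(1 - (1 - conf) / 2)  # about 1.645 for 90%
  diff   <- m1 - m2
  c(lower = diff - z_star * se, upper = diff + z_star * se)
}

# Reverse the direction of a difference: (a, b) becomes (-b, -a)
flip <- function(ci) c(lower = -ci[["upper"]], upper = -ci[["lower"]])

# Group 1 = less likely, group 2 = likely, as in the inference() output
men   <- diff_ci(255, 70.1647, 3.0643, 146, 70.4726, 2.9319)
women <- diff_ci(281, 64.8541, 2.7691, 161, 64.8385, 2.8524)
round(rbind(men, women), 4)

# Likely minus less likely, rounded to two decimals as in the text
round(rbind(men = flip(men), women = flip(women)), 2)
```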

Part 5 - Conclusion:

We did not find strong evidence of an association between passenger height and likelihood of reclining an airplane seat, for either men or women.

For both men and women, the hypothesis test failed to show a significant difference in mean height between passengers likely and less likely to recline. The associated 90% confidence intervals were narrow (about one inch wide) and contained 0.

These results suggest that height is not a major factor in a passenger's inclination to recline their seat, although failing to reject the null hypothesis does not prove that no association exists.

Further research should look into other factors, such as location, education, and rudeness scores aggregated from other survey questions. Testing correlations and fitting a regression model may help identify the strongest predictors of a passenger's inclination to recline their seat.