Multinomial Regression (Unordered):

1.0 Objective

We again work on the cleaned FAA data set that we used in our prior analyses with Linear Regression and Logistic Regression. Please refer to those articles for the background of the data. For convenience, the data preparation is included in this post as well.

Here, we create a multinomial variable:

Normal Landing Y = 1 if distance < 1000
Long Landing Y = 2 if 1000 <= distance < 2500
Risky Landing Y = 3 otherwise

Discard the continuous data for “distance”, and assume we are given only this multinomial response, without knowing its order.

2.0 Initial Preparation

2.1 Packages Loading

library(readxl)
library(dplyr)
library(tidyverse)
library(stringr)
library(car)
library(VGAM)
library(nnet)
library(ggridges)

2.2 Data Load

# Read the sheets, one by one
FAA1 <- read_excel("FAA1.xls")
FAA2 <- read_excel("FAA2.xls")
dim(FAA1)

## [1] 800   8

dim(FAA2)

## [1] 150   7

2.3 Data Preparation

str(FAA1)

## Classes 'tbl_df', 'tbl' and 'data.frame':    800 obs. of  8 variables:
##  $ aircraft    : chr  "boeing" "boeing" "boeing" "boeing" ...
##  $ duration    : num  98.5 125.7 112 196.8 90.1 ...
##  $ no_pasg     : num  53 69 61 56 70 55 54 57 61 56 ...
##  $ speed_ground: num  107.9 101.7 71.1 85.8 59.9 ...
##  $ speed_air   : num  109 103 NA NA NA ...
##  $ height      : num  27.4 27.8 18.6 30.7 32.4 ...
##  $ pitch       : num  4.04 4.12 4.43 3.88 4.03 ...
##  $ distance    : num  3370 2988 1145 1664 1050 ...

str(FAA2)

## Classes 'tbl_df', 'tbl' and 'data.frame':    150 obs. of  7 variables:
##  $ aircraft    : chr  "boeing" "boeing" "boeing" "boeing" ...
##  $ no_pasg     : num  53 69 61 56 70 55 54 57 61 56 ...
##  $ speed_ground: num  107.9 101.7 71.1 85.8 59.9 ...
##  $ speed_air   : num  109 103 NA NA NA ...
##  $ height      : num  27.4 27.8 18.6 30.7 32.4 ...
##  $ pitch       : num  4.04 4.12 4.43 3.88 4.03 ...
##  $ distance    : num  3370 2988 1145 1664 1050 ...

FAA1: 800 rows and 8 columns, sample size is 800 rows
FAA2: 150 rows and 7 columns, sample size is 150 rows
FAA2 does not have the duration column.
Apart from that, FAA1 and FAA2 have the same structure and data types.

Data Merge

FAA2$duration<-NA
FAA_merge<-rbind(FAA1,FAA2)

To combine the 2 data sets, both must have the same structure, that is, the same columns. Hence we add a duration column (filled with NA) to FAA2 and then use the rbind function to stack them, which gives a sample size of 800 + 150 = 950 rows.
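As an alternative sketch, dplyr's bind_rows stacks data frames by column name and fills any column missing from one input with NA, so the duration column would not have to be added by hand. The toy data frames below are invented for illustration and are not the FAA files.

```r
library(dplyr)

# Toy stand-ins for FAA1 (has duration) and FAA2 (no duration); values are invented
toy1 <- data.frame(aircraft = c("boeing", "airbus"),
                   duration = c(98.5, 125.7),
                   distance = c(3370, 1145))
toy2 <- data.frame(aircraft = c("boeing", "airbus", "airbus"),
                   distance = c(3370, 900, 2100))

# bind_rows matches columns by name; duration is filled with NA for the toy2 rows
merged <- bind_rows(toy1, toy2)
dim(merged)
```

Whether this is preferable to rbind is a matter of taste; the result has the same rows either way.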

INSIGHTS:

Duration column in FAA2 file is missing
Speed air column has 642 NA values
duration, distance and height columns have a considerable difference between their minimum and maximum values
height column has a negative value, which cannot be legitimate because height must be positive. Hence it may be a data reading issue.

3.0 Data Cleaning

3.1 Existing data cleaning

As a first step in data cleaning, we filter out the rows that do not qualify for analysis. There are 2 major categories of such rows:

* Abnormal values: We need to remove abnormal values (values that do not qualify for analysis, as suggested by SMEs). The code below removes them.
* NA rows: We need to remove NA rows (where all cell values are NA).

Duplicate Check

Because FAA2 has NA in the duration column, we need to check for duplicate rows excluding that column. Any duplicate rows we find are removed to reduce redundancy.

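#FAA_merge %>% distinct()
# Flag duplicate rows while ignoring column 2 (duration), which is NA for all FAA2 rows
row_dup<-duplicated(FAA_merge[,-2])
# Keep only the first occurrence of each row
FAA_U<-FAA_merge[!row_dup,]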
dim(FAA_U)

## [1] 850   8
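To make the duplicate check concrete, here is a minimal sketch on an invented three-row data frame. It selects the ignored column by name rather than by position, which is an assumption of this example and not what the code above does.

```r
# Toy data: row 3 repeats row 1 except for the missing duration (values are invented)
toy <- data.frame(no_pasg  = c(53, 69, 53),
                  duration = c(98.5, 125.7, NA),
                  distance = c(3370, 2988, 3370))

# Compare rows on every column except duration
dup <- duplicated(toy[, names(toy) != "duration"])
dup

# Keep the first occurrence of each row
toy_unique <- toy[!dup, ]
nrow(toy_unique)
```

Because duplicated() marks only the later copies, the retained row is the one that still carries a duration value.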

summary(FAA_U)

##    aircraft            duration         no_pasg      speed_ground   
##  Length:850         Min.   : 14.76   Min.   :29.0   Min.   : 27.74  
##  Class :character   1st Qu.:119.49   1st Qu.:55.0   1st Qu.: 65.90  
##  Mode  :character   Median :153.95   Median :60.0   Median : 79.64  
##                     Mean   :154.01   Mean   :60.1   Mean   : 79.45  
##                     3rd Qu.:188.91   3rd Qu.:65.0   3rd Qu.: 92.06  
##                     Max.   :305.62   Max.   :87.0   Max.   :141.22  
##                     NA's   :50                                      
##    speed_air          height           pitch          distance      
##  Min.   : 90.00   Min.   :-3.546   Min.   :2.284   Min.   :  34.08  
##  1st Qu.: 96.25   1st Qu.:23.314   1st Qu.:3.642   1st Qu.: 883.79  
##  Median :101.15   Median :30.093   Median :4.008   Median :1258.09  
##  Mean   :103.80   Mean   :30.144   Mean   :4.009   Mean   :1526.02  
##  3rd Qu.:109.40   3rd Qu.:36.993   3rd Qu.:4.377   3rd Qu.:1936.95  
##  Max.   :141.72   Max.   :59.946   Max.   :5.927   Max.   :6533.05  
##  NA's   :642

Summary statistics: we find 850 unique rows and 8 columns. The summary statistics of each column in the data set are shown above.

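# Remove abnormal values
FAA_U<-FAA_U[FAA_U$duration>=40,]
# Use element-wise `&` here, not `&&`: in R < 4.3 `&&` tests only the first row, and from
# R 4.3.0 it is an error for vectors. The outputs shown below reflect the `&&` case, in
# which this filter is inactive (a speed_ground of 27.74 is still present later on).
FAA_U<-FAA_U[FAA_U$speed_ground>=30 & FAA_U$speed_ground<=140,]
FAA_U<-FAA_U[FAA_U$height>=6,]
FAA_U<-FAA_U[FAA_U$distance<=6000,]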
dim(FAA_U)

## [1] 833   8
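Two base R behaviours matter for the filtering step above, and the following sketch shows them on an invented three-row data frame: a logical index containing NA returns an all-NA row, and range checks on a column need the element-wise `&`.

```r
# Toy data; the second row has a missing duration (values are invented)
toy <- data.frame(duration     = c(50, NA, 30),
                  speed_ground = c(100, 80, 25))

# Logical indexing: the NA condition yields a row in which every cell is NA
with_na <- toy[toy$duration >= 40, ]
nrow(with_na)

# Wrapping the condition in which() drops the NA instead
no_na <- toy[which(toy$duration >= 40), ]
nrow(no_na)

# Element-wise range check: one TRUE/FALSE per row
in_range <- toy$speed_ground >= 30 & toy$speed_ground <= 140
in_range
```

This is why rows with a missing duration turn into rows that are NA in every column after the first filter.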

#Remove all rows that have NA in all the columns
FAA_U_RmvNA<-FAA_U %>% filter(!(is.na(aircraft)&is.na(duration)&is.na(no_pasg)&is.na(speed_ground)&is.na(speed_air)&is.na(height)&is.na(pitch)&is.na(distance)))
#Looking for NA Values in columns
paste("Number of NA in AIRCRAFT:" , sum(is.na(FAA_U_RmvNA$aircraft)))

## [1] "Number of NA in AIRCRAFT: 0"

paste("Number of NA in DURATION:" , sum(is.na(FAA_U_RmvNA$duration)))

## [1] "Number of NA in DURATION: 0"

paste("Number of NA in NO_PASG:" , sum(is.na(FAA_U_RmvNA$no_pasg)))

## [1] "Number of NA in NO_PASG: 0"

paste("Number of NA in SPEED GROUND:" , sum(is.na(FAA_U_RmvNA$speed_ground)))

## [1] "Number of NA in SPEED GROUND: 0"

paste("Number of NA in SPEED AIR:" , sum(is.na(FAA_U_RmvNA$speed_air)))

## [1] "Number of NA in SPEED AIR: 588"

paste("Number of NA in HEIGHT:" , sum(is.na(FAA_U_RmvNA$height)))

## [1] "Number of NA in HEIGHT: 0"

paste("Number of NA in PITCH:" , sum(is.na(FAA_U_RmvNA$pitch)))

## [1] "Number of NA in PITCH: 0"

paste("Number of NA in DISTANCE:" , sum(is.na(FAA_U_RmvNA$distance)))

## [1] "Number of NA in DISTANCE: 0"

nrow(FAA_U_RmvNA)

## [1] 783
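The per-column NA counts above can also be obtained in one call. A minimal sketch on invented data, using colSums over is.na:

```r
# Toy data with missing values in one column only (values are invented)
toy <- data.frame(aircraft  = c("boeing", "airbus", "boeing"),
                  speed_air = c(109, NA, NA),
                  height    = c(27.4, 18.6, 30.7))

# is.na() gives a logical matrix; colSums counts the TRUE values per column
na_counts <- colSums(is.na(toy))
na_counts
```

Applied to FAA_U_RmvNA, the same call would return one count per column in place of the eight paste statements.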

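Conclusion: we removed 850 - 783 = 67 rows in total.

* Abnormal values: 17
* NA rows (entire row is missing): 50
* Speed air: has 588 missing values in the remaining rows

The 50 all-NA rows are a by-product of the subsetting step: duration is NA for the rows that come only from FAA2, and indexing a data frame with an NA condition returns a row of NAs.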

3.2 Categorical Response variable Creation

# Create the multinomial response Y from distance, then drop distance (column 8)
S1<-FAA_U_RmvNA%>%
    filter(distance<1000)%>%
    mutate(Y =1)
S2<-FAA_U_RmvNA%>%
    filter(distance>=1000 & distance<2500)%>%
    mutate(Y =2)
S3<-FAA_U_RmvNA%>%
    filter(distance>=2500)%>%
    mutate(Y =3)
FAAM<-rbind(S1,S2,S3)[,-8]

dim(FAAM)

## [1] 783   8

head(FAAM)

## # A tibble: 6 x 8
##   aircraft duration no_pasg speed_ground speed_air height pitch     Y
##   <chr>       <dbl>   <dbl>        <dbl>     <dbl>  <dbl> <dbl> <dbl>
## 1 boeing       73.0      54         54.4        NA   24.0  3.84     1
## 2 boeing       52.9      57         57.1        NA   19.4  4.64     1
## 3 boeing      166.       69         48.8        NA   31.2  3.90     1
## 4 boeing      180.       66         63.7        NA   19.6  4.29     1
## 5 boeing      228.       78         61.2        NA   21.8  4.60     1
## 6 boeing      201.       71         41.5        NA   28.5  4.60     1
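The same three-way split can be sketched with base R's cut function, which avoids building three subsets and keeps the original row order. The distances below are invented; `right = FALSE` makes each interval closed on the left, matching the rule 1000 <= distance < 2500.

```r
# Invented distances, including the two boundary values
distance <- c(500, 1000, 2499.9, 2500, 3000)

# Intervals: [-Inf, 1000), [1000, 2500), [2500, Inf)
Y <- cut(distance,
         breaks = c(-Inf, 1000, 2500, Inf),
         labels = c("1", "2", "3"),
         right  = FALSE)
Y
```

cut returns a factor, so the later as.factor conversion would not be needed with this approach.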

4.0 Modelling

4.1 Data preparation for modelling

We have 7 predictors and one dependent response. The response is multinomial with values 1, 2, 3 based on the actual landing distance.

Converting Y response to factor

FAAM$Y<-as.factor(FAAM$Y)

Analysis from previous project

Because we already know that speed air and speed ground are multicollinear, choosing either one is a sensible idea. Here we choose speed ground on the basis that it has no missing values, whereas speed air has 588 of them.

Summary Statistics

summary(FAAM)

##    aircraft            duration         no_pasg       speed_ground   
##  Length:783         Min.   : 41.95   Min.   :29.00   Min.   : 27.74  
##  Class :character   1st Qu.:119.67   1st Qu.:55.00   1st Qu.: 66.01  
##  Mode  :character   Median :154.28   Median :60.00   Median : 79.75  
##                     Mean   :154.83   Mean   :60.07   Mean   : 79.51  
##                     3rd Qu.:189.75   3rd Qu.:65.00   3rd Qu.: 92.13  
##                     Max.   :305.62   Max.   :87.00   Max.   :132.78  
##                                                                      
##    speed_air          height           pitch       Y      
##  Min.   : 90.00   Min.   : 6.228   Min.   :2.284   1:245  
##  1st Qu.: 96.15   1st Qu.:23.562   1st Qu.:3.654   2:438  
##  Median :100.89   Median :30.203   Median :4.017   3:100  
##  Mean   :103.50   Mean   :30.438   Mean   :4.015          
##  3rd Qu.:109.42   3rd Qu.:36.984   3rd Qu.:4.385          
##  Max.   :132.91   Max.   :59.946   Max.   :5.927          
##  NA's   :588

Removing Speed Air variable

FAAM_CL<-FAAM[,-5]
str(FAAM_CL)

## Classes 'tbl_df', 'tbl' and 'data.frame':    783 obs. of  7 variables:
##  $ aircraft    : chr  "boeing" "boeing" "boeing" "boeing" ...
##  $ duration    : num  73 52.9 165.5 179.9 228.2 ...
##  $ no_pasg     : num  54 57 69 66 78 71 66 60 75 71 ...
##  $ speed_ground: num  54.4 57.1 48.8 63.7 61.2 ...
##  $ height      : num  24 19.4 31.2 19.6 21.8 ...
##  $ pitch       : num  3.84 4.64 3.9 4.29 4.6 ...
##  $ Y           : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...

4.2 Single factor Analysis

Aircraft

nmod.s<-multinom(Y ~aircraft, data=FAAM_CL)

## # weights:  9 (4 variable)
## initial  value 860.213422 
## iter  10 value 725.056889
## final  value 725.056861 
## converged

int.s<-summary(nmod.s)
p<-pnorm(int.s$coefficients/int.s$standard.errors,lower.tail = FALSE)
cat("aircraft : p2/p1 is",ifelse(p[1,2]>0.05,"insignificant","significant"),'\n')

## aircraft : p2/p1 is significant

cat("aircraft : p3/p1 is",ifelse(p[2,2]>0.05,"insignificant","significant"))

## aircraft : p3/p1 is significant

Duration

nmod.s<-multinom(Y ~duration, data=FAAM_CL)

## # weights:  9 (4 variable)
## initial  value 860.213422 
## final  value 742.088196 
## converged

int.s<-summary(nmod.s)
p<-pnorm(int.s$coefficients/int.s$standard.errors,lower.tail = FALSE)
cat("duration : p2/p1 is",ifelse(p[1,2]>0.05,"insignificant","significant"),'\n')

## duration : p2/p1 is insignificant

cat("duration : p3/p1 is",ifelse(p[2,2]>0.05,"insignificant","significant"))

## duration : p3/p1 is insignificant

no pasg

nmod.s<-multinom(Y ~no_pasg, data=FAAM_CL)

## # weights:  9 (4 variable)
## initial  value 860.213422 
## final  value 744.740361 
## converged

int.s<-summary(nmod.s)
p<-pnorm(int.s$coefficients/int.s$standard.errors,lower.tail = FALSE)
cat("no pasg : p2/p1 is",ifelse(p[1,2]>0.05,"insignificant","significant"),'\n')

## no pasg : p2/p1 is insignificant

cat("no pasg : p3/p1 is",ifelse(p[2,2]>0.05,"insignificant","significant"))

## no pasg : p3/p1 is insignificant

speed ground

nmod.s<-multinom(Y ~speed_ground, data=FAAM_CL)

## # weights:  9 (4 variable)
## initial  value 860.213422 
## iter  10 value 362.508426
## iter  20 value 348.890071
## final  value 348.889729 
## converged

int.s<-summary(nmod.s)
p<-pnorm(int.s$coefficients/int.s$standard.errors,lower.tail = FALSE)
cat("speed_ground : p2/p1 is",ifelse(p[1,2]>0.05,"insignificant","significant"),'\n')

## speed_ground : p2/p1 is significant

cat("speed_ground : p3/p1 is",ifelse(p[2,2]>0.05,"insignificant","significant"))

## speed_ground : p3/p1 is significant

height

nmod.s<-multinom(Y ~height, data=FAAM_CL)

## # weights:  9 (4 variable)
## initial  value 860.213422 
## final  value 732.380913 
## converged

int.s<-summary(nmod.s)
p<-pnorm(int.s$coefficients/int.s$standard.errors,lower.tail = FALSE)
cat("height : p2/p1 is",ifelse(p[1,2]>0.05,"insignificant","significant"),'\n')

## height : p2/p1 is significant

cat("height : p3/p1 is",ifelse(p[2,2]>0.05,"insignificant","significant"))

## height : p3/p1 is significant

pitch

nmod.s<-multinom(Y ~pitch, data=FAAM_CL)

## # weights:  9 (4 variable)
## initial  value 860.213422 
## final  value 743.211297 
## converged

int.s<-summary(nmod.s)
p<-pnorm(int.s$coefficients/int.s$standard.errors,lower.tail = FALSE)
cat("pitch : p2/p1 is",ifelse(p[1,2]>0.05,"insignificant","significant"),'\n')

## pitch : p2/p1 is insignificant

cat("pitch : p3/p1 is",ifelse(p[2,2]>0.05,"insignificant","significant"))

## pitch : p3/p1 is significant

Observation

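Single factor analysis tells us that aircraft, speed ground and height are significant. Pitch is ambiguous because it is insignificant for the first contrast (p2/p1) and significant for the other (p3/p1). Note that pnorm(..., lower.tail = FALSE) returns an upper-tail p-value, so this check can only flag positive coefficients as significant.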
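The six single-predictor fits repeat the same steps, so they could be wrapped in a small helper. The sketch below is one way to do that on simulated data; the helper name `wald_p`, the simulated variables `x1` and `x2`, and the use of two-sided Wald p-values are assumptions of this example, not part of the analysis above.

```r
library(nnet)

# Simulated data: Y depends on x1 but not on x2 (all values are invented)
set.seed(1)
n <- 300
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
eta2 <- 0.5 + 1.5 * toy$x1
eta3 <- -0.5 + 3.0 * toy$x1
prob <- cbind(1, exp(eta2), exp(eta3))
prob <- prob / rowSums(prob)
toy$Y <- factor(apply(prob, 1, function(pr) sample(1:3, size = 1, prob = pr)))

# Hypothetical helper: fit Y ~ var and return two-sided Wald p-values for var
wald_p <- function(var, data) {
  f <- reformulate(var, response = "Y")
  fit <- multinom(f, data = data, Hess = TRUE, trace = FALSE)
  s <- summary(fit)
  z <- s$coefficients / s$standard.errors
  p <- 2 * pnorm(abs(z), lower.tail = FALSE)
  p[, var]  # one p-value per non-baseline category
}

# Rows are the contrasts 2 vs 1 and 3 vs 1, columns are the predictors
p_values <- sapply(c("x1", "x2"), wald_p, data = toy)
p_values
```

On the FAA data the same helper could be applied to the predictor names in FAAM_CL, with the caveat that two-sided p-values may flag predictors that the upper-tail check does not.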

4.3 Analysis based on Visualization

Distribution of Y

FAAM_CL%>%
group_by(Y)%>%
summarise(count=n()) %>%
ggplot(aes(x=Y,y = count)) +geom_bar(stat = "identity",fill='#3990E5' ,alpha=0.5)+
 labs(x = "Landing Category", 
       y = "count",
       title = "Number of observations per category",
       subtitle = "Segregated by landing type")

Plotting significant predictors: AIRCRAFT

ggplot(FAAM_CL, aes(factor(aircraft), ..count..)) + 
  # alpha is a fixed setting, so it goes outside aes()
  geom_bar(aes(fill = as.factor(Y)), alpha=0.5, position = "dodge")+
  labs(x = "Aircraft Category", 
       y = "count",
       title = "Number of observations per category",
       subtitle = "Segregated by type of aircraft")

Observation

* For both airbus and boeing, category 2 landings are the most frequent.
* Airbus has close to 30 landings in category 3.

Height and speed ground

theme_set(theme_ridges())

ggplot(FAAM_CL, aes(x = speed_ground, y = Y)) +
  geom_density_ridges(aes(fill = Y), alpha=0.5) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))

## Picking joint bandwidth of 3.07

ggplot(FAAM_CL, aes(x = height, y = Y)) +
  geom_density_ridges(aes(fill = Y), alpha=0.5) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))

## Picking joint bandwidth of 2.97

Observation

* Height: For category 3, height has a wider distribution.
* Speed ground: As the category increases, the mean of the distribution also increases, although the variance does not change much.

4.4 Model Fitting: Full

nmod<-multinom(Y ~.,FAAM_CL)

## # weights:  24 (14 variable)
## initial  value 860.213422 
## iter  10 value 535.322398
## iter  20 value 233.215870
## iter  30 value 215.490327
## iter  40 value 215.306686
## iter  50 value 215.095098
## final  value 215.033762 
## converged

int1<-summary(nmod)
int1

## Call:
## multinom(formula = Y ~ ., data = FAAM_CL)
## 
## Coefficients:
##   (Intercept) aircraftboeing     duration     no_pasg speed_ground
## 2   -17.77161       3.739129 -0.003020479 -0.02072378    0.2177972
## 3  -132.59919       8.718948  0.002345470 -0.01323350    1.1968791
##      height      pitch
## 2 0.1397391 -0.3375208
## 3 0.3741067  0.9454726
## 
## Std. Errors:
##   (Intercept) aircraftboeing    duration    no_pasg speed_ground
## 2   2.1293610      0.4003410 0.002665208 0.01719359   0.01780560
## 3   0.0357626      0.8685735 0.008056256 0.05805919   0.04043594
##       height     pitch
## 2 0.01706753 0.2664874
## 3 0.04868468 0.7621497
## 
## Residual Deviance: 430.0675 
## AIC: 458.0675

Significant Predictors

pnorm(int1$coefficients/int1$standard.errors,lower.tail = FALSE)

##   (Intercept) aircraftboeing  duration   no_pasg  speed_ground
## 2           1   4.823060e-21 0.8714557 0.8859602  1.049188e-34
## 3           1   5.175255e-24 0.3854735 0.5901501 7.608276e-193
##         height     pitch
## 2 1.334405e-16 0.8973427
## 3 7.692985e-15 0.1073890

Observation

Here we observe that aircraft, speed ground and height are significant. There are 2 ways of finding these significant predictors:

* Check which coefficients are larger than 2 standard errors.
* Calculate the p-value of the test statistic using the pnorm function.

Both ways show that aircraft, speed ground and height are significant. Let's confirm this using step AIC (backward).
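The pnorm call above uses the upper tail only, so a large negative z value gets a p-value close to 1. A two-sided Wald test is a common alternative; the sketch below compares the two, using the intercept and aircraftboeing values copied from the printed summary (the matrix layout is just for this example).

```r
# Coefficients and standard errors copied from the summary output of the full model
coefs <- matrix(c(-17.77161, 3.739129,
                  -132.59919, 8.718948),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("2", "3"), c("(Intercept)", "aircraftboeing")))
ses <- matrix(c(2.1293610, 0.4003410,
                0.0357626, 0.8685735),
              nrow = 2, byrow = TRUE, dimnames = dimnames(coefs))

z <- coefs / ses

# Upper-tail p-value, as in the post: negative z values look insignificant
p_upper <- pnorm(z, lower.tail = FALSE)

# Two-sided p-value: large |z| is significant regardless of sign
p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)

p_upper
p_two_sided
```

For the positive aircraftboeing coefficients both versions agree on significance; they differ for the negative intercepts.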

4.5 Variable selection using step AIC: Backward

nmodAIC<-step(nmod)

## Start:  AIC=458.07
## Y ~ aircraft + duration + no_pasg + speed_ground + height + pitch
## 
## trying - aircraft 
## # weights:  21 (12 variable)
## initial  value 860.213422 
## iter  10 value 553.236136
## iter  20 value 307.585472
## iter  30 value 302.304626
## iter  40 value 302.277717
## final  value 302.272056 
## converged
## trying - duration 
## # weights:  21 (12 variable)
## initial  value 860.213422 
## iter  10 value 442.447072
## iter  20 value 232.139392
## iter  30 value 217.315099
## iter  40 value 216.591666
## iter  50 value 215.924741
## iter  50 value 215.924741
## iter  50 value 215.924741
## final  value 215.924741 
## converged
## trying - no_pasg 
## # weights:  21 (12 variable)
## initial  value 860.213422 
## iter  10 value 571.600465
## iter  20 value 233.289377
## iter  30 value 220.332912
## iter  40 value 218.044063
## final  value 215.778181 
## converged
## trying - speed_ground 
## # weights:  21 (12 variable)
## initial  value 860.213422 
## iter  10 value 719.414827
## final  value 707.891766 
## converged
## trying - height 
## # weights:  21 (12 variable)
## initial  value 860.213422 
## iter  10 value 501.973957
## iter  20 value 279.931051
## iter  30 value 272.823455
## iter  40 value 272.747843
## final  value 272.723559 
## converged
## trying - pitch 
## # weights:  21 (12 variable)
## initial  value 860.213422 
## iter  10 value 537.131697
## iter  20 value 232.570572
## iter  30 value 218.874317
## iter  40 value 218.164157
## final  value 217.133536 
## converged
##                Df       AIC
## - no_pasg      12  455.5564
## - duration     12  455.8495
## <none>         14  458.0675
## - pitch        12  458.2671
## - height       12  569.4471
## - aircraft     12  628.5441
## - speed_ground 12 1439.7835
## # weights:  21 (12 variable)
## initial  value 860.213422 
## iter  10 value 571.600465
## iter  20 value 233.289377
## iter  30 value 220.332912
## iter  40 value 218.044063
## final  value 215.778181 
## converged
## 
## Step:  AIC=455.56
## Y ~ aircraft + duration + speed_ground + height + pitch
## 
## trying - aircraft 
## # weights:  18 (10 variable)
## initial  value 860.213422 
## iter  10 value 461.131666
## iter  20 value 304.481071
## iter  30 value 303.053308
## iter  40 value 302.957518
## final  value 302.957485 
## converged
## trying - duration 
## # weights:  18 (10 variable)
## initial  value 860.213422 
## iter  10 value 454.092567
## iter  20 value 231.918994
## iter  30 value 221.564842
## iter  40 value 216.658288
## final  value 216.655416 
## converged
## trying - speed_ground 
## # weights:  18 (10 variable)
## initial  value 860.213422 
## iter  10 value 711.480155
## final  value 708.091684 
## converged
## trying - height 
## # weights:  18 (10 variable)
## initial  value 860.213422 
## iter  10 value 454.702852
## iter  20 value 276.512160
## iter  30 value 273.463355
## iter  40 value 273.372814
## iter  40 value 273.372812
## iter  40 value 273.372812
## final  value 273.372812 
## converged
## trying - pitch 
## # weights:  18 (10 variable)
## initial  value 860.213422 
## iter  10 value 439.964575
## iter  20 value 229.865728
## iter  30 value 219.743502
## iter  40 value 217.891884
## final  value 217.863332 
## converged
##                Df       AIC
## - duration     10  453.3108
## <none>         12  455.5564
## - pitch        10  455.7267
## - height       10  566.7456
## - aircraft     10  625.9150
## - speed_ground 10 1436.1834
## # weights:  18 (10 variable)
## initial  value 860.213422 
## iter  10 value 454.092567
## iter  20 value 231.918994
## iter  30 value 221.564842
## iter  40 value 216.658288
## final  value 216.655416 
## converged
## 
## Step:  AIC=453.31
## Y ~ aircraft + speed_ground + height + pitch
## 
## trying - aircraft 
## # weights:  15 (8 variable)
## initial  value 860.213422 
## iter  10 value 357.926020
## iter  20 value 304.349610
## iter  30 value 304.090286
## final  value 304.071791 
## converged
## trying - speed_ground 
## # weights:  15 (8 variable)
## initial  value 860.213422 
## iter  10 value 711.281659
## final  value 710.666324 
## converged
## trying - height 
## # weights:  15 (8 variable)
## initial  value 860.213422 
## iter  10 value 330.985544
## iter  20 value 277.196122
## iter  30 value 275.212444
## final  value 275.046753 
## converged
## trying - pitch 
## # weights:  15 (8 variable)
## initial  value 860.213422 
## iter  10 value 352.446469
## iter  20 value 233.419805
## iter  30 value 222.087361
## final  value 218.834648 
## converged
##                Df       AIC
## <none>         10  453.3108
## - pitch         8  453.6693
## - height        8  566.0935
## - aircraft      8  624.1436
## - speed_ground  8 1437.3326

AIC(nmodAIC)

## [1] 453.3108

AIC(nmod)

## [1] 458.0675

Observation

Here we see that step AIC retains one additional variable, pitch, beyond the three significant predictors. The AIC is reduced from 458.0675 to 453.3108. Let's compare the 2 models using a chi-square test.

deviance(nmodAIC)-deviance(nmod)

## [1] 3.243309

nmod$edf-nmodAIC$edf

## [1] 4

pchisq(deviance(nmodAIC)-deviance(nmod),nmod$edf-nmodAIC$edf,lower.tail=FALSE)

## [1] 0.5179643

Interpretation

The p-value from the chi-square test is greater than 0.05, so we cannot reject the null hypothesis that there is no difference between the 2 models.
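The test above is a likelihood ratio (deviance difference) test. Written out with the numbers from the output, where the deviance of the AIC-selected model is recovered as AIC minus twice its 10 parameters (453.3108 - 20 = 433.3108):

```latex
G = D_{\text{AIC model}} - D_{\text{full}} = 433.3108 - 430.0675 = 3.2433,
\qquad
\text{df} = 14 - 10 = 4,
\qquad
p = P\left(\chi^2_4 \ge 3.2433\right) \approx 0.518 .
```

The symbols $G$ and $D$ are notation chosen for this note; the values match the deviance difference, degrees of freedom and p-value printed above.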

The AIC-selected model keeps one predictor (pitch) beyond the set of significant predictors, but the gain is marginal: dropping pitch would raise the AIC only from 453.31 to 453.67. We therefore prefer the model with one fewer predictor in order to keep the model simple.

5.0 Reporting

int1<-summary(nmod)
a<-int1$coefficients
s<-int1$standard.errors
pv<-ifelse(pnorm(int1$coefficients/int1$standard.errors,lower.tail = FALSE)<0.05,"Sign","notSign")
# Index the matrices by row: row 1 is the log(p2/p1) equation, row 2 the log(p3/p1) equation
d<-data.frame(cbind(round(a[1,],3),round(a[2,],3),signif(exp(a[1,]),4),signif(exp(a[2,]),4),round(s[1,],3),round(s[2,],3),pv[1,],pv[2,]))

colnames(d)<-c("log(p2/p1)","log(p3/p1)","expB:P2/P1","expB:P3/P1","se:p2/p1","se:p3/p1","pvalue:p2/p1","pvalue:p3/p1")
d

##                log(p2/p1) log(p3/p1) expB:P2/P1 expB:P3/P1 se:p2/p1
## (Intercept)       -17.772   -132.599  1.914e-08  2.588e-58    2.129
## aircraftboeing      3.739      8.719      42.06       6118      0.4
## duration           -0.003      0.002      0.997      1.002    0.003
## no_pasg            -0.021     -0.013     0.9795     0.9869    0.017
## speed_ground        0.218      1.197      1.243       3.31    0.018
## height               0.14      0.374       1.15      1.454    0.017
## pitch              -0.338      0.945     0.7135      2.574    0.266
##                se:p3/p1 pvalue:p2/p1 pvalue:p3/p1
## (Intercept)       0.036      notSign      notSign
## aircraftboeing    0.869         Sign         Sign
## duration          0.008      notSign      notSign
## no_pasg           0.058      notSign      notSign
## speed_ground       0.04         Sign         Sign
## height            0.049         Sign         Sign
## pitch             0.762      notSign      notSign

Model Assessment

pred<-data.frame(predict(nmod, type="probs"))
pred$prediction<-ifelse(pred$X1>pred$X2,ifelse(pred$X1>pred$X3,'1','3'),ifelse(pred$X2>pred$X3,'2','3'))
pred$real<-FAAM_CL$Y
table(pred$real,pred$prediction,dnn=c('real','prediction'))

##     prediction
## real   1   2   3
##    1 207  38   0
##    2  35 398   5
##    3   0   6  94

mcr<-mean(pred$real!=pred$prediction)
mcr

## [1] 0.1072797

Interpretation

In the above table, the ‘pvalue:px/p1’ columns denote whether a predictor is significant. “Sign” indicates that the predictor is significant and “notSign” that it is not.

Upon fitting the full model we see that only 3 predictors out of 6 are significant: aircraft (boeing), height and speed ground.
The intercepts are also flagged as not significant, but only because the upper-tail p-value cannot flag negative coefficients; both intercepts lie many standard errors below zero.

Let's label the landings as normal, long and risky, corresponding to Y = 1, 2, 3.

AIRCRAFT: From the coefficient summary of the full model, the log odds log(p2/p1) increase by 3.74 and the log odds log(p3/p1) increase by 8.72 when aircraft = boeing. Aircraft therefore matters for both contrasts, and more strongly for risky versus normal (p3/p1) than for long versus normal (p2/p1). Simply stated, using a boeing aircraft increases the odds of a landing being long, and even more so of it being risky.
Speed ground: Each unit increase in speed ground increases the odds of the landing being long (log odds +0.218) as well as risky (log odds +1.197).
Height: Each unit increase in height increases the odds of the landing being long (log odds +0.140) as well as risky (log odds +0.374).

Put simply, a boeing aircraft, a higher speed ground and a greater height each increase the probability that the landing distance is greater than 2500 units.
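Log odds are easier to read as odds ratios, obtained by exponentiating the coefficients. A short sketch, using the speed ground and height coefficients copied from the summary of the full model (the vector names are made up for this example):

```r
# Coefficients copied from the full-model summary (2 vs 1 and 3 vs 1)
log_odds <- c(speed_ground_2v1 = 0.2177972,
              speed_ground_3v1 = 1.1968791,
              height_2v1       = 0.1397391,
              height_3v1       = 0.3741067)

# exp(coefficient) is the multiplicative change in the odds per unit increase
odds_ratio <- exp(log_odds)
round(odds_ratio, 3)
```

An odds ratio above 1 means that a one-unit increase in the predictor raises the odds of that category relative to a normal landing, holding the other predictors fixed.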

Applying the predict function to our model gives us the misclassification rate on the training data. The diagonal values of the table are the observations predicted correctly, whereas the off-diagonal values are the ones predicted wrongly. The overall misclassification rate is about 11%, which is satisfactory.
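The misclassification rate can be recomputed directly from the confusion table, and the same table also gives the share of correct predictions per category. The sketch below rebuilds the table from the printed counts; the per-category measure is an addition of this example.

```r
# Confusion table copied from the output above (rows = real, columns = prediction)
cm <- matrix(c(207,  38,  0,
                35, 398,  5,
                 0,   6, 94),
             nrow = 3, byrow = TRUE,
             dimnames = list(real = c("1", "2", "3"), prediction = c("1", "2", "3")))

# Overall misclassification rate: share of off-diagonal counts
mcr <- 1 - sum(diag(cm)) / sum(cm)
mcr

# Share of each real category that is predicted correctly
correct_by_class <- diag(cm) / rowSums(cm)
round(correct_by_class, 3)
```

Looking at the rate per category can show whether the errors are concentrated in one type of landing.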

In the case of ordinal data we may need another modelling mechanism. One function that can be used is vglm from the VGAM package.
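As a rough sketch of what that could look like, the example below fits a proportional odds model with vglm and the cumulative family on simulated data. The data, the cut points and the variable names are invented; only the vglm call with `cumulative(parallel = TRUE)` reflects the function mentioned above.

```r
library(VGAM)

# Simulated ordinal response built from a latent variable (all values are invented)
set.seed(42)
n <- 200
x <- rnorm(n)
latent <- 1.5 * x + rlogis(n)
y <- cut(latent, breaks = c(-Inf, -1, 1, Inf),
         labels = c("1", "2", "3"), ordered_result = TRUE)
toy <- data.frame(x = x, y = y)

# Proportional odds model: one slope shared by both cumulative logits
fit <- vglm(y ~ x, family = cumulative(parallel = TRUE), data = toy)
coef(fit)
```

With `parallel = TRUE` the model estimates two intercepts and a single slope for x, which is what distinguishes it from the unordered fit used in this post.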

Thank you!

FAA DATA Multinomial CaseStudy

Vidhi Rathod

01 March 2020