Again, we work here with the cleaned FAA data set that we used in our prior analyses on Linear Regression and Logistic Regression. Please refer to those articles for background on the data. For convenience, the data preparation is included in this post as well.
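Here, we create a multinomial variable Y from the landing distance: Y = 1 if distance < 1000, Y = 2 if 1000 <= distance < 2500, and Y = 3 otherwise.
We then discard the continuous “distance” data and assume we are given only this multinomial response, without knowing its order.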
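As a minimal sketch of this binning step, base R's cut() can build the three categories in one call. The distances below are made-up toy values, and the thresholds 1000 and 2500 are the ones used later in this post; right = FALSE makes each interval closed on the left, so a distance of exactly 1000 falls in category 2.

```r
# Toy landing distances (hypothetical values, not taken from the FAA data)
distance <- c(850, 1000, 1750, 2499.9, 2500, 3200)

# Intervals are [-Inf, 1000), [1000, 2500), [2500, Inf) because right = FALSE
Y <- cut(distance,
         breaks = c(-Inf, 1000, 2500, Inf),
         labels = c("1", "2", "3"),
         right = FALSE)

# cut() returns an unordered factor by default, matching the
# "response without known order" assumption
print(data.frame(distance, Y))
```

The result is already a factor, so no separate as.factor() call is needed with this approach.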
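# Load the packages used throughout this post
library(readxl); library(dplyr); library(nnet)
library(ggplot2); library(ggridges)
# Read the sheets, one by one
FAA1 <- read_excel("FAA1.xls")
FAA2 <- read_excel("FAA2.xls")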
dim(FAA1)
## [1] 800 8
dim(FAA2)
## [1] 150 7
str(FAA1)
## Classes 'tbl_df', 'tbl' and 'data.frame': 800 obs. of 8 variables:
## $ aircraft : chr "boeing" "boeing" "boeing" "boeing" ...
## $ duration : num 98.5 125.7 112 196.8 90.1 ...
## $ no_pasg : num 53 69 61 56 70 55 54 57 61 56 ...
## $ speed_ground: num 107.9 101.7 71.1 85.8 59.9 ...
## $ speed_air : num 109 103 NA NA NA ...
## $ height : num 27.4 27.8 18.6 30.7 32.4 ...
## $ pitch : num 4.04 4.12 4.43 3.88 4.03 ...
## $ distance : num 3370 2988 1145 1664 1050 ...
str(FAA2)
## Classes 'tbl_df', 'tbl' and 'data.frame': 150 obs. of 7 variables:
## $ aircraft : chr "boeing" "boeing" "boeing" "boeing" ...
## $ no_pasg : num 53 69 61 56 70 55 54 57 61 56 ...
## $ speed_ground: num 107.9 101.7 71.1 85.8 59.9 ...
## $ speed_air : num 109 103 NA NA NA ...
## $ height : num 27.4 27.8 18.6 30.7 32.4 ...
## $ pitch : num 4.04 4.12 4.43 3.88 4.03 ...
## $ distance : num 3370 2988 1145 1664 1050 ...
FAA1: 800 rows and 8 columns (sample size 800).
FAA2: 150 rows and 7 columns (sample size 150).
FAA2 does not have the duration column.
Otherwise, FAA1 and FAA2 have the same structure and data types.
Data Merge
FAA2$duration<-NA
FAA_merge<-rbind(FAA1,FAA2)
In order to combine the two data sets, we need to give both the same structure, that is, the same columns. Hence we add a duration column to FAA2 and then use the rbind function to obtain 150 + 800 = 950 rows (sample size = 950).
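The same idea can be written so that it does not depend on knowing in advance which column is missing. The sketch below uses base R only; align_and_stack is a hypothetical helper name and the two small data frames are toy stand-ins for FAA1 and FAA2.

```r
# Hypothetical helper: add any missing columns as NA, then stack the rows
align_and_stack <- function(a, b) {
  for (col in setdiff(names(a), names(b))) b[[col]] <- NA  # columns only in a
  for (col in setdiff(names(b), names(a))) a[[col]] <- NA  # columns only in b
  rbind(a, b[names(a)])                                    # same column order
}

# Toy stand-ins for the two sheets (made-up values)
sheet1 <- data.frame(aircraft = c("boeing", "airbus"),
                     duration = c(98.5, 125.7),
                     distance = c(3370, 2988))
sheet2 <- data.frame(aircraft = "boeing", distance = 1145)

merged <- align_and_stack(sheet1, sheet2)
print(merged)
```

Rows that came from the sheet without duration simply carry NA in that column, which is what the FAA2$duration<-NA line achieves by hand.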
INSIGHTS:
As a first step in data cleaning, we need to filter out the rows that do not qualify for analysis. There are two major categories of such rows:
* Abnormal values: we need to remove abnormal values (values that do not qualify for analysis, as suggested by SMEs). The code below removes them.
* NA rows: we need to remove NA rows (where all cell values are NA).
Duplicate Check
Because FAA2 has NA in the duration column, we need to check for duplicate rows while excluding that column. We remove any duplicate rows we find to reduce redundancy.
# Flag duplicate rows, ignoring column 2 (duration), which is NA for every FAA2 row
row_dup<-duplicated(FAA_merge[,-2])
FAA_U<-FAA_merge[!row_dup,]
dim(FAA_U)
## [1] 850 8
summary(FAA_U)
## aircraft duration no_pasg speed_ground
## Length:850 Min. : 14.76 Min. :29.0 Min. : 27.74
## Class :character 1st Qu.:119.49 1st Qu.:55.0 1st Qu.: 65.90
## Mode :character Median :153.95 Median :60.0 Median : 79.64
## Mean :154.01 Mean :60.1 Mean : 79.45
## 3rd Qu.:188.91 3rd Qu.:65.0 3rd Qu.: 92.06
## Max. :305.62 Max. :87.0 Max. :141.22
## NA's :50
## speed_air height pitch distance
## Min. : 90.00 Min. :-3.546 Min. :2.284 Min. : 34.08
## 1st Qu.: 96.25 1st Qu.:23.314 1st Qu.:3.642 1st Qu.: 883.79
## Median :101.15 Median :30.093 Median :4.008 Median :1258.09
## Mean :103.80 Mean :30.144 Mean :4.009 Mean :1526.02
## 3rd Qu.:109.40 3rd Qu.:36.993 3rd Qu.:4.377 3rd Qu.:1936.95
## Max. :141.72 Max. :59.946 Max. :5.927 Max. :6533.05
## NA's :642
Summary statistics: we find 850 unique rows and 8 columns. The summary statistics of each column are shown above.
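# Remove abnormal values (limits suggested by SMEs)
FAA_U<-FAA_U[FAA_U$duration>=40,]
# Use the vectorised `&` here. The original `&&` checks only the first row in older R versions
# (and is an error from R 4.3), so the outputs in this post still contain speed_ground values below 30.
FAA_U<-FAA_U[FAA_U$speed_ground>=30 & FAA_U$speed_ground<=140,]
FAA_U<-FAA_U[FAA_U$height>=6,]
FAA_U<-FAA_U[FAA_U$distance<=6000,]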
dim(FAA_U)
## [1] 833 8
#Remove all rows that have NA in all the columns
FAA_U_RmvNA<-FAA_U %>% filter(!(is.na(aircraft)&is.na(duration)&is.na(no_pasg)&is.na(speed_ground)&is.na(speed_air)&is.na(height)&is.na(pitch)&is.na(distance)))
#Looking for NA Values in columns
paste("Number of NA in AIRCRAFT:" , sum(is.na(FAA_U_RmvNA$aircraft)))
## [1] "Number of NA in AIRCRAFT: 0"
paste("Number of NA in DURATION:" , sum(is.na(FAA_U_RmvNA$duration)))
## [1] "Number of NA in DURATION: 0"
paste("Number of NA in NO_PASG:" , sum(is.na(FAA_U_RmvNA$no_pasg)))
## [1] "Number of NA in NO_PASG: 0"
paste("Number of NA in SPEED GROUND:" , sum(is.na(FAA_U_RmvNA$speed_ground)))
## [1] "Number of NA in SPEED GROUND: 0"
paste("Number of NA in SPEED AIR:" , sum(is.na(FAA_U_RmvNA$speed_air)))
## [1] "Number of NA in SPEED AIR: 588"
paste("Number of NA in HEIGHT:" , sum(is.na(FAA_U_RmvNA$height)))
## [1] "Number of NA in HEIGHT: 0"
paste("Number of NA in PITCH:" , sum(is.na(FAA_U_RmvNA$pitch)))
## [1] "Number of NA in PITCH: 0"
paste("Number of NA in DISTANCE:" , sum(is.na(FAA_U_RmvNA$distance)))
## [1] "Number of NA in DISTANCE: 0"
nrow(FAA_U_RmvNA)
## [1] 783
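Conclusion: we removed 850 - 783 = 67 rows.
* Abnormal values: 17
* NA rows (entire row is missing): 50. These appear to be the FAA2-only rows, whose duration is NA: subsetting on a condition that evaluates to NA returns a row made entirely of NA.
* Speed air: 588 missing values remain.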
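The row counts above depend on how R treats NA inside a logical subset, so a small illustration may help. This is a sketch on a three-row toy data frame (made-up values), not the FAA data; whether rows with an unknown duration should be kept or dropped is a modelling decision that this post does not settle.

```r
# Toy data: one short flight, one normal flight, one with unknown duration
toy <- data.frame(duration = c(30, 120, NA),
                  speed_ground = c(80, 150, 95))

# An NA condition does not drop the row: it returns a row made entirely of NA
with_na_row <- toy[toy$duration >= 40, ]

# which() discards NA conditions, so only rows that truly pass are kept
strict <- toy[which(toy$duration >= 40), ]

# To keep rows whose duration is simply unknown, allow NA explicitly
keep_unknown <- toy[is.na(toy$duration) | toy$duration >= 40, ]

# A range check needs the vectorised `&`, which compares row by row
in_range <- keep_unknown[keep_unknown$speed_ground >= 30 &
                           keep_unknown$speed_ground <= 140, ]

print(with_na_row)
print(in_range)
```

In the toy example the first subset has two rows, one of which is all NA, which mirrors how the 50 all-NA rows arise in the cleaning step.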
# Create the multinomial response Y from distance (distance itself is dropped below)
S1<-FAA_U_RmvNA%>%
filter(distance<1000)%>%
mutate(Y =1)
S2<-FAA_U_RmvNA%>%
filter(distance>=1000 & distance<2500)%>%
mutate(Y =2)
S3<-FAA_U_RmvNA%>%
filter(distance>=2500)%>%
mutate(Y =3)
FAAM<-rbind(S1,S2,S3)[,-8]
dim(FAAM)
## [1] 783 8
head(FAAM)
## # A tibble: 6 x 8
## aircraft duration no_pasg speed_ground speed_air height pitch Y
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 boeing 73.0 54 54.4 NA 24.0 3.84 1
## 2 boeing 52.9 57 57.1 NA 19.4 4.64 1
## 3 boeing 166. 69 48.8 NA 31.2 3.90 1
## 4 boeing 180. 66 63.7 NA 19.6 4.29 1
## 5 boeing 228. 78 61.2 NA 21.8 4.60 1
## 6 boeing 201. 71 41.5 NA 28.5 4.60 1
We have 7 predictors and one dependent response. The response is multinomial with values 1, 2, 3 based on the actual landing distance.
FAAM$Y<-as.factor(FAAM$Y)
Analysis from previous project
Because we already know that speed air and speed ground are multicollinear, choosing just one of them is sensible. Here we choose speed ground on the basis that it has no missing values, whereas speed air has 588 of them.
Summary Statistics
summary(FAAM)
## aircraft duration no_pasg speed_ground
## Length:783 Min. : 41.95 Min. :29.00 Min. : 27.74
## Class :character 1st Qu.:119.67 1st Qu.:55.00 1st Qu.: 66.01
## Mode :character Median :154.28 Median :60.00 Median : 79.75
## Mean :154.83 Mean :60.07 Mean : 79.51
## 3rd Qu.:189.75 3rd Qu.:65.00 3rd Qu.: 92.13
## Max. :305.62 Max. :87.00 Max. :132.78
##
## speed_air height pitch Y
## Min. : 90.00 Min. : 6.228 Min. :2.284 1:245
## 1st Qu.: 96.15 1st Qu.:23.562 1st Qu.:3.654 2:438
## Median :100.89 Median :30.203 Median :4.017 3:100
## Mean :103.50 Mean :30.438 Mean :4.015
## 3rd Qu.:109.42 3rd Qu.:36.984 3rd Qu.:4.385
## Max. :132.91 Max. :59.946 Max. :5.927
## NA's :588
Removing Speed Air variable
FAAM_CL<-FAAM[,-5]
str(FAAM_CL)
## Classes 'tbl_df', 'tbl' and 'data.frame': 783 obs. of 7 variables:
## $ aircraft : chr "boeing" "boeing" "boeing" "boeing" ...
## $ duration : num 73 52.9 165.5 179.9 228.2 ...
## $ no_pasg : num 54 57 69 66 78 71 66 60 75 71 ...
## $ speed_ground: num 54.4 57.1 48.8 63.7 61.2 ...
## $ height : num 24 19.4 31.2 19.6 21.8 ...
## $ pitch : num 3.84 4.64 3.9 4.29 4.6 ...
## $ Y : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
Aircraft
# family= is a vglm (VGAM) argument that multinom() silently ignores, so it is dropped here
nmod.s<-multinom(Y ~aircraft, data=FAAM_CL)
## # weights: 9 (4 variable)
## initial value 860.213422
## iter 10 value 725.056889
## final value 725.056861
## converged
int.s<-summary(nmod.s)
p<-pnorm(int.s$coefficients/int.s$standard.errors,lower.tail = FALSE)
cat("aircraft : p2/p1 is",ifelse(p[1,2]>0.05,"insignificant","significant"),'\n')
## aircraft : p2/p1 is significant
cat("aircraft : p3/p1 is",ifelse(p[2,2]>0.05,"insignificant","significant"))
## aircraft : p3/p1 is significant
Duration
nmod.s<-multinom(Y ~duration, data=FAAM_CL)
## # weights: 9 (4 variable)
## initial value 860.213422
## final value 742.088196
## converged
int.s<-summary(nmod.s)
p<-pnorm(int.s$coefficients/int.s$standard.errors,lower.tail = FALSE)
cat("duration : p2/p1 is",ifelse(p[1,2]>0.05,"insignificant","significant"),'\n')
## duration : p2/p1 is insignificant
cat("duration : p3/p1 is",ifelse(p[2,2]>0.05,"insignificant","significant"))
## duration : p3/p1 is insignificant
no pasg
nmod.s<-multinom(Y ~no_pasg, data=FAAM_CL)
## # weights: 9 (4 variable)
## initial value 860.213422
## final value 744.740361
## converged
int.s<-summary(nmod.s)
p<-pnorm(int.s$coefficients/int.s$standard.errors,lower.tail = FALSE)
cat("no pasg : p2/p1 is",ifelse(p[1,2]>0.05,"insignificant","significant"),'\n')
## no pasg : p2/p1 is insignificant
cat("no pasg : p3/p1 is",ifelse(p[2,2]>0.05,"insignificant","significant"))
## no pasg : p3/p1 is insignificant
speed ground
nmod.s<-multinom(Y ~speed_ground, data=FAAM_CL)
## # weights: 9 (4 variable)
## initial value 860.213422
## iter 10 value 362.508426
## iter 20 value 348.890071
## final value 348.889729
## converged
int.s<-summary(nmod.s)
p<-pnorm(int.s$coefficients/int.s$standard.errors,lower.tail = FALSE)
cat("speed_ground : p2/p1 is",ifelse(p[1,2]>0.05,"insignificant","significant"),'\n')
## speed_ground : p2/p1 is significant
cat("speed_ground : p3/p1 is",ifelse(p[2,2]>0.05,"insignificant","significant"))
## speed_ground : p3/p1 is significant
height
nmod.s<-multinom(Y ~height, data=FAAM_CL)
## # weights: 9 (4 variable)
## initial value 860.213422
## final value 732.380913
## converged
int.s<-summary(nmod.s)
p<-pnorm(int.s$coefficients/int.s$standard.errors,lower.tail = FALSE)
cat("height : p2/p1 is",ifelse(p[1,2]>0.05,"insignificant","significant"),'\n')
## height : p2/p1 is significant
cat("height : p3/p1 is",ifelse(p[2,2]>0.05,"insignificant","significant"))
## height : p3/p1 is significant
pitch
nmod.s<-multinom(Y ~pitch, data=FAAM_CL)
## # weights: 9 (4 variable)
## initial value 860.213422
## final value 743.211297
## converged
int.s<-summary(nmod.s)
p<-pnorm(int.s$coefficients/int.s$standard.errors,lower.tail = FALSE)
cat("pitch : p2/p1 is",ifelse(p[1,2]>0.05,"insignificant","significant"),'\n')
## pitch : p2/p1 is insignificant
cat("pitch : p3/p1 is",ifelse(p[2,2]>0.05,"insignificant","significant"))
## pitch : p3/p1 is significant
Observation
Single-factor analysis tells us that aircraft, speed ground and height are significant. Pitch is a little ambiguous because it is insignificant for the first comparison (p2/p1) and significant for the other (p3/p1).
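The six single-factor fits above repeat the same three steps, so they can be wrapped in a loop. The following is a sketch only: single_factor_p is a hypothetical helper name, the built-in iris data stands in for FAAM_CL, and the helper reports two-sided Wald p-values rather than the upper-tail values used above.

```r
library(nnet)  # provides multinom()

# Hypothetical helper: fit response ~ <one predictor> and return
# two-sided Wald p-values for every coefficient of that fit
single_factor_p <- function(var, data, response = "Y") {
  fit <- multinom(reformulate(var, response = response),
                  data = data, trace = FALSE)
  s <- summary(fit)
  z <- s$coefficients / s$standard.errors
  2 * pnorm(abs(z), lower.tail = FALSE)
}

# Stand-in data: iris with its 3-level Species column renamed to Y
demo <- iris
names(demo)[names(demo) == "Species"] <- "Y"

predictors <- c("Sepal.Length", "Sepal.Width")
p_list <- lapply(predictors, single_factor_p, data = demo)
names(p_list) <- predictors
print(p_list)
```

With the FAA data, the call would presumably become lapply(setdiff(names(FAAM_CL), "Y"), single_factor_p, data = FAAM_CL); each element is a 2-row matrix, one row per non-baseline category.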
Distribution of Y
FAAM_CL%>%
group_by(Y)%>%
summarise(count=n()) %>%
ggplot(aes(x=Y,y = count)) +geom_bar(stat = "identity",fill='#3990E5' ,alpha=0.5)+
labs(x = "Landing Category",
y = "count",
title = "Number of observations per category",
subtitle = "Segregated by type")
Plotting significant predictors: AIRCRAFT
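# after_stat(count) replaces the deprecated ..count.. notation; alpha is a fixed setting, so it sits outside aes()
ggplot(FAAM_CL, aes(factor(aircraft), after_stat(count))) +
geom_bar(aes(fill = as.factor(Y)), alpha=0.5, position = "dodge")+
labs(x = "Aircraft Category",
y = "count",
title = "Number of observations per category",
subtitle = "Segregated by type of aircraft")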
Observation
* For both Airbus and Boeing, category 2 landings are the most frequent.
* Airbus has close to 30 landings in category 3.
Height and speed ground
theme_set(theme_ridges())
ggplot(FAAM_CL, aes(x = speed_ground, y = Y)) +
geom_density_ridges(aes(fill = Y), alpha=0.5) +
scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))
## Picking joint bandwidth of 3.07
ggplot(FAAM_CL, aes(x = height, y = Y)) +
geom_density_ridges(aes(fill = Y), alpha=0.5) +
scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))
## Picking joint bandwidth of 2.97
Observation
* Height: for category 3, height has a wider distribution.
* Speed ground: as the category increases, the mean of the distribution also increases, although the variance does not change much.
nmod<-multinom(Y ~.,FAAM_CL)
## # weights: 24 (14 variable)
## initial value 860.213422
## iter 10 value 535.322398
## iter 20 value 233.215870
## iter 30 value 215.490327
## iter 40 value 215.306686
## iter 50 value 215.095098
## final value 215.033762
## converged
int1<-summary(nmod)
int1
## Call:
## multinom(formula = Y ~ ., data = FAAM_CL)
##
## Coefficients:
## (Intercept) aircraftboeing duration no_pasg speed_ground
## 2 -17.77161 3.739129 -0.003020479 -0.02072378 0.2177972
## 3 -132.59919 8.718948 0.002345470 -0.01323350 1.1968791
## height pitch
## 2 0.1397391 -0.3375208
## 3 0.3741067 0.9454726
##
## Std. Errors:
## (Intercept) aircraftboeing duration no_pasg speed_ground
## 2 2.1293610 0.4003410 0.002665208 0.01719359 0.01780560
## 3 0.0357626 0.8685735 0.008056256 0.05805919 0.04043594
## height pitch
## 2 0.01706753 0.2664874
## 3 0.04868468 0.7621497
##
## Residual Deviance: 430.0675
## AIC: 458.0675
Significant Predictors
pnorm(int1$coefficients/int1$standard.errors,lower.tail = FALSE)
## (Intercept) aircraftboeing duration no_pasg speed_ground
## 2 1 4.823060e-21 0.8714557 0.8859602 1.049188e-34
## 3 1 5.175255e-24 0.3854735 0.5901501 7.608276e-193
## height pitch
## 2 1.334405e-16 0.8973427
## 3 7.692985e-15 0.1073890
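Observation: aircraft, speed ground and height are significant. There are two ways of finding these significant predictors:
* Check whether a coefficient is more than 2 standard errors away from zero.
* Calculate the p-value of the test statistic using the pnorm function. Note that lower.tail = FALSE gives a one-sided (upper-tail) p-value, so negative estimates such as the intercepts get values close to 1; a two-sided p-value is 2*pnorm(abs(z), lower.tail = FALSE).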
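Both checks can be reproduced directly from the numbers printed in the model summary, without refitting anything. The sketch below retypes the coefficient and standard-error matrices from that summary (intercepts left out for brevity) and applies the 2-standard-error rule alongside a two-sided Wald p-value; the object names are illustrative.

```r
terms <- c("aircraftboeing", "duration", "no_pasg",
           "speed_ground", "height", "pitch")

# Coefficients and standard errors copied from the summary output of the full model
coefs <- rbind("2" = c(3.739129, -0.003020479, -0.02072378, 0.2177972, 0.1397391, -0.3375208),
               "3" = c(8.718948,  0.002345470, -0.01323350, 1.1968791, 0.3741067,  0.9454726))
ses   <- rbind("2" = c(0.4003410, 0.002665208, 0.01719359, 0.01780560, 0.01706753, 0.2664874),
               "3" = c(0.8685735, 0.008056256, 0.05805919, 0.04043594, 0.04868468, 0.7621497))
colnames(coefs) <- colnames(ses) <- terms

# Rule 1: the estimate is more than 2 standard errors away from zero
beyond_2se <- abs(coefs) > 2 * ses

# Rule 2: two-sided Wald p-value below 0.05
z <- coefs / ses
p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)
significant <- p_two_sided < 0.05

print(beyond_2se)
print(round(p_two_sided, 4))
```

Because both rules are functions of the same z statistic, they agree here for every term in both comparisons.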
Both ways show that aircraft, speed ground and height are significant. Let's confirm this using stepwise AIC (backward).
nmodAIC<-step(nmod)
## Start: AIC=458.07
## Y ~ aircraft + duration + no_pasg + speed_ground + height + pitch
##
## trying - aircraft
## # weights: 21 (12 variable)
## initial value 860.213422
## iter 10 value 553.236136
## iter 20 value 307.585472
## iter 30 value 302.304626
## iter 40 value 302.277717
## final value 302.272056
## converged
## trying - duration
## # weights: 21 (12 variable)
## initial value 860.213422
## iter 10 value 442.447072
## iter 20 value 232.139392
## iter 30 value 217.315099
## iter 40 value 216.591666
## iter 50 value 215.924741
## iter 50 value 215.924741
## iter 50 value 215.924741
## final value 215.924741
## converged
## trying - no_pasg
## # weights: 21 (12 variable)
## initial value 860.213422
## iter 10 value 571.600465
## iter 20 value 233.289377
## iter 30 value 220.332912
## iter 40 value 218.044063
## final value 215.778181
## converged
## trying - speed_ground
## # weights: 21 (12 variable)
## initial value 860.213422
## iter 10 value 719.414827
## final value 707.891766
## converged
## trying - height
## # weights: 21 (12 variable)
## initial value 860.213422
## iter 10 value 501.973957
## iter 20 value 279.931051
## iter 30 value 272.823455
## iter 40 value 272.747843
## final value 272.723559
## converged
## trying - pitch
## # weights: 21 (12 variable)
## initial value 860.213422
## iter 10 value 537.131697
## iter 20 value 232.570572
## iter 30 value 218.874317
## iter 40 value 218.164157
## final value 217.133536
## converged
## Df AIC
## - no_pasg 12 455.5564
## - duration 12 455.8495
## <none> 14 458.0675
## - pitch 12 458.2671
## - height 12 569.4471
## - aircraft 12 628.5441
## - speed_ground 12 1439.7835
## # weights: 21 (12 variable)
## initial value 860.213422
## iter 10 value 571.600465
## iter 20 value 233.289377
## iter 30 value 220.332912
## iter 40 value 218.044063
## final value 215.778181
## converged
##
## Step: AIC=455.56
## Y ~ aircraft + duration + speed_ground + height + pitch
##
## trying - aircraft
## # weights: 18 (10 variable)
## initial value 860.213422
## iter 10 value 461.131666
## iter 20 value 304.481071
## iter 30 value 303.053308
## iter 40 value 302.957518
## final value 302.957485
## converged
## trying - duration
## # weights: 18 (10 variable)
## initial value 860.213422
## iter 10 value 454.092567
## iter 20 value 231.918994
## iter 30 value 221.564842
## iter 40 value 216.658288
## final value 216.655416
## converged
## trying - speed_ground
## # weights: 18 (10 variable)
## initial value 860.213422
## iter 10 value 711.480155
## final value 708.091684
## converged
## trying - height
## # weights: 18 (10 variable)
## initial value 860.213422
## iter 10 value 454.702852
## iter 20 value 276.512160
## iter 30 value 273.463355
## iter 40 value 273.372814
## iter 40 value 273.372812
## iter 40 value 273.372812
## final value 273.372812
## converged
## trying - pitch
## # weights: 18 (10 variable)
## initial value 860.213422
## iter 10 value 439.964575
## iter 20 value 229.865728
## iter 30 value 219.743502
## iter 40 value 217.891884
## final value 217.863332
## converged
## Df AIC
## - duration 10 453.3108
## <none> 12 455.5564
## - pitch 10 455.7267
## - height 10 566.7456
## - aircraft 10 625.9150
## - speed_ground 10 1436.1834
## # weights: 18 (10 variable)
## initial value 860.213422
## iter 10 value 454.092567
## iter 20 value 231.918994
## iter 30 value 221.564842
## iter 40 value 216.658288
## final value 216.655416
## converged
##
## Step: AIC=453.31
## Y ~ aircraft + speed_ground + height + pitch
##
## trying - aircraft
## # weights: 15 (8 variable)
## initial value 860.213422
## iter 10 value 357.926020
## iter 20 value 304.349610
## iter 30 value 304.090286
## final value 304.071791
## converged
## trying - speed_ground
## # weights: 15 (8 variable)
## initial value 860.213422
## iter 10 value 711.281659
## final value 710.666324
## converged
## trying - height
## # weights: 15 (8 variable)
## initial value 860.213422
## iter 10 value 330.985544
## iter 20 value 277.196122
## iter 30 value 275.212444
## final value 275.046753
## converged
## trying - pitch
## # weights: 15 (8 variable)
## initial value 860.213422
## iter 10 value 352.446469
## iter 20 value 233.419805
## iter 30 value 222.087361
## final value 218.834648
## converged
## Df AIC
## <none> 10 453.3108
## - pitch 8 453.6693
## - height 8 566.0935
## - aircraft 8 624.1436
## - speed_ground 8 1437.3326
AIC(nmodAIC)
## [1] 453.3108
AIC(nmod)
## [1] 458.0675
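The two AIC values can be checked by hand from the fitting logs. This sketch assumes, as the logs and summary in this post suggest, that the "final value" printed by multinom is the negative log-likelihood, that the residual deviance is twice that value, and that AIC is the deviance plus twice the number of estimated parameters (14 for the full model, 10 for the stepwise model).

```r
# "final value" lines from the fitting logs (negative log-likelihoods)
nll_full <- 215.033762   # Y ~ all six predictors, 14 parameters
nll_step <- 216.655416   # Y ~ aircraft + speed_ground + height + pitch, 10 parameters

# AIC = deviance + 2 * (number of parameters), with deviance = 2 * nll
aic_full <- 2 * nll_full + 2 * 14
aic_step <- 2 * nll_step + 2 * 10

print(c(full = aic_full, step = aic_step))
```

The stepwise model gives up a little likelihood but saves four parameters, which is why its AIC is lower.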
Observation: the stepwise procedure retains one additional variable, pitch. The AIC falls from 458.0675 to 453.3108. Let's compare the two models using a chi-square test.
deviance(nmodAIC)-deviance(nmod)
## [1] 3.243309
nmod$edf-nmodAIC$edf
## [1] 4
pchisq(deviance(nmodAIC)-deviance(nmod),nmod$edf-nmodAIC$edf,lower.tail=FALSE)
## [1] 0.5179643
Interpretation: the p-value from the chi-square test is greater than 0.05, so we cannot reject the null hypothesis that there is no difference between the two models.
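The test above is a likelihood ratio test, and its pieces can be recomputed from the fitting logs alone. In this sketch the test statistic is the difference in deviance between the nested models and the degrees of freedom are the difference in parameter counts; the numbers are copied from the outputs in this post.

```r
# Negative log-likelihoods ("final value") of the two nested models
nll_full <- 215.033762   # 14 parameters
nll_step <- 216.655416   # 10 parameters

lr_stat <- 2 * (nll_step - nll_full)   # difference in deviance
df      <- 14 - 10                     # difference in parameter count

# Upper-tail chi-square probability of a statistic at least this large
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = df, p = p_value))
```

A large p-value here means that dropping duration and no_pasg does not significantly worsen the fit.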
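The AIC-selected model adds pitch to the set of three significant predictors, but the gain is marginal (AIC 453.31 with pitch versus 453.67 without it), so we prefer the set with one predictor fewer in order to keep the model simple. The coefficient table below is still computed from the full model nmod.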
int1<-summary(nmod)
a<-int1$coefficients
s<-int1$standard.errors
pv<-ifelse(pnorm(int1$coefficients/int1$standard.errors,lower.tail = FALSE)<0.05,"Sign","notSign")
# Index by row: row 1 of each matrix is category 2 vs 1, row 2 is category 3 vs 1.
# (a[1:7] would walk down the columns of the 2 x 7 matrix and mix the two rows.)
d<-data.frame(round(a[1,],3),round(a[2,],3),round(exp(a[1,]),3),round(exp(a[2,]),3),round(s[1,],3),round(s[2,],3),pv[1,],pv[2,])
colnames(d)<-c("log(p2/p1)","log(p3/p1)","expB:P2/P1","expB:P3/P1","se:p2/p1","se:p3/p1","pvalue:p2/p1","pvalue:p3/p1")
d
## log(p2/p1) log(p3/p1) expB:P2/P1 expB:P3/P1 se:p2/p1 se:p3/p1
## (Intercept) -17.772 -132.599 0.000 0.000 2.129 0.036
## aircraftboeing 3.739 8.719 42.061 6117.742 0.400 0.869
## duration -0.003 0.002 0.997 1.002 0.003 0.008
## no_pasg -0.021 -0.013 0.979 0.987 0.017 0.058
## speed_ground 0.218 1.197 1.243 3.310 0.018 0.040
## height 0.140 0.374 1.150 1.454 0.017 0.049
## pitch -0.338 0.945 0.714 2.574 0.266 0.762
## pvalue:p2/p1 pvalue:p3/p1
## (Intercept) notSign notSign
## aircraftboeing Sign Sign
## duration notSign notSign
## no_pasg notSign notSign
## speed_ground Sign Sign
## height Sign Sign
## pitch notSign notSign
Model Assessment
pred<-data.frame(predict(nmod, type="probs"))
pred$prediction<-ifelse(pred$X1>pred$X2,ifelse(pred$X1>pred$X3,'1','3'),ifelse(pred$X2>pred$X3,'2','3'))
pred$real<-FAAM_CL$Y
table(pred$real,pred$prediction,dnn=c('real','prediction'))
## prediction
## real 1 2 3
## 1 207 38 0
## 2 35 398 5
## 3 0 6 94
mcr<-mean(pred$real!=pred$prediction)
mcr
## [1] 0.1072797
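The misclassification rate can also be read straight off the confusion table, together with a per-category hit rate. The sketch below retypes the table printed above; the object names are illustrative.

```r
# Confusion table copied from the output above (rows = real, columns = prediction)
cm <- matrix(c(207,  38,  0,
                35, 398,  5,
                 0,   6, 94),
             nrow = 3, byrow = TRUE,
             dimnames = list(real = c("1", "2", "3"),
                             prediction = c("1", "2", "3")))

# Share of observations off the diagonal
mcr <- 1 - sum(diag(cm)) / sum(cm)

# Share of each real category that was predicted correctly
hit_rate <- diag(cm) / rowSums(cm)

print(mcr)
print(round(hit_rate, 3))
```

No category 1 landing is predicted as category 3 or the other way round; all errors are between neighbouring categories.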
Interpretation
In the coefficient table above, the ‘pvalue:px/p1’ columns denote whether a predictor is significant or not. “Sign” indicates that the predictor is significant and “notSign” that it is not.
Upon fitting the full model, we see that only 3 predictors out of 6 are significant: aircraft (Boeing), height and speed ground.
The intercepts are also flagged as not significant. Note, however, that the upper-tail p-value used here returns 1 for any large negative estimate, so this flag is not informative for the intercepts.
Let's label the different landings as normal, long and risky, corresponding to Y = 1, 2, 3.
AIRCRAFT: from the model summary, the log odds log(p2/p1) increase by about 3.74 and the log odds log(p3/p1) by about 8.72 when aircraft = boeing. Aircraft therefore matters in both comparisons, and more strongly for category 3 versus category 1. Simply stated, a Boeing aircraft increases the odds of a long landing and, even more, of a risky landing.
Speed ground: each unit increase in speed ground increases the odds of the landing being long as well as risky (the odds are multiplied by about exp(0.218) ≈ 1.24 and exp(1.197) ≈ 3.31 respectively).
Height: each unit increase in height increases the odds of the landing being long as well as risky (the odds are multiplied by about exp(0.140) ≈ 1.15 and exp(0.374) ≈ 1.45 respectively).
Put simply, for a Boeing aircraft, each unit increase in speed ground and height increases the probability that the landing distance exceeds 2500 units.
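To see how the log odds translate into category probabilities, the coefficients from the model summary can be applied to a single flight by hand. The flight below is entirely hypothetical (its values are chosen only for illustration); the calculation uses the baseline-category logit form, where category 1 has a linear predictor of 0.

```r
# Coefficients copied from the summary of the full model
# order: intercept, aircraftboeing, duration, no_pasg, speed_ground, height, pitch
beta2 <- c(-17.77161, 3.739129, -0.003020479, -0.02072378, 0.2177972, 0.1397391, -0.3375208)
beta3 <- c(-132.59919, 8.718948, 0.002345470, -0.01323350, 1.1968791, 0.3741067, 0.9454726)

# Hypothetical flight: Boeing, 150 duration, 60 passengers,
# ground speed 100, height 30, pitch 4 (leading 1 is for the intercept)
x <- c(1, 1, 150, 60, 100, 30, 4)

# Linear predictors relative to the baseline category 1
eta <- c("1" = 0, "2" = sum(beta2 * x), "3" = sum(beta3 * x))

# Softmax turns the linear predictors into probabilities that sum to 1
probs <- exp(eta) / sum(exp(eta))
print(round(probs, 4))
```

With the fitted object available, predict(nmod, newdata = ..., type = "probs") should give the same kind of result without retyping coefficients.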
In the case of ordinal data, we may need a different modelling approach. One function that can be used is vglm from the VGAM package.
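For reference, a proportional-odds fit with vglm might look like the sketch below. It runs on simulated data (the predictor x, the cut points and the sample size are all made up), treats the response as an ordered factor, and only fits the model if the VGAM package is installed.

```r
set.seed(1)

# Simulated data: one predictor and a latent score cut into 3 ordered levels
toy <- data.frame(x = rnorm(200))
latent <- 1.5 * toy$x + rlogis(200)
toy$Y <- cut(latent, breaks = c(-Inf, -1, 1, Inf),
             labels = c("1", "2", "3"), ordered_result = TRUE)

if (requireNamespace("VGAM", quietly = TRUE)) {
  library(VGAM)
  # cumulative(parallel = TRUE) requests a single slope shared by all
  # cumulative logits, i.e. the proportional odds model
  fit <- vglm(Y ~ x, family = cumulative(parallel = TRUE), data = toy)
  print(coef(fit))
}
```

Unlike multinom, this model uses the ordering of the categories, so it estimates one slope per predictor instead of one per non-baseline category.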
Thank-you !