Motivation: To reduce the risk of landing overrun.
Goal: To study what factors and how they would impact the landing distance of a commercial flight.
library(tidyverse) #to visualize, transform, input, tidy and join data
library(dplyr) #data wrangling
library(stringr) #string related functions
library(kableExtra) #to create HTML Table
library(DT) #to preview the data sets
library(lubridate) #to apply the date functions
library(xlsx) #to load excel files
library(ROCR) #to use ROC curves
library(faraway) #to use the ilogit function
library(nnet) #to implement multinomial functionData has following columns -
text_tbl <- data.frame (
Variable = c("Aircraft", "Duration", "No_pasg", "speed_ground","speed_air","height", "pitch","distance"),
Description = c(
"make of an aircraft ",
"Duration of flight",
"no. of passengers ",
"ground speed ",
"air speed ",
"height ",
"pitch angle ",
"flight duration between take-off and landing "
)
)
kable(text_tbl) %>%
kable_styling(full_width = F) %>%
column_spec(1, bold = T, border_right = T) %>%
column_spec(2, width = "30em")| Variable | Description |
|---|---|
| Aircraft | make of an aircraft |
| Duration | Duration of flight |
| No_pasg | no. of passengers |
| speed_ground | ground speed |
| speed_air | air speed |
| height | height |
| pitch | pitch angle |
| distance | flight duration between take-off and landing |
I load the two datasets-
faa1 <- read.xlsx("FAA1.xls", sheetName = "FAA1")
faa2 <- read.xlsx("FAA2_2.xls", sheetName = "Sheet1")Data has 800 observations and 8 in FAA1. Data has 150 observations and 7 in FAA2.
FAA2 doesn’t contain information about the duration of flights
Merge the two data sets. Are there any duplications?
faa <- bind_rows(faa1, faa2)
str(faa)## 'data.frame': 950 obs. of 8 variables:
## $ aircraft : Factor w/ 2 levels "airbus","boeing": 2 2 2 2 2 2 2 2 2 2 ...
## $ duration : num 98.5 125.7 112 196.8 90.1 ...
## $ no_pasg : num 53 69 61 56 70 55 54 57 61 56 ...
## $ speed_ground: num 107.9 101.7 71.1 85.8 59.9 ...
## $ speed_air : num 109 103 NA NA NA ...
## $ height : num 27.4 27.8 18.6 30.7 32.4 ...
## $ pitch : num 4.04 4.12 4.43 3.88 4.03 ...
## $ distance : num 3370 2988 1145 1664 1050 ...
#add the duplicate removal code
faa %>%
select(-duration) %>%
duplicated() %>%
sum() ## [1] 100
There are 100 duplicated in total, which I have removed.
check <- faa %>%
select(-duration) %>%
duplicated() %>%
which()
faa <- faa[-check,]Key findings-
I observed that few of the variables have incorrect data, which may be because of the issue with data capture or wrong data entry. For example-
height has negative value as the minimum value
the minimum distance for an observation is 34 which is too small.
air_speed is not captured for 75% of the data.
The minium duration of flight is 15 minutes, which doesn’t seem right
Data had duplicate records(100) after merging the two data-sets
Removing abnormal values in the data set.
faa_check <- faa %>%
filter((duration > 40| is.na(duration)) & (speed_ground >= 30) & (speed_ground <= 140) &
(height >= 6) & (distance < 6000))
dim(faa_check)## [1] 831 8
faa <- faa_checkA total of 19 observations seem abnormal which we remove.
Data has 831 observations and 8
We observe that Duration is null for 50 observations, which we need to look at. We will replace the value with mean of the overall column
faa$duration_corrected <- NA
faa <- transform(faa, duration_corrected = ifelse(is.na(faa$duration), mean(faa$duration, na.rm=TRUE), faa$duration))Plotting histogram for all the variables.
#hist(faa$duration_impute, main = "Histogram of Duration", xlab = "Duration")
hist(faa$speed_ground, main = "Histogram of Ground Speed", xlab = "Ground Speed")hist(faa$height, main = "Histogram of Height", xlab = "Height")hist(faa$pitch, main = "Histogram of Pitch", xlab = "Pitch")hist(faa$no_pasg, main = "Histogram of No. of Passengers", xlab = "No. of Passengers")hist(faa$speed_air, main = "Histogram of Air Speed", xlab = "Air Speed")hist(faa$distance, main = "Histogram of Landing Distance", xlab = "Landing Distance")hist(faa$duration_corrected, main = "Histogram of Duration of flight", xlab = "Flight Duration in mins")Key findings:
After cleaning the data, I observed that - 1. There were total 19 abnormal values in the data
Duration has 50 NA values, which we corrected based on the mean of the overall sample
Speed of the air is right-skewed whereas all the other variables seem to be noramlly distributed
Min speed of air is 90 MPH
response variable “landing distance”
Pairwise Correlation
cor_duration <- cor(faa$distance, faa$duration_corrected)
cor_speed_ground <- cor(faa$distance,faa$speed_ground)
cor_height <- cor(faa$distance,faa$height)
cor_pitch <- cor(faa$distance,faa$pitch)
cor_no_pasg <- cor(faa$distance,faa$no_pasg)
cor_speed_air <- cor.test(faa$distance,faa$speed_air,method="pearson")$estimate
cor_aircraft <- cor(faa$distance,as.numeric(faa$aircraft ))
variable_names <- c("Duration","Ground Speed","Height","Pitch",
"No. of Passengers","Air Speed","Aircraft")
correlation <- c(cor_duration,cor_speed_ground,cor_height,cor_pitch,cor_no_pasg,cor_speed_air,cor_aircraft)
table_1 <- data.frame(variable_names,correlation)
table_1$direction <- ifelse(table_1$correlation > 0, "Positive","Negative")
table_1 <- table_1 %>% arrange(desc(correlation))Show X-Y scatter plots
faa <- faa[-2]
GGally::ggpairs(
data = faa
)We see that both air_speed and ground_speed are closely related to the landing distance. Since they are highly correlated, when we regress both the variables together, we see that ground_speed becomes insignificant as the variability due to speed_ground is explained by speed_Air and hence it is not contributing in explaining variation any more.
cor.test(faa$speed_ground, faa$speed_air, method = "pearson")$estimate## cor
## 0.9879383
We observe 98% correlation. I would like to keep speed_air in my method because since R^2 and adj. R^2 is more when speed_air is considered.(comparison of the two models- 1 and 2.) Thus, speed of air is a significant contributor according to me.
model0 <- lm(distance ~ 1,data=faa)
model1 <- lm(distance ~ speed_air,data=faa)
model2 <- lm(distance ~ speed_air + aircraft,data=faa)
model3 <- lm(distance ~ speed_air + aircraft + height ,data=faa)
model4 <- lm(distance ~ speed_air + aircraft + height + no_pasg ,data=faa)
model5 <- lm(distance ~ speed_air + aircraft + height + no_pasg + duration_corrected ,data=faa)
model6 <- lm(distance ~ speed_air + aircraft + height + no_pasg + duration_corrected + pitch,data=faa)
model0.rsqr <- summary(model0)$r.squared
model1.rsqr <- summary(model1)$r.squared
model2.rsqr <- summary(model2)$r.squared
model3.rsqr <- summary(model3)$r.squared
model4.rsqr <- summary(model4)$r.squared
model5.rsqr <- summary(model5)$r.squared
model6.rsqr <- summary(model6)$r.squared
rsquare <- cbind(c(model0.rsqr, model1.rsqr, model2.rsqr, model3.rsqr, model4.rsqr, model5.rsqr, model6.rsqr), 0:6)
colnames(rsquare) <- c("rsquare","variables")
rsquare <- as.data.frame(rsquare)
rsquare %>%
ggplot(aes(x = variables, y = rsquare)) +
geom_line() +
xlab("no. of variables") +
ylab("R-square") +
theme_classic()With increase in variables, R^2 also increases.
model0.rsqr <- summary(model0)$adj.r.squared
model1.rsqr <- summary(model1)$adj.r.squared
model2.rsqr <- summary(model2)$adj.r.squared
model3.rsqr <- summary(model3)$adj.r.squared
model4.rsqr <- summary(model4)$adj.r.squared
model5.rsqr <- summary(model5)$adj.r.squared
model6.rsqr <- summary(model6)$adj.r.squared
rsquare <- cbind(c(model0.rsqr, model1.rsqr, model2.rsqr, model3.rsqr, model4.rsqr, model5.rsqr, model6.rsqr), 0:6)
colnames(rsquare) <- c("adj_rsquare","variables")
rsquare <- as.data.frame(rsquare)
rsquare %>%
ggplot(aes(x = variables, y = adj_rsquare)) +
geom_line() +
xlab("no. of variables") +
ylab("Adjusted R-square") +
theme_classic()We see that Adjusted R^2 increases initially but then it slowly starts declining after 3 variables have been added.
model0_AIC <- AIC(model0)
model1_AIC <- AIC(model1)
model2_AIC <- AIC(model2)
model3_AIC <- AIC(model3)
model4_AIC <- AIC(model4)
model5_AIC <- AIC(model5)
model6_AIC <- AIC(model6)
AIC <- cbind(c(model0_AIC, model1_AIC, model2_AIC, model3_AIC, model4_AIC, model5_AIC, model6_AIC),0:6)
colnames(AIC) <- c("AIC","variables")
AIC <- as.data.frame(AIC)
AIC %>%
ggplot(aes(x = variables, y = AIC)) +
geom_line() +
xlab("no. of variables")+
ylab("AIC") +
theme_classic()since smaller the AIC, better is the model. Hence, we see that it decreases. However, after addition of 3 variables, the decrease in AIC isn’t much and infact it starts going up.
According to me, the significant model is -
model4 <- lm(distance ~ speed_air + aircraft + height ,data=faa)
summary(model4)##
## Call:
## lm(formula = distance ~ speed_air + aircraft + height, data = faa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -300.74 -94.78 15.47 97.09 330.41
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6390.376 109.839 -58.18 <2e-16 ***
## speed_air 82.148 0.976 84.17 <2e-16 ***
## aircraftboeing 427.442 19.173 22.29 <2e-16 ***
## height 13.702 1.007 13.60 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 134.2 on 199 degrees of freedom
## (628 observations deleted due to missingness)
## Multiple R-squared: 0.9737, Adjusted R-squared: 0.9733
## F-statistic: 2458 on 3 and 199 DF, p-value: < 2.2e-16
The R^2 and Adjusted R^2 are also close and around 97% which is quite right.
faaNoNA <- na.exclude(faa)
#nrow(faaNoNA)
model01 <- lm(distance ~ 1,data=faaNoNA)
model11 <- lm(distance ~ speed_air,data=faaNoNA)
model21 <- lm(distance ~ speed_air + aircraft,data=faaNoNA)
model31 <- lm(distance ~ speed_air + aircraft + height ,data=faaNoNA)
model41 <- lm(distance ~ speed_air + aircraft + height + no_pasg ,data=faaNoNA)
model51 <- lm(distance ~ speed_air + aircraft + height + no_pasg + duration_corrected ,data=faaNoNA)
model61 <- lm(distance ~ speed_air + aircraft + height + no_pasg + duration_corrected + pitch,data=faaNoNA)
MASS::stepAIC(model01,direction="forward",scope=list(upper=model61,lower=model01))## Start: AIC=2725.93
## distance ~ 1
##
## Df Sum of Sq RSS AIC
## + speed_air 1 121121738 15346230 2284.3
## + aircraft 1 4417704 132050265 2721.2
## <none> 136467968 2725.9
## + height 1 633675 135834293 2727.0
## + duration_corrected 1 353842 136114126 2727.4
## + pitch 1 244706 136223262 2727.6
## + no_pasg 1 217724 136250245 2727.6
##
## Step: AIC=2284.33
## distance ~ speed_air
##
## Df Sum of Sq RSS AIC
## + aircraft 1 8424832 6921399 2124.7
## + height 1 2803377 12542854 2245.4
## + pitch 1 860427 14485804 2274.6
## + no_pasg 1 159095 15187136 2284.2
## <none> 15346230 2284.3
## + duration_corrected 1 11938 15334292 2286.2
##
## Step: AIC=2124.7
## distance ~ speed_air + aircraft
##
## Df Sum of Sq RSS AIC
## + height 1 3335147 3586252 1993.2
## <none> 6921399 2124.7
## + duration_corrected 1 61095 6860304 2124.9
## + no_pasg 1 56195 6865204 2125.0
## + pitch 1 2584 6918815 2126.6
##
## Step: AIC=1993.22
## distance ~ speed_air + aircraft + height
##
## Df Sum of Sq RSS AIC
## + no_pasg 1 53331 3532921 1992.2
## <none> 3586252 1993.2
## + duration_corrected 1 12912 3573340 1994.5
## + pitch 1 174 3586078 1995.2
##
## Step: AIC=1992.18
## distance ~ speed_air + aircraft + height + no_pasg
##
## Df Sum of Sq RSS AIC
## <none> 3532921 1992.2
## + duration_corrected 1 9532.8 3523389 1993.6
## + pitch 1 333.8 3532588 1994.2
##
## Call:
## lm(formula = distance ~ speed_air + aircraft + height + no_pasg,
## data = faaNoNA)
##
## Coefficients:
## (Intercept) speed_air aircraftboeing height
## -6248.753 82.131 425.592 13.696
## no_pasg
## -2.316
I observe that number of passengers is also considered a signicant contributor by automatic selection of variables.
so the final model as obtained here is - model4 i.e.
Distance = - 6263.754 + 82.032( speed_air) + 432.074 (aircraft) + 13.776(height) - 2.041(no_pasg)
From this, we can also conclude that there is no single criteria for selecting a model. Also there is no “best model”. Since, we have chosen AIC as our selection criteria, we see no_pasg also included in our model. However, if I try to include this variable using P_value method, no_pasg will turn out to be non-significant.
Segregating long but non-risky landing distance from risky distance.
faa <- faa %>%
mutate(long.landing = as.factor(ifelse(distance > 2500, 1,0 )) ,
risky.landing = as.factor(ifelse(distance > 3000,1,0 )),
aircraft = as.factor(aircraft))
faa$duration <- NULL
faa$distance <- NULLHistogram to show distribution of “long.landing”
faa %>%
ggplot(aes(long.landing)) +
geom_bar()only 12% aircrafts have long landing
round(prop.table(table(faa$long.landing)),2)##
## 0 1
## 0.88 0.12
mdl_duration <- glm (faa$long.landing ~ faa$duration_corrected, family = "binomial")
mdl_speedgrnd <- glm (faa$long.landing ~ faa$speed_ground, family = "binomial")
mdl_height <- glm (faa$long.landing ~ faa$height, family = "binomial")
mdl_pitch <- glm (faa$long.landing ~ faa$pitch, family = "binomial")
mdl_nopasg <- glm (faa$long.landing ~ faa$no_pasg, family = "binomial")
mdl_speedair <- glm (faa$long.landing ~ faa$speed_air, family = "binomial")
mdl_aircraft <- glm (faa$long.landing ~ faa$aircraft, family = "binomial")
duration <- summary(mdl_duration)$coef[2,c(1,4)]
speed_ground <- summary(mdl_speedgrnd)$coef[2,c(1,4)]
height <- summary(mdl_height)$coef[2,c(1,4)]
pitch <- summary(mdl_pitch)$coef[2,c(1,4)]
no_pasg <- summary(mdl_nopasg)$coef[2,c(1,4)]
speed_air <- summary(mdl_speedair)$coef[2,c(1,4)]
aircraft_boeing <- summary(mdl_aircraft)$coef[2,c(1,4)]
aircraft_airbus <- summary(mdl_aircraft)$coef[1,c(1,4)]
coefficients <- c(duration[1], speed_ground[1], height[1], pitch[1], no_pasg[1],speed_air[1],aircraft_boeing[1],aircraft_airbus[1])
coefficients <- round(coefficients, digits = 3)
odds_ratio <- round(exp(coefficients), 3)
p_value <- c(duration[2], speed_ground[2], height[2], pitch[2], no_pasg[2],speed_air[2],aircraft_boeing[2],aircraft_airbus[2])
p_value <- round(p_value, digits = 3)
variable_names <- c("Duration","Ground Speed","Height","Pitch","No. of Passengers","Air Speed","Aircraft-Boeing", "Aircraft-Airbus")
table_2 <- data.frame(variable_names, coefficients,odds_ratio, p_value)
table_2$slope_direction <- ifelse(coefficients > 0 , "Positive", "Negative")
table_2 <- table_2 %>%
select(variable_names, coefficients, odds_ratio, p_value, slope_direction) %>%
arrange(p_value)
table_2## variable_names coefficients odds_ratio p_value slope_direction
## 1 Ground Speed 0.472 1.603 0.000 Positive
## 2 Air Speed 0.512 1.669 0.000 Positive
## 3 Aircraft-Boeing 0.864 2.373 0.000 Positive
## 4 Aircraft-Airbus -2.428 0.088 0.000 Negative
## 5 Pitch 0.401 1.493 0.047 Positive
## 6 Height 0.009 1.009 0.422 Positive
## 7 No. of Passengers -0.007 0.993 0.606 Negative
## 8 Duration -0.001 0.999 0.626 Negative
we see speed_ground, speed_air, aircraft type, pitch and height appear to be positively correlated to long_landing.
Let’s visualize it -
check_plot <- function(x) {
ggplot(aes(x = x, fill = long.landing), data = faa) +
geom_histogram(position = 'dodge', aes(y = ..density..))
}The probability of long landing increases with the increase in speed_ground
check_plot(faa$speed_ground)Probability of long landing increases with increase in speed of air
check_plot(faa$speed_air)Long landing isn’t affected by pitch of aircraft
check_plot(faa$pitch)Long landing seem to be unaffected by height of aircraft
check_plot(faa$height)I observed that speed of ground, aircraft and height are significant. Pitch is not significant like we observed in the previous table
full_model <- glm(long.landing ~ aircraft +
no_pasg + speed_ground + height +
pitch + duration_corrected, family = "binomial",
data = faa)
summary(full_model)##
## Call:
## glm(formula = long.landing ~ aircraft + no_pasg + speed_ground +
## height + pitch + duration_corrected, family = "binomial",
## data = faa)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.16087 -0.00052 0.00000 0.00000 2.32238
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.215e+02 2.500e+01 -4.858 1.19e-06 ***
## aircraftboeing 5.192e+00 1.200e+00 4.328 1.51e-05 ***
## no_pasg 3.423e-03 5.461e-02 0.063 0.950023
## speed_ground 1.033e+00 2.082e-01 4.960 7.03e-07 ***
## height 2.531e-01 7.253e-02 3.490 0.000483 ***
## pitch 1.484e+00 8.454e-01 1.755 0.079247 .
## duration_corrected 5.287e-03 7.864e-03 0.672 0.501355
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 622.778 on 830 degrees of freedom
## Residual deviance: 52.746 on 824 degrees of freedom
## AIC: 66.746
##
## Number of Fisher Scoring iterations: 12
Based on AIC model, results seem to be consistent with the table observed.
faa_clean <- select(faa, -risky.landing , -speed_air)
model01 <- glm(long.landing ~ 1,data=faa_clean, family = "binomial")
model61 <- glm(long.landing ~ speed_ground + aircraft + height + no_pasg + duration_corrected + pitch,data=faa_clean, family = "binomial")
model_1 <- MASS::stepAIC(model01,direction="forward",scope=list(upper=model61,lower=model01))## Start: AIC=624.78
## long.landing ~ 1
##
## Df Deviance AIC
## + speed_ground 1 115.47 119.47
## + aircraft 1 606.55 610.55
## + pitch 1 618.79 622.79
## <none> 622.78 624.78
## + height 1 622.13 626.13
## + no_pasg 1 622.51 626.51
## + duration_corrected 1 622.54 626.54
##
## Step: AIC=119.47
## long.landing ~ speed_ground
##
## Df Deviance AIC
## + aircraft 1 84.665 90.665
## + height 1 100.459 106.459
## + pitch 1 105.527 111.527
## <none> 115.470 119.470
## + duration_corrected 1 115.378 121.378
## + no_pasg 1 115.468 121.468
##
## Step: AIC=90.66
## long.landing ~ speed_ground + aircraft
##
## Df Deviance AIC
## + height 1 57.047 65.047
## + pitch 1 81.309 89.309
## <none> 84.665 90.665
## + duration_corrected 1 83.164 91.164
## + no_pasg 1 84.219 92.219
##
## Step: AIC=65.05
## long.landing ~ speed_ground + aircraft + height
##
## Df Deviance AIC
## + pitch 1 53.204 63.204
## <none> 57.047 65.047
## + duration_corrected 1 56.288 66.288
## + no_pasg 1 57.031 67.031
##
## Step: AIC=63.2
## long.landing ~ speed_ground + aircraft + height + pitch
##
## Df Deviance AIC
## <none> 53.204 63.204
## + duration_corrected 1 52.750 64.750
## + no_pasg 1 53.204 65.204
summary(model_1)##
## Call:
## glm(formula = long.landing ~ speed_ground + aircraft + height +
## pitch, family = "binomial", data = faa_clean)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.20284 -0.00054 0.00000 0.00000 2.35719
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -119.77598 24.41821 -4.905 9.33e-07 ***
## speed_ground 1.02266 0.20290 5.040 4.65e-07 ***
## aircraftboeing 5.13443 1.18091 4.348 1.37e-05 ***
## height 0.25795 0.06861 3.760 0.00017 ***
## pitch 1.53751 0.84109 1.828 0.06755 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 622.778 on 830 degrees of freedom
## Residual deviance: 53.204 on 826 degrees of freedom
## AIC: 63.204
##
## Number of Fisher Scoring iterations: 12
We further use BIC measure to find “best subsets” model. Pitch is not chosen by this model. This may be because BIC penalizes strongly
model_2 <- MASS::stepAIC(model01,direction="forward",scope=list(upper=model61,lower=model01), k = log(nrow(faa_clean)))## Start: AIC=629.5
## long.landing ~ 1
##
## Df Deviance AIC
## + speed_ground 1 115.47 128.92
## + aircraft 1 606.55 620.00
## <none> 622.78 629.50
## + pitch 1 618.79 632.24
## + height 1 622.13 635.58
## + no_pasg 1 622.51 635.96
## + duration_corrected 1 622.54 635.98
##
## Step: AIC=128.92
## long.landing ~ speed_ground
##
## Df Deviance AIC
## + aircraft 1 84.665 104.83
## + height 1 100.459 120.63
## + pitch 1 105.527 125.69
## <none> 115.470 128.92
## + duration_corrected 1 115.378 135.54
## + no_pasg 1 115.468 135.64
##
## Step: AIC=104.83
## long.landing ~ speed_ground + aircraft
##
## Df Deviance AIC
## + height 1 57.047 83.937
## <none> 84.665 104.832
## + pitch 1 81.309 108.200
## + duration_corrected 1 83.164 110.054
## + no_pasg 1 84.219 111.110
##
## Step: AIC=83.94
## long.landing ~ speed_ground + aircraft + height
##
## Df Deviance AIC
## <none> 57.047 83.937
## + pitch 1 53.204 86.817
## + duration_corrected 1 56.288 89.901
## + no_pasg 1 57.031 90.644
summary(model_2)##
## Call:
## glm(formula = long.landing ~ speed_ground + aircraft + height,
## family = "binomial", data = faa_clean)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.43442 -0.00117 0.00000 0.00000 2.57435
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -102.95437 19.22882 -5.354 8.59e-08 ***
## speed_ground 0.92657 0.17242 5.374 7.70e-08 ***
## aircraftboeing 5.04813 1.11520 4.527 5.99e-06 ***
## height 0.23106 0.05959 3.877 0.000106 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 622.778 on 830 degrees of freedom
## Residual deviance: 57.047 on 827 degrees of freedom
## AIC: 65.047
##
## Number of Fisher Scoring iterations: 11
The variables and their contribution in prediction of long landing is -
table_2## variable_names coefficients odds_ratio p_value slope_direction
## 1 Ground Speed 0.472 1.603 0.000 Positive
## 2 Air Speed 0.512 1.669 0.000 Positive
## 3 Aircraft-Boeing 0.864 2.373 0.000 Positive
## 4 Aircraft-Airbus -2.428 0.088 0.000 Negative
## 5 Pitch 0.401 1.493 0.047 Positive
## 6 Height 0.009 1.009 0.422 Positive
## 7 No. of Passengers -0.007 0.993 0.606 Negative
## 8 Duration -0.001 0.999 0.626 Negative
The various plots that helped us understand the relationship between the variables better-
The probability of long landing increases with the increase in speed_ground
check_plot(faa$speed_ground)Probability of long landing increases with increase in speed of air
check_plot(faa$speed_air)Long landing isn’t affected by pitch of aircraft
check_plot(faa$pitch)Long landing seem to be unaffected by height of aircraft
check_plot(faa$height)Based on our analysis, our final model is: long.landing ~ speed_ground + aircraft + height
Repeating all the steps for risky landing—
faa %>%
ggplot(aes(risky.landing)) +
geom_bar()only 7% aircrafts have long landing
round(prop.table(table(faa$risky.landing)),2)##
## 0 1
## 0.93 0.07
The speed of ground, air speed and aircraft make seems to be likely to affect the risky landing.
## variable_names coefficients odds_ratio p_value slope_direction
## 1 Ground Speed 0.614 1.848 0.000 Positive
## 2 Air Speed 0.870 2.387 0.000 Positive
## 3 Aircraft-Boeing 1.002 2.724 0.000 Positive
## 4 Aircraft-Airbus -3.108 0.045 0.000 Negative
## 5 Pitch 0.371 1.449 0.143 Positive
## 6 No. of Passengers -0.025 0.975 0.154 Negative
## 7 Duration -0.001 0.999 0.674 Negative
## 8 Height -0.002 0.998 0.871 Negative
we see speed_ground, speed_air, aircraft type, pitch and height appear to be positively correlated to long_landing.
Let’s visualize it -
check_plot_risky <- function(x) {
ggplot(aes(x = x, fill = risky.landing), data = faa) +
geom_histogram(position = 'dodge', aes(y = ..density..))
}The probability of riksy landing increases with the increase in speed_ground
check_plot_risky(faa$speed_ground)Probability of risky landing increases with increase in speed of air
check_plot_risky(faa$speed_air)Risky landing isn’t affected by pitch of aircraft
check_plot_risky(faa$pitch)Risky landing seem to be unaffected by height of aircraft
check_plot_risky(faa$height)I observed that speed of ground and make of aircraft are significant. Results seem to be consistent with our observations before
full_model <- glm(risky.landing ~ aircraft +
no_pasg + speed_ground + height +
pitch + duration_corrected, family = "binomial",
data = faa)
summary(full_model)##
## Call:
## glm(formula = risky.landing ~ aircraft + no_pasg + speed_ground +
## height + pitch + duration_corrected, family = "binomial",
## data = faa)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.46375 -0.00009 0.00000 0.00000 1.85765
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.035e+02 2.801e+01 -3.697 0.000218 ***
## aircraftboeing 4.457e+00 1.547e+00 2.881 0.003970 **
## no_pasg -8.620e-02 6.035e-02 -1.428 0.153201
## speed_ground 9.488e-01 2.465e-01 3.848 0.000119 ***
## height 4.310e-02 4.624e-02 0.932 0.351382
## pitch 6.139e-01 7.982e-01 0.769 0.441878
## duration_corrected 7.952e-04 1.224e-02 0.065 0.948209
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 436.043 on 830 degrees of freedom
## Residual deviance: 36.473 on 824 degrees of freedom
## AIC: 50.473
##
## Number of Fisher Scoring iterations: 12
Results seem to be inconsistent with our observation before. Based on AIC measure, number of passengers seems to be a significant variable whereas if we look at the p_value, it doesn’t support our hypothesis.
faa_clean2 <- select(faa, -long.landing , -speed_air)
model_null2 <- glm(risky.landing ~ 1,data=faa_clean2, family = "binomial")
model_full2 <- glm(risky.landing ~ speed_ground + aircraft + height + no_pasg + duration_corrected + pitch,data=faa_clean2, family = "binomial")
model_3 <- MASS::stepAIC(model_null2,direction="forward",scope=list(upper=model_full2,lower=model_null2))## Start: AIC=438.04
## risky.landing ~ 1
##
## Df Deviance AIC
## + speed_ground 1 58.93 62.93
## + aircraft 1 422.74 426.74
## + pitch 1 433.89 437.89
## + no_pasg 1 434.00 438.00
## <none> 436.04 438.04
## + duration_corrected 1 435.86 439.86
## + height 1 436.02 440.02
##
## Step: AIC=62.93
## risky.landing ~ speed_ground
##
## Df Deviance AIC
## + aircraft 1 40.097 46.097
## + pitch 1 53.079 59.079
## <none> 58.931 62.931
## + no_pasg 1 58.318 64.318
## + height 1 58.667 64.667
## + duration_corrected 1 58.883 64.883
##
## Step: AIC=46.1
## risky.landing ~ speed_ground + aircraft
##
## Df Deviance AIC
## + no_pasg 1 37.707 45.707
## <none> 40.097 46.097
## + height 1 39.402 47.402
## + duration_corrected 1 39.884 47.884
## + pitch 1 39.928 47.928
##
## Step: AIC=45.71
## risky.landing ~ speed_ground + aircraft + no_pasg
##
## Df Deviance AIC
## <none> 37.707 45.707
## + height 1 37.099 47.099
## + pitch 1 37.449 47.449
## + duration_corrected 1 37.693 47.693
summary(model_3)##
## Call:
## glm(formula = risky.landing ~ speed_ground + aircraft + no_pasg,
## family = "binomial", data = faa_clean2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.33913 -0.00009 0.00000 0.00000 1.87810
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -99.90780 25.57993 -3.906 9.39e-05 ***
## speed_ground 0.94963 0.23559 4.031 5.56e-05 ***
## aircraftboeing 4.64188 1.47520 3.147 0.00165 **
## no_pasg -0.08462 0.05732 -1.476 0.13987
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 436.043 on 830 degrees of freedom
## Residual deviance: 37.707 on 827 degrees of freedom
## AIC: 45.707
##
## Number of Fisher Scoring iterations: 12
Number of passengers, which seemed to be significant when considering AIC as the method for variable selection seem to be not significant when considering BIC. This may be because BIC penalizes strongly
model_4 <- MASS::stepAIC(model_null2,direction="forward",scope=list(upper=model_full2,lower=model_null2), k = log(nrow(faa_clean2)))## Start: AIC=442.77
## risky.landing ~ 1
##
## Df Deviance AIC
## + speed_ground 1 58.93 72.38
## + aircraft 1 422.74 436.18
## <none> 436.04 442.77
## + pitch 1 433.89 447.34
## + no_pasg 1 434.00 447.45
## + duration_corrected 1 435.86 449.31
## + height 1 436.02 449.46
##
## Step: AIC=72.38
## risky.landing ~ speed_ground
##
## Df Deviance AIC
## + aircraft 1 40.097 60.264
## <none> 58.931 72.376
## + pitch 1 53.079 73.247
## + no_pasg 1 58.318 78.486
## + height 1 58.667 78.835
## + duration_corrected 1 58.883 79.051
##
## Step: AIC=60.26
## risky.landing ~ speed_ground + aircraft
##
## Df Deviance AIC
## <none> 40.097 60.264
## + no_pasg 1 37.707 64.597
## + height 1 39.402 66.292
## + duration_corrected 1 39.884 66.775
## + pitch 1 39.928 66.819
summary(model_4)##
## Call:
## glm(formula = risky.landing ~ speed_ground + aircraft, family = "binomial",
## data = faa_clean2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.24398 -0.00011 0.00000 0.00000 1.61021
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -102.0772 24.7751 -4.120 3.79e-05 ***
## speed_ground 0.9263 0.2248 4.121 3.78e-05 ***
## aircraftboeing 4.0190 1.2494 3.217 0.0013 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 436.043 on 830 degrees of freedom
## Residual deviance: 40.097 on 828 degrees of freedom
## AIC: 46.097
##
## Number of Fisher Scoring iterations: 12
Conclusions for both the type of landings -
Speed of ground, height and type of aircraft are significant predictors of long landing.
Only speed of ground and type of aircraft are significant predictors of risky landing and height of aircraft seems to be not playing a role when landing is risky.
BIC for long landing is 65.047 and for risky landing is 46.097.
AIC for long landing is 63.204 and for long landing is 45.707
ROC Curves:
Plot for risky landing is marginally smoother than that for long landing.
data1 <- select(faa, -risky.landing, -speed_air)
data2 <- select(faa, -long.landing, -speed_air)
pred1 <- prediction(predict(model_2), data1$long.landing)
roc1 <- performance(pred1, "tpr", "fpr", main = "ROC for long landing and risky landing")
plot(roc1)
pred2 <- prediction(predict(model_4), data2$risky.landing)
roc2 <- performance(pred2, "tpr", "fpr")
plot(roc2, add = TRUE, colorize = TRUE, main = "ROC for long landing and risky landing")The AUC in case of long landing is 99.6% and that for risky landing is 99.9%.
#long landing
auc_ROCR1 <- performance(pred1, measure = "auc")
auc_ROCR1@y.values[[1]]## [1] 0.998333
#risky landing
auc_ROCR2 <- performance(pred2, measure = "auc")
auc_ROCR2@y.values[[1]] ## [1] 0.9986161
Given few parameters,
The long landing probability -
new.ind <- data.frame(aircraft = "boeing",
duration_corrected = 200,
no_pasg = 80,
speed_ground = 115,
speed_air = 120,
height = 40,
pitch = 4)
pred1 <- predict(model_2,newdata=new.ind, type = "link", se = T)
fit <- ilogit(pred1$fit)
upper <- ilogit(pred1$fit + (1.96 * pred1$se.fit))
lower <- ilogit(pred1$fit - (1.96 * pred1$se.fit))
cat("The confidence interval for long landing-",lower,"||", fit, "||", upper)## The confidence interval for long landing- 0.999985 || 1 || 1
The risky landing probability -
new.ind <- data.frame(aircraft = "boeing",
duration_corrected = 200,
no_pasg = 80,
speed_ground = 115,
speed_air = 120,
height = 40,
pitch = 4)
pred1 <- predict(model_4,newdata=new.ind,type = "link", se = T)
fit <- ilogit(pred1$fit)
upper <- ilogit(pred1$fit + (1.96 * pred1$se.fit))
lower <- ilogit(pred1$fit - (1.96 * pred1$se.fit))
cat("The confidence interval for risky landing-",lower,"||", fit, "||", upper)## The confidence interval for risky landing- 0.9874843 || 0.999789 || 0.9999965
Comparing models with different link functions
The coefficients are greater for logit model
The std. error is minimum for probit model
The slope direction is same for all the three models
probit <- glm(risky.landing ~ aircraft+speed_ground,
family=binomial (link = "probit"),
data=data2)
cloglog <- glm(risky.landing ~ aircraft+speed_ground,
family=binomial (link = "cloglog"),
data=data2)
round(summary(model_4)$coefficients,3)## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -102.077 24.775 -4.120 0.000
## speed_ground 0.926 0.225 4.121 0.000
## aircraftboeing 4.019 1.249 3.217 0.001
round(summary(probit)$coefficients,3)## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -58.693 13.313 -4.409 0.000
## aircraftboeing 2.357 0.702 3.359 0.001
## speed_ground 0.532 0.121 4.411 0.000
round(summary(cloglog)$coefficients,3)## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -69.265 14.740 -4.699 0
## aircraftboeing 2.898 0.800 3.622 0
## speed_ground 0.622 0.133 4.690 0
ROC curves for the three models
par(mfrow=c(1,3))
pred1 <- prediction(predict(model_4), data2$risky.landing)
perf <- performance(pred1, "tpr", "fpr")
plot(perf, main = "Logit Model")
pred2 <- prediction(predict(probit), data2$risky.landing)
perf <- performance(pred2, "tpr", "fpr")
plot(perf, main = "Probit Model")
pred3 <- prediction(predict(cloglog), data2$risky.landing)
perf <- performance(pred3, "tpr", "fpr")
plot(perf, main = "Cloglog Model")AUC for the three are all same and it is 99%
auc_logit <- performance(pred1, measure = "auc")
auc_logit <- auc_logit@y.values[[1]]
auc_probit <- performance(pred2, measure = "auc")
auc_probit <- auc_probit@y.values[[1]]
auc_cloglog <- performance(pred3, measure = "auc")
auc_cloglog <- auc_cloglog@y.values[[1]]
cat("AUC for logit is:",auc_logit," AUC for Probit is: ",
auc_probit, " AUC for Loglog is: ", auc_cloglog)## AUC for logit is: 0.9986161 AUC for Probit is: 0.9986161 AUC for Loglog is: 0.9985736
Flight in observations 64 is appearing in all the three models and is risky.
Flight in observation 307 and 176 is appearing in two of the models and is also having a risky landing
pred_logit <- predict(model_4, type = "response")
pred_probit <- predict(probit, type = "response")
pred_cloglog <- predict(cloglog, type = "response")
head(sort(pred_logit, decreasing = T))## 362 307 64 387 408 176
## 1 1 1 1 1 1
head(sort(pred_probit, decreasing = T))## 56 64 134 176 179 307
## 1 1 1 1 1 1
head(sort(pred_cloglog, decreasing = T))## 19 29 30 56 64 90
## 1 1 1 1 1 1
The probability of risky landing is asymmetric for all the three models.
cat("The confidence interval for logit:",lower,"||", fit, "||", upper)## The confidence interval for logit: 0.9874843 || 0.999789 || 0.9999965
pred2 <- predict(probit, new.ind, type = "link", se.fit = TRUE)
fit <- ilogit(pred2$fit)
upper <- ilogit(pred2$fit + (1.96 * pred2$se.fit))
lower <- ilogit(pred2$fit - (1.96 * pred2$se.fit))
cat("The confidence interval for probit:",lower,"||", fit, "||", upper)## The confidence interval for probit: 0.9347085 || 0.9924005 || 0.9991612
pred3 <- predict(cloglog, new.ind, type = "link", se.fit = TRUE)
fit <- ilogit(pred3$fit)
upper <- ilogit(pred3$fit + (1.96 * pred3$se.fit))
lower <- ilogit(pred3$fit - (1.96 * pred3$se.fit))
cat("The confidence interval for cloglog:",lower,"||", fit, "||", upper)## The confidence interval for cloglog: 0.9463195 || 0.9943444 || 0.99943
We will create a mulitnomial variabe for distance.
faa1 <- faa %>%
mutate(Y = (ifelse(distance < 1000, 1,
ifelse( distance >= 1000 & distance < 2000, 2, 3)) ))
faa1$distance <- NULLNow, we will use multinomial model to fit Y. We treat the new variable Y as categorical under the assumption that the levels of Y have no natural ordering.
faa1$Y <- as.factor(faa1$Y)
faa1 <- select(faa1, -speed_air ) %>%
na.omit()
mmod <- multinom(Y ~ aircraft + duration +
no_pasg + speed_ground + pitch + height , faa1)## # weights: 24 (14 variable)
## initial value 858.016197
## iter 10 value 581.199724
## iter 20 value 236.239538
## iter 30 value 226.272595
## iter 40 value 226.221087
## iter 50 value 226.174784
## final value 226.164124
## converged
Based on AIC, we get the model as -
mmodi <- step(mmod)## Start: AIC=480.33
## Y ~ aircraft + duration + no_pasg + speed_ground + pitch + height
##
## trying - aircraft
## # weights: 21 (12 variable)
## initial value 858.016197
## iter 10 value 599.571323
## iter 20 value 336.226736
## iter 30 value 334.049994
## iter 40 value 334.045290
## final value 334.043139
## converged
## trying - duration
## # weights: 21 (12 variable)
## initial value 858.016197
## iter 10 value 518.225762
## iter 20 value 239.417331
## iter 30 value 228.258109
## iter 40 value 227.608269
## final value 227.113768
## converged
## trying - no_pasg
## # weights: 21 (12 variable)
## initial value 858.016197
## iter 10 value 602.083949
## iter 20 value 240.843149
## iter 30 value 229.631112
## iter 40 value 228.963807
## final value 228.210500
## converged
## trying - speed_ground
## # weights: 21 (12 variable)
## initial value 858.016197
## iter 10 value 804.987828
## final value 794.030499
## converged
## trying - pitch
## # weights: 21 (12 variable)
## initial value 858.016197
## iter 10 value 582.954258
## iter 20 value 237.697711
## iter 30 value 228.361531
## iter 40 value 227.746783
## final value 227.327445
## converged
## trying - height
## # weights: 21 (12 variable)
## initial value 858.016197
## iter 10 value 535.918721
## iter 20 value 302.074403
## iter 30 value 298.921547
## iter 40 value 298.913287
## final value 298.909431
## converged
## Df AIC
## - duration 12 478.2275
## - pitch 12 478.6549
## <none> 14 480.3282
## - no_pasg 12 480.4210
## - height 12 621.8189
## - aircraft 12 692.0863
## - speed_ground 12 1612.0610
## # weights: 21 (12 variable)
## initial value 858.016197
## iter 10 value 518.225762
## iter 20 value 239.417331
## iter 30 value 228.258109
## iter 40 value 227.608269
## final value 227.113768
## converged
##
## Step: AIC=478.23
## Y ~ aircraft + no_pasg + speed_ground + pitch + height
##
## trying - aircraft
## # weights: 18 (10 variable)
## initial value 858.016197
## iter 10 value 465.319936
## iter 20 value 335.496129
## iter 30 value 335.298663
## final value 335.293692
## converged
## trying - no_pasg
## # weights: 18 (10 variable)
## initial value 858.016197
## iter 10 value 444.634863
## iter 20 value 238.604117
## iter 30 value 230.208561
## iter 40 value 229.083783
## iter 40 value 229.083781
## iter 40 value 229.083781
## final value 229.083781
## converged
## trying - speed_ground
## # weights: 18 (10 variable)
## initial value 858.016197
## iter 10 value 799.384355
## final value 796.687299
## converged
## trying - pitch
## # weights: 18 (10 variable)
## initial value 858.016197
## iter 10 value 488.119245
## iter 20 value 236.788749
## iter 30 value 228.715466
## iter 40 value 228.221740
## final value 228.220496
## converged
## trying - height
## # weights: 18 (10 variable)
## initial value 858.016197
## iter 10 value 454.961171
## iter 20 value 300.709209
## iter 30 value 300.281087
## final value 300.277702
## converged
## Df AIC
## - pitch 10 476.4410
## - no_pasg 10 478.1676
## <none> 12 478.2275
## - height 10 620.5554
## - aircraft 10 690.5874
## - speed_ground 10 1613.3746
## # weights: 18 (10 variable)
## initial value 858.016197
## iter 10 value 488.119245
## iter 20 value 236.788749
## iter 30 value 228.715466
## iter 40 value 228.221740
## final value 228.220496
## converged
##
## Step: AIC=476.44
## Y ~ aircraft + no_pasg + speed_ground + height
##
## trying - aircraft
## # weights: 15 (8 variable)
## initial value 858.016197
## iter 10 value 394.729448
## iter 20 value 347.225190
## iter 30 value 347.118971
## final value 347.118955
## converged
## trying - no_pasg
## # weights: 15 (8 variable)
## initial value 858.016197
## iter 10 value 354.469061
## iter 20 value 242.186395
## iter 30 value 231.398332
## final value 230.124753
## converged
## trying - speed_ground
## # weights: 15 (8 variable)
## initial value 858.016197
## iter 10 value 797.491849
## final value 797.489048
## converged
## trying - height
## # weights: 15 (8 variable)
## initial value 858.016197
## iter 10 value 363.948352
## iter 20 value 301.923434
## iter 30 value 301.017662
## final value 301.004927
## converged
## Df AIC
## - no_pasg 8 476.2495
## <none> 10 476.4410
## - height 8 618.0099
## - aircraft 8 710.2379
## - speed_ground 8 1610.9781
## # weights: 15 (8 variable)
## initial value 858.016197
## iter 10 value 354.469061
## iter 20 value 242.186395
## iter 30 value 231.398332
## final value 230.124753
## converged
##
## Step: AIC=476.25
## Y ~ aircraft + speed_ground + height
##
## trying - aircraft
## # weights: 12 (6 variable)
## initial value 858.016197
## iter 10 value 378.591341
## iter 20 value 351.025155
## final value 350.780781
## converged
## trying - speed_ground
## # weights: 12 (6 variable)
## initial value 858.016197
## iter 10 value 797.741960
## final value 797.741923
## converged
## trying - height
## # weights: 12 (6 variable)
## initial value 858.016197
## iter 10 value 345.462031
## iter 20 value 304.390327
## final value 301.955008
## converged
## Df AIC
## <none> 8 476.2495
## - height 6 615.9100
## - aircraft 6 713.5616
## - speed_ground 6 1607.4838
summary(mmodi)## Call:
## multinom(formula = Y ~ aircraft + speed_ground + height, data = faa1)
##
## Coefficients:
## (Intercept) aircraftboeing speed_ground height
## 2 -22.80493 3.863325 0.2393831 0.1516421
## 3 -103.34903 9.209321 1.0112370 0.3441978
##
## Std. Errors:
## (Intercept) aircraftboeing speed_ground height
## 2 1.91262330 0.4056023 0.019916715 0.01813334
## 3 0.08472895 0.5723513 0.009967598 0.02808654
##
## Residual Deviance: 460.2495
## AIC: 476.2495
Now I want to see where the mean values lie for all the variables at different Y.
duration <- tapply(faa1$duration, faa1$Y, mean, na.rm=TRUE)
no_pasg <- tapply(faa1$no_pasg, faa1$Y, mean, na.rm=TRUE)
speed_ground <- tapply(faa1$speed_ground, faa1$Y, mean, na.rm=TRUE)
height <- tapply(faa1$height, faa1$Y, mean, na.rm=TRUE)
pitch <- tapply(faa1$pitch, faa1$Y, mean, na.rm=TRUE)
table <- round(data.frame(duration, no_pasg, speed_ground , height, pitch),3) %>%
t() %>% as.data.frame()
names(table) <- c('Y=1','Y=2','Y=3')
table$variable <- rownames(table)
rownames(table) <- NULL
table <- select(table, variable, everything())
table## variable Y=1 Y=2 Y=3
## 1 duration 160.890 151.935 152.070
## 2 no_pasg 60.233 60.162 59.728
## 3 speed_ground 62.178 79.243 103.649
## 4 height 27.908 31.677 31.509
## 5 pitch 3.980 4.012 4.064
increases the odds of Y=2 by 1.26 relative to Y=1
increases the odds of Y=3 by 2.75 relative to Y=1
increases the odds of Y=2 by 1.16 relative to Y=1
increases the odds of Y=3 by 1.41 relative to Y=1
odds of Y=2 increases by 47 relative to Y=1
odds of Y=3 increases by 9989 relative to Y=1
Aircraft, speed of ground, and height are the final vairables that I obtained.
There is a significant increase in speed of ground with distance.
ggplot(faa1 , aes(x = Y, y = speed_ground)) +
geom_boxplot()ggplot(faa1 , aes(x = Y, y = height)) +
geom_boxplot()