Amazing Report on Airlines Reviews Dataset

2023-08-10

Nazir Ali Khan

Chapter 1: Data Loading

1.1. Loading the Libraries

library(tidyverse) # for data manipulation
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(skimr) # for brief summary of dataset
library(ggplot2) # for static visualizations (already attached by tidyverse)
library(grid) # for textGrob() and gpar()
library(gridExtra) # for grid.arrange()
## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
library(lubridate) # for date handling (already attached by tidyverse)
library(SnowballC) # for text mining
library(wordcloud) # for word cloud
## Loading required package: RColorBrewer
library(tm) # for text mining
## Loading required package: NLP
## 
## Attaching package: 'NLP'
## 
## The following object is masked from 'package:ggplot2':
## 
##     annotate
library(Matrix) # for sparse matrix
## 
## Attaching package: 'Matrix'
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
library(cleanrmd) # for custom css themes

1.2. Loading the Dataset

airlines <- read.csv("~/Workbooks/Airline_Reviews.csv", stringsAsFactors=TRUE)
head(airlines, 10)
##    X  Airline.Name Overall_Rating                          Review_Title
## 1  0   AB Aviation              9               "pretty decent airline"
## 2  1   AB Aviation              1                  "Not a good airline"
## 3  2   AB Aviation              1        "flight was fortunately short"
## 4  3 Adria Airways              1   "I will never fly again with Adria"
## 5  4 Adria Airways              1 "it ruined our last days of holidays"
## 6  5 Adria Airways              1             "Had very bad experience"
## 7  6 Adria Airways              1      "worse than the budget airlines"
## 8  7 Adria Airways              1                "book another company"
## 9  8 Adria Airways              1                "combined two flights"
## 10 9 Adria Airways              8                   "the crew was nice"
##            Review.Date Verified
## 1   11th November 2019     True
## 2       25th June 2019     True
## 3       25th June 2019     True
## 4  28th September 2019    False
## 5  24th September 2019     True
## 6  17th September 2019     True
## 7   6th September 2019    False
## 8     24th August 2019    False
## 9      6th August 2019     True
## 10   12th October 2018     True
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Review
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Moroni to Moheli. Turned out to be a pretty decent airline. Online booking worked well, checkin and boarding was fine and the plane looked well maintained. Its a very short flight - just 20 minutes or so so i didn't expect much but they still managed to hand our a bottle of water and some biscuits which i though was very nice. Both flights on time.
## 2                                                                                                                                                                                                                                                                                                                                                                                  Moroni to Anjouan. It is a very small airline. My ticket advised me to turn up at 0800hrs which I did. There was confusion at this small airport. I was then directed to the office of AB Aviation which was still closed. It opened at 0900hrs and I was told that the flight had been put back to 1300hrs and that they had tried to contact me. This could not be true as they did not have my phone number. I was with a local guide and he had not been informed either. I presume that I was bumped off. The later flight did operate but as usual, there was confusion at check-in. The flight was only 30mins and there were no further problems. Not a good airline but it is the only one for Comoros.
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Anjouan to Dzaoudzi. A very small airline and the only airline based in Comoros. Check-in was disorganised because of locals with big packages and disinterested staff. The flight was fortunately short (30 mins). Took off on time and landed on time. With a short flight like there was of course no in-flight entertainment nor cabin service except for biscuits and a bottle of water, which was quite nice!
## 4                                                                                                                                                                                                                                                                                                                                      Please do a favor yourself and do not fly with Adria. On the route from Munich to Pristina in July 2019 they lost my luggage and for 10 days in a row, despite numerous phone calls they were not able to locate it. 11 days later the luggage arrived at the destination completely ruined. Applying for compensation, they ignored my request. Foolishly again, I booked another flight with them (345 euros) Frankfurt - Pristina in September 2019. They cancelled the flight with no reason 24 hours before the departure. Desperate phone calls to customer service to get anything (rerouting, compensation, etc) were not responded. I will never fly again with Adria. What a disgrace! Shame on you Adria for constantly deceiving your customers.
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Do not book a flight with this airline! My friend and I should have returned from Sofia to Amsterdam on September 22 and 3 days before, they sent us an SMS informing the flight was cancelled. For 3 straight days we tried to reach the airline and the web agent (e-dreams) and we did not get a solution. Finally, 18 hours before our cancelled flight time, and after 35 minutes on a call (waiting), the airline was able to get us on a flight with Lufthansa. Do not book Adria Airways, it is unreliable and in our case, it ruined our last days of holidays since we needed to be on the phones all day.
## 6                                                                                                                                                                                                                                                                                                                                          Had very bad experience with rerouted and cancelled flights last weekend with Adria airways. Original Route was Ljubljana to Sarajevo return. Two weeks before i received an email that the flight was cancelled. Offered route change was Ljubljana to Sarajevo via Munich. Flight back changed to Sarajevo-Pristina-Ljubljana. I accepted. The first flight via Munich was ok. Two hours before the return flight I got the email that the flight was cancelled. I had to rebook via hotline and had to accept a flight with Croatian to Zagreb. I reached Ljubljana 4 h later and had to organize Transport from Zagreb to Ljubljana on my own cost. Do not book flights with Adria airways. I heard that their financial situation is very very bad.
## 7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Ljubljana to Zürich. Firstly, Ljubljana airport is terrible. Badly trained staff, unfriendly. Toilets are very dirty. Flight 2 hours delayed without any information. There is no Information desk so questions aren‘t possible. Never again will use this airline. Its even worse than the budget airlines and thats difficult. 
## 8    First of all, I am not complaining about a specific flight. I am a Lufthansa frequent Flyer and I normally fly the route Munich - Timisoara. This summer season Lufthansa offered the flights on this route to Adria Airways, as they are a star alliance member. I can only tell that I have the worst experiences with them. In over 90% of the cases they are late, they don't fly on time. Always they offer the same excuse: they have no slot free. This is an unacceptable excuse, as all other companies flying on similar coridors seem not to have this problem. In addition, as LH Cityline was operating these flights I have almost never heard this excuse, or had this problem. I am flying on this route for 6 years already. They started also to cancel some flights. Maybe combine 2 flights into 1 to spare some money, who knows. The cabin crew is decent - not wow, not bad - but there also also situations when the staff are really rude and not customer oriented. I would recommend anyone to fly with this company. If you have the chance, book another company!
## 9                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Worst Airline ever! They combined two flights to save costs. Instead of flying Pristina - Ljubliana - Zürich we now fly Pristina - Ljubliana - München - Zürich. Now we arrive 2.5h later at our destination.
## 10                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Ljubljana to Munich. The homebase airport of Adria Airways is Ljubljana and it very small, relaxing and convenient. It is surrounded by the Alps and the departure was fantastic. The airplane was modern and the crew was nice. Ontime departure. Drinks without alcohol were free and I paid 4€ for my white wine. Considering that I had a cheap ticket I cannot complain. Upgrades would have been available for 30€ - which includes the lounge.
##          Aircraft Type.Of.Traveller     Seat.Type
## 1                      Solo Leisure Economy Class
## 2            E120      Solo Leisure Economy Class
## 3   Embraer E120       Solo Leisure Economy Class
## 4                      Solo Leisure Economy Class
## 5                    Couple Leisure Economy Class
## 6          CR 900    Couple Leisure Economy Class
## 7                          Business Economy Class
## 8  Bombardier CRJ      Solo Leisure Economy Class
## 9                      Solo Leisure Economy Class
## 10                   Family Leisure Economy Class
##                               Route     Date.Flown Seat.Comfort
## 1                  Moroni to Moheli  November 2019            4
## 2                 Moroni to Anjouan      June 2019            2
## 3               Anjouan to Dzaoudzi      June 2019            2
## 4             Frankfurt to Pristina September 2019            1
## 5  Sofia to Amsterdam via Ljubljana September 2019            1
## 6             Sarajevo to Ljubljana September 2019            1
## 7               Ljubljana to Zürich September 2019            1
## 8               Timisoara to Munich    August 2019            1
## 9  Pristina to Zürich via Ljubliana    August 2019            1
## 10              Ljubljana to Munich   October 2018            4
##    Cabin.Staff.Service Food...Beverages Ground.Service Inflight.Entertainment
## 1                    5                4              4                     NA
## 2                    2                1              1                     NA
## 3                    1                1              1                     NA
## 4                    1               NA              1                     NA
## 5                    1                1              1                      1
## 6                    1                1              1                      1
## 7                    1                1              1                     NA
## 8                    1                1              1                      1
## 9                    2                1              1                      1
## 10                   4                3              5                     NA
##    Wifi...Connectivity Value.For.Money Recommended
## 1                   NA               3         yes
## 2                   NA               2          no
## 3                   NA               2          no
## 4                   NA               1          no
## 5                    1               1          no
## 6                    1               1          no
## 7                   NA               1          no
## 8                    1               1          no
## 9                    1               1          no
## 10                  NA               5         yes

Chapter 2: Data Cleaning

2.1. Renaming the Variables

glimpse(airlines)
## Rows: 23,171
## Columns: 20
## $ X                      <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ Airline.Name           <fct> AB Aviation, AB Aviation, AB Aviation, Adria Ai…
## $ Overall_Rating         <fct> 9, 1, 1, 1, 1, 1, 1, 1, 1, 8, 1, 1, 2, 2, 3, 1,…
## $ Review_Title           <fct> "\"pretty decent airline\"", "\"Not a good airl…
## $ Review.Date            <fct> 11th November 2019, 25th June 2019, 25th June 2…
## $ Verified               <fct> True, True, True, False, True, True, False, Fal…
## $ Review                 <fct> "  Moroni to Moheli. Turned out to be a pretty …
## $ Aircraft               <fct> "", "E120", "Embraer E120 ", "", "", "CR 900", …
## $ Type.Of.Traveller      <fct> Solo Leisure, Solo Leisure, Solo Leisure, Solo …
## $ Seat.Type              <fct> Economy Class, Economy Class, Economy Class, Ec…
## $ Route                  <fct> "Moroni to Moheli", "Moroni to Anjouan", "Anjou…
## $ Date.Flown             <fct> November 2019, June 2019, June 2019, September …
## $ Seat.Comfort           <dbl> 4, 2, 2, 1, 1, 1, 1, 1, 1, 4, 2, 4, 3, 1, 3, 5,…
## $ Cabin.Staff.Service    <dbl> 5, 2, 1, 1, 1, 1, 1, 1, 2, 4, 1, 1, 3, 2, 3, 5,…
## $ Food...Beverages       <dbl> 4, 1, 1, NA, 1, 1, 1, 1, 1, 3, NA, 1, NA, 2, NA…
## $ Ground.Service         <dbl> 4, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 4, 3, 2, 1, 5,…
## $ Inflight.Entertainment <dbl> NA, NA, NA, NA, 1, 1, NA, 1, 1, NA, 1, NA, NA, …
## $ Wifi...Connectivity    <dbl> NA, NA, NA, NA, 1, 1, NA, 1, 1, NA, 1, NA, NA, …
## $ Value.For.Money        <dbl> 3, 2, 2, 1, 1, 1, 1, 1, 1, 5, 1, 1, 2, 1, 1, 5,…
## $ Recommended            <fct> yes, no, no, no, no, no, no, no, no, yes, no, n…
colnames(airlines) <- c("S_No","Airline_Name","Rating","Review_Title","Review_Date",
                        "Verified","Review","Aircraft", "Traveller_Status","Seat_Type",
                        "Route","Date_Flown","Seat_Comfort","Cabin_Service","Food_Beverages",
                        "Ground_Service","Entertainment","Wifi","Value_for_Money","Recommended")

2.2. Dealing with Missing Data in Categorical Variables

# Converting Variables into Character Strings
airlines$Aircraft <- as.character(airlines$Aircraft)
airlines$Traveller_Status <- as.character(airlines$Traveller_Status)
airlines$Seat_Type <- as.character(airlines$Seat_Type)
airlines$Route <- as.character(airlines$Route)

# Replacing Blank Values with "Unknown"
airlines$Aircraft[airlines$Aircraft == ""] <- "Unknown"
airlines$Traveller_Status[airlines$Traveller_Status == ""] <- "Unknown"
airlines$Seat_Type[airlines$Seat_Type == ""] <- "Unknown"
airlines$Route[airlines$Route == ""] <- "Unknown"

# Converting into Factors Again
airlines$Aircraft <- as.factor(airlines$Aircraft)
airlines$Traveller_Status <- as.factor(airlines$Traveller_Status)
airlines$Seat_Type <- as.factor(airlines$Seat_Type)
airlines$Route <- as.factor(airlines$Route)
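
The convert, replace, and re-factor steps above can also be done in one pass. The following is a minimal sketch on a toy data frame, not the code used for this report; the helper `blank_to_unknown()` and the object `toy` are hypothetical names introduced only for illustration.

```r
library(dplyr)

# Hypothetical helper: replace empty strings with "Unknown" and return a factor
blank_to_unknown <- function(x) {
  x <- as.character(x)
  x[!is.na(x) & x == ""] <- "Unknown"   # leave genuine NA values untouched
  factor(x)
}

# Toy data with the same kind of blank entries as the airlines columns
toy <- data.frame(
  Aircraft  = factor(c("", "E120", "A320")),
  Seat_Type = factor(c("Economy Class", "", "Economy Class"))
)

# Apply the helper to several columns at once
toy <- toy %>% mutate(across(c(Aircraft, Seat_Type), blank_to_unknown))
print(toy)
```

Because the helper rebuilds the factor, the empty-string level is dropped from the levels as well as from the values.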

2.3. Dealing with Data Types

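# Rating is a factor at this point, so convert through character;
# as.integer() on a factor returns the internal level codes, not the ratings
airlines$Rating <- as.integer(as.character(airlines$Rating))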
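
Calling as.integer() directly on a factor returns its internal level codes, while going through as.character() first returns the values that are printed. The sketch below shows the difference on a toy factor. The vector `rating` is hypothetical, and its "n" entry is an assumed non-numeric placeholder used only to show why a ratings column can end up as a factor.

```r
# Toy factor mimicking a ratings column read with stringsAsFactors = TRUE
rating <- factor(c("9", "1", "10", "n", "2"))

print(levels(rating))   # levels are sorted as text, not as numbers

# Internal level codes: these are positions in levels(rating), not ratings
codes <- as.integer(rating)

# Printed values: convert to character first; the non-numeric "n" becomes NA
values <- suppressWarnings(as.integer(as.character(rating)))

print(data.frame(label = as.character(rating), codes = codes, values = values))
```

Any summary computed on the level codes, such as a mean, describes the ordering of the labels rather than the ratings themselves.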

2.4. Checking the Summary of the Data

sum(is.na(airlines)) # counting the NA values across the whole dataset
## [1] 52540
skim(airlines) # creating the summary of dataset
## Warning: There was 1 warning in `dplyr::summarize()`.
## ℹ In argument: `dplyr::across(tidyselect::any_of(variable_names),
##   mangled_skimmers$funs)`.
## ℹ In group 0: .
## Caused by warning:
## ! There was 1 warning in `dplyr::summarize()`.
## ℹ In argument: `dplyr::across(tidyselect::any_of(variable_names),
##   mangled_skimmers$funs)`.
## Caused by warning in `sorted_count()`:
## ! Variable contains value(s) of "" that have been converted to "empty".
Data summary
Name airlines
Number of rows 23171
Number of columns 20
_______________________
Column type frequency:
factor 11
numeric 9
________________________
Group variables None

Variable type: factor

skim_variable     n_missing  complete_rate  ordered  n_unique  top_counts
Airline_Name              0              1  FALSE         497  Aeg: 100, Aer: 100, Aer: 100, Aer: 100
Review_Title              0              1  FALSE       17219  Onu: 84, US : 75, Ger: 74, Mer: 71
Review_Date               0              1  FALSE        4557  16t: 67, 21s: 55, 25t: 55, 26t: 54
Verified                  0              1  FALSE           2  Tru: 12322, Fal: 10849
Review                    0              1  FALSE       23046  A: 2, A: 2, A: 2, D: 2
Aircraft                  2              1  FALSE        1048  Unk: 16042, A32: 1041, Boe: 553, Boe: 404
Traveller_Status          0              1  FALSE           5  Sol: 7120, Cou: 5265, Fam: 4352, Unk: 3738
Seat_Type                 0              1  FALSE           5  Eco: 19145, Bus: 2098, Unk: 1096, Pre: 646
Route                     0              1  FALSE       13608  Unk: 3828, Mel: 43, Syd: 35, Cap: 34
Date_Flown                0              1  FALSE         110  emp: 3754, Jun: 1057, Jul: 814, May: 788
Recommended               0              1  FALSE           2  no: 15364, yes: 7807

Variable type: numeric

skim_variable    n_missing  complete_rate      mean       sd  p0     p25    p50      p75   p100  hist
S_No                     0           1.00  11585.00  6689.04   0  5792.5  11585  17377.5  23170  ▇▇▇▇▇
Rating                   0           1.00      3.39     3.10   1     1.0      1      6.0     10  ▇▁▁▂▂
Seat_Comfort          4155           0.82      2.62     1.46   0     1.0      3      4.0      5  ▇▃▅▅▃
Cabin_Service         4260           0.82      2.87     1.60   0     1.0      3      4.0      5  ▇▃▃▃▆
Food_Beverages        8671           0.63      2.55     1.53   0     1.0      2      4.0      5  ▇▃▃▃▃
Ground_Service        4793           0.79      2.35     1.60   1     1.0      1      4.0      5  ▇▂▂▂▃
Entertainment        12342           0.47      2.18     1.49   0     1.0      2      3.0      5  ▇▂▂▂▂
Wifi                 17251           0.26      1.78     1.32   0     1.0      1      2.0      5  ▇▁▁▁▁
Value_for_Money       1066           0.95      2.45     1.59   0     1.0      2      4.0      5  ▇▂▂▂▃
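
The single total from sum(is.na(airlines)) does not show which columns carry the gaps. A per-column count is a quick cross-check against the n_missing column above. This is a minimal sketch on a toy data frame; `toy`, `na_per_column`, and `na_share` are hypothetical names, and the values are made up for illustration.

```r
# Toy data frame with missing values in two of its three columns
toy <- data.frame(
  Seat_Comfort = c(4, NA, 2),
  Wifi         = c(NA, NA, 1),
  Recommended  = c("yes", "no", "no")
)

na_per_column <- colSums(is.na(toy))             # number of NA values per column
na_share      <- round(colMeans(is.na(toy)), 2)  # share of NA values per column

print(sort(na_per_column, decreasing = TRUE))
print(na_share)
```

The per-column counts always add up to the overall total returned by sum(is.na()).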

Chapter 3: Airlines vs. Ratings

3.1. Top 10 Airlines by Ratings

airlines_rating <- airlines %>% select(Airline_Name,Rating) %>% 
  group_by(Airline_Name) %>% 
  summarise(Total_Rating = n(), Avg_Rating = mean(Rating, na.rm = TRUE)) %>% 
  filter(Total_Rating>10) %>% 
  arrange(desc(Avg_Rating)) 
print(airlines_rating)
## # A tibble: 334 × 3
##    Airline_Name     Total_Rating Avg_Rating
##    <fct>                   <int>      <dbl>
##  1 SyrianAir                  23       7.87
##  2 Myanmar Airways            27       7.85
##  3 LAN Peru                   42       7.31
##  4 SkyWest Airlines           38       7.21
##  5 Air Zimbabwe               24       7.17
##  6 Aerosur                    11       7   
##  7 Berjaya Air                12       6.92
##  8 Shaheen Air                12       6.92
##  9 Iran Air                   66       6.86
## 10 Luxair                     86       6.83
## # ℹ 324 more rows

3.1.1 Plot

rating_plot <- airlines_rating %>%
  head(10) %>% 
  mutate(Airline_Name = fct_reorder(Airline_Name, Avg_Rating)) %>% 
  ggplot(aes(Avg_Rating, Airline_Name, fill=Total_Rating)) +
  scale_fill_gradient(low = "#090d5b", high = "#5659a8") +
  geom_bar(stat="identity")
print(rating_plot)

3.2. Top 10 Airlines by Verified Ratings

airlines_rating_verified <- airlines %>% 
  filter(Verified=="True") %>% 
  select(Airline_Name,Rating) %>% 
  group_by(Airline_Name) %>% 
  summarise(Total_Rating = n(), Avg_Rating = mean(Rating, na.rm = TRUE)) %>% 
  filter(Total_Rating>10) %>% 
  arrange(desc(Avg_Rating)) 
print(airlines_rating_verified)
## # A tibble: 222 × 3
##    Airline_Name            Total_Rating Avg_Rating
##    <fct>                          <int>      <dbl>
##  1 China Southern Airlines           97       6.31
##  2 Cathay Dragon                     43       6.23
##  3 BA CityFlyer                      13       6.08
##  4 Lao Airlines                      12       5.25
##  5 Belavia                           16       5.19
##  6 QantasLink                        21       5.10
##  7 Nepal Airlines                    15       5.07
##  8 Citilink                          34       5.03
##  9 Rossiya Airlines                  12       5   
## 10 Virgin Australia                  76       4.93
## # ℹ 212 more rows

3.2.1 Plot

verified_rating_plot <- airlines_rating_verified %>%
  head(10) %>% 
  mutate(Airline_Name = fct_reorder(Airline_Name, Avg_Rating)) %>% 
  ggplot(aes(Avg_Rating, Airline_Name, fill=Total_Rating)) +
  scale_fill_gradient(low = "#5659a8", high = "#090d5b") +
  geom_bar(stat="identity")
print(verified_rating_plot)

3.3. Top 10 Airlines: Verified Ratings vs. Mixed Ratings

rating_plot_no_leg <- rating_plot + theme(legend.position = "none") # removing the legend of 1st plot
vs_text <- textGrob("Vs.", gp = gpar(fontsize = 16, fontface = "bold")) # creating a block of vs. text
grid.arrange(rating_plot_no_leg, vs_text, verified_rating_plot, ncol = 3, widths = c(4, 1, 4)) # arranging both plots in one frame

Chapter 4: Seat Type vs. Seat Comfort

4.1. Seat Types by Average Seat Comfort

seat_type_comfort <- airlines %>% 
  select(Seat_Type, Seat_Comfort) %>% 
  filter(Seat_Type != "Unknown") %>% 
  group_by(Seat_Type) %>% 
  summarize(Avg_Comfort = mean(Seat_Comfort, na.rm = TRUE)) %>% 
  arrange(desc(Avg_Comfort))
print(seat_type_comfort)
## # A tibble: 4 × 2
##   Seat_Type       Avg_Comfort
##   <fct>                 <dbl>
## 1 First Class            3.49
## 2 Business Class         3.36
## 3 Premium Economy        2.75
## 4 Economy Class          2.52

4.1.1. Plot

seat_type_comfort_plot <- seat_type_comfort %>%
  head(10) %>% 
  mutate(Seat_Type = fct_reorder(Seat_Type, Avg_Comfort)) %>% 
  ggplot(aes(Avg_Comfort, Seat_Type, fill=Avg_Comfort)) +
  scale_fill_gradient(low = "#5659a8", high = "#090d5b") +
  geom_bar(stat="identity")
print(seat_type_comfort_plot)

Chapter 5: Airlines by Service

5.1. Airlines by Ground Service

ground <- airlines %>% 
  select(Airline_Name, Ground_Service) %>% 
  group_by(Airline_Name) %>% 
  summarise(Avg_Ground_Service = mean(Ground_Service, na.rm = TRUE), Total_Customers = n()) %>%
  filter(Total_Customers >= 50) %>% 
  arrange(desc(Avg_Ground_Service)) %>% 
  head(10) 
print(ground)
## # A tibble: 10 × 3
##    Airline_Name            Avg_Ground_Service Total_Customers
##    <fct>                                <dbl>           <int>
##  1 China Southern Airlines               4.87             100
##  2 Hainan Airlines                       4.31             100
##  3 Rex Airlines                          4.21              51
##  4 ANA All Nippon Airways                4.19             100
##  5 Garuda Indonesia                      4.19             100
##  6 Regional Express                      3.98              87
##  7 BA CityFlyer                          3.97              72
##  8 Japan Airlines                        3.90             100
##  9 Cathay Dragon                         3.85              62
## 10 Azerbaijan Airlines                   3.84              68

5.1.1. Plot

ground_plot <- ground %>%
  mutate(Airline_Name = fct_reorder(Airline_Name, Avg_Ground_Service)) %>% 
  ggplot(aes(Avg_Ground_Service, Airline_Name, fill=Total_Customers)) +
  scale_fill_gradient(low = "#5659a8", high = "#090d5b") +
  geom_bar(stat="identity") 
print(ground_plot)

5.2. Airlines by Cabin Service

cabin <- airlines %>% 
  select(Airline_Name, Cabin_Service) %>% 
  group_by(Airline_Name) %>% 
  summarise(Avg_Cabin_Service = mean(Cabin_Service, na.rm = TRUE), Total_Customers = n()) %>%
  filter(Total_Customers >= 50) %>% 
  arrange(desc(Avg_Cabin_Service)) %>% 
  head(10) 
print(cabin)
## # A tibble: 10 × 3
##    Airline_Name            Avg_Cabin_Service Total_Customers
##    <fct>                               <dbl>           <int>
##  1 Hainan Airlines                      4.79             100
##  2 China Southern Airlines              4.74             100
##  3 Rex Airlines                         4.65              51
##  4 ANA All Nippon Airways               4.60             100
##  5 BA CityFlyer                         4.37              72
##  6 Garuda Indonesia                     4.26             100
##  7 Air Astana                           4.25             100
##  8 Japan Airlines                       4.25             100
##  9 Thai Smile Airways                   4.22             100
## 10 Regional Express                     4.14              87

5.2.1. Plot

cabin_plot <- cabin %>%
  mutate(Airline_Name = fct_reorder(Airline_Name, Avg_Cabin_Service)) %>% 
  ggplot(aes(Avg_Cabin_Service, Airline_Name, fill=Total_Customers)) +
  scale_fill_gradient(low = "#5659a8", high = "#090d5b") +
  geom_bar(stat="identity") 
print(cabin_plot)

5.3. Ground vs. Cabin Service Comparison of Top 10 Airlines

service_plot_no_legend <- ground_plot + theme(legend.position = "none")
vs_text <- textGrob("Vs.", gp = gpar(fontsize = 16, fontface = "bold"))
grid.arrange(service_plot_no_legend, vs_text, cabin_plot, ncol = 3, widths = c(4, 1, 4))

Chapter 6: Additional Features of Airlines

6.1. Top Airlines by Food Quality, Wifi & Entertainment

add_ons <- airlines %>% select(Airline_Name, Wifi, Food_Beverages, Entertainment) %>% 
  group_by(Airline_Name) %>% 
  summarise(Avg_Food_Rating = mean(Food_Beverages, na.rm = TRUE),
            Avg_Wifi = mean(Wifi, na.rm = TRUE),
            Avg_Entertainment = mean(Entertainment, na.rm = TRUE),
            Total_Customers = n()) %>% 
  filter(Total_Customers >=50)
print(add_ons)
## # A tibble: 211 × 5
##    Airline_Name       Avg_Food_Rating Avg_Wifi Avg_Entertainment Total_Customers
##    <fct>                        <dbl>    <dbl>             <dbl>           <int>
##  1 Adria Airways                 2.63     1.9               1.72              91
##  2 Aegean Airlines               2.82     2.67              2.52             100
##  3 Aer Lingus                    1.91     1.62              2.13             100
##  4 Aeroflot Russian …            3.03     2.44              3.02             100
##  5 Aerolineas Argent…            2.49     1.79              2.24             100
##  6 Aeromexico                    1.64     1.34              1.83             100
##  7 Air Arabia                    1.96     1.68              1.49             100
##  8 Air Astana                    4.09     3.38              3.86             100
##  9 Air Berlin                    2.65     2.31              2.69             100
## 10 Air Canada                    1.94     1.52              2.32             100
## # ℹ 201 more rows

6.1.1. Plot

add_ons_plot <- add_ons %>%
  ggplot(aes(Avg_Wifi, Avg_Food_Rating, fill=Avg_Entertainment)) +
  scale_fill_gradient(low = "#5659a8", high = "#090d5b") +
  geom_bar(stat="identity") 
print(add_ons_plot)
## Warning: Removed 3 rows containing missing values (`position_stack()`).
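
Avg_Wifi and Avg_Food_Rating are both continuous, so stacked bars at arbitrary x positions are hard to read. A scatter plot with one point per airline is one possible alternative. The sketch below is not the plot used in this report. It reuses the first three rows of the table above, and `add_ons_toy` and `add_ons_scatter` are hypothetical names.

```r
library(ggplot2)

# First three rows of the add_ons table printed above
add_ons_toy <- data.frame(
  Airline_Name      = c("Adria Airways", "Aegean Airlines", "Aer Lingus"),
  Avg_Food_Rating   = c(2.63, 2.82, 1.91),
  Avg_Wifi          = c(1.90, 2.67, 1.62),
  Avg_Entertainment = c(1.72, 2.52, 2.13)
)

# One point per airline; colour (not fill) carries the entertainment score
add_ons_scatter <- ggplot(add_ons_toy,
                          aes(Avg_Wifi, Avg_Food_Rating, colour = Avg_Entertainment)) +
  scale_colour_gradient(low = "#5659a8", high = "#090d5b") +
  geom_point(size = 3)
print(add_ons_scatter)
```

Points use the colour aesthetic rather than fill, which is why the gradient scale changes from scale_fill_gradient() to scale_colour_gradient().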

Chapter 7: Most Used Aircraft

7.1. Top 20 Aircraft by Number of Responses

count_aircraft <- airlines %>%  
  filter(Aircraft != "Unknown") %>% 
  group_by(Aircraft) %>%
  summarise(Total_Aircraft = n()) %>% 
  arrange(desc(Total_Aircraft)) %>% 
  head(20)
print(count_aircraft)
## # A tibble: 20 × 2
##    Aircraft         Total_Aircraft
##    <fct>                     <int>
##  1 A320                       1041
##  2 Boeing 737-800              553
##  3 Boeing 737                  404
##  4 A330                        349
##  5 Boeing 787                  349
##  6 A321                        271
##  7 A319                        233
##  8 Boeing 787-9                174
##  9 Boeing 777                  160
## 10 A330-300                    142
## 11 A320-200                    123
## 12 A350                        122
## 13 A330-200                    102
## 14 Boeing 777-300ER            101
## 15 Boeing 777-300               92
## 16 A350-900                     84
## 17 A380                         81
## 18 Boeing 787-8                 77
## 19 A340                         72
## 20 Boeing 767                   60

7.1.1. Plot

count_aircraft_plot <- count_aircraft %>%
  mutate(Aircraft = fct_reorder(Aircraft, Total_Aircraft)) %>%
  ggplot(aes(x=Total_Aircraft, y=Aircraft, fill=Total_Aircraft)) +
  scale_fill_gradient(low = "#5659a8", high = "#090d5b") +
  geom_bar(stat="identity")
print(count_aircraft_plot)

Chapter 8: Top Routes

8.1. Top 10 Routes

route_most_visit <- airlines %>% select(Route) %>%
  filter(Route != "Unknown") %>% 
  group_by(Route) %>% 
  summarise(Trip_Count = n()) %>% 
  arrange(desc(Trip_Count)) %>% 
  head(10)
print(route_most_visit)
## # A tibble: 10 × 2
##    Route                     Trip_Count
##    <fct>                          <int>
##  1 Melbourne to Sydney               43
##  2 Sydney to Melbourne               35
##  3 Cape Town to Johannesburg         34
##  4 Cusco to Lima                     30
##  5 Bangkok to Phuket                 28
##  6 Johannesburg to Cape Town         27
##  7 Kuala Lumpur to Singapore         27
##  8 Bangkok to Chiang Mai             26
##  9 Johannesburg to Durban            22
## 10 Toronto to Calgary                21

8.1.1. Plot

route_most_visit_plot <- route_most_visit %>%
  mutate(Route = fct_reorder(Route, Trip_Count)) %>%
  ggplot(aes(x=Trip_Count, y=Route, fill=Trip_Count)) +
  scale_fill_gradient(low = "#5659a8", high = "#090d5b") +
  geom_bar(stat="identity")
print(route_most_visit_plot)

Chapter 9: Trips over Years

9.1. Trips over the Years

busy_dates <- airlines %>% select(Date_Flown) %>% 
  filter(Date_Flown != "Not Available") %>% 
  group_by(Date_Flown) %>% 
  summarise(Total_Flights=n())
print(busy_dates)
## # A tibble: 110 × 2
##    Date_Flown   Total_Flights
##    <fct>                <int>
##  1 ""                    3754
##  2 "April 2012"             1
##  3 "April 2015"             9
##  4 "April 2016"            82
##  5 "April 2017"           102
##  6 "April 2018"           103
##  7 "April 2019"           251
##  8 "April 2020"            92
##  9 "April 2021"            57
## 10 "April 2022"           205
## # ℹ 100 more rows

9.2. Creating the Time Series

# my() parses "Month Year" strings into dates; the blank entry cannot be parsed and becomes NA
busy_dates$Date_Flown <- my(busy_dates$Date_Flown)

9.3. Time Series Visualization

busy_dates_plot <- busy_dates %>% ggplot(aes(x=Date_Flown, y=Total_Flights)) +
  geom_point() +
  geom_line()
print(busy_dates_plot)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 row containing missing values (`geom_line()`).
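
The single row dropped in the warnings above corresponds to the blank `Date_Flown` entry (3754 reviews), which passes the `"Not Available"` filter and then cannot be parsed by `my()`. The following is a minimal sketch of one way to exclude blanks before parsing. `toy_dates` is a hypothetical data frame: the blank and April 2019 counts are copied from the output above, while the other two rows are made up for illustration. It assumes an English locale for month names.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data mimicking the shape of busy_dates
toy_dates <- data.frame(
  Date_Flown    = c("", "May 2019", "April 2019", "Not Available"),
  Total_Flights = c(3754L, 120L, 251L, 10L)
)

clean_dates <- toy_dates %>%
  filter(!Date_Flown %in% c("", "Not Available")) %>%  # drop blanks and the placeholder label
  mutate(Date_Flown = my(Date_Flown)) %>%              # parse month-year strings into Date values
  arrange(Date_Flown)                                  # chronological order for a line plot

print(clean_dates)
```

With the blank entry removed up front, no row is left for `geom_point()` or `geom_line()` to discard.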

9.4. Prophet Forecasting for the Next 3 Years

Chapter 10: Which Airlines are Value-for-Money & Recommended?

10.1. Airlines with 50+ Flights, a Value-for-Money Rating of at Least 3 & at Least 70% Recommendations

worth_it <- airlines %>% select(Airline_Name, Value_for_Money, Recommended) %>% 
  group_by(Airline_Name) %>% 
  summarise(
    Value = mean(Value_for_Money, na.rm = TRUE),
    Recommendation = sum(Recommended == "yes") / n() * 100,
    Total_Flights = n()
  ) %>% 
  filter(Total_Flights>=50 & Value>=3 & Recommendation >=70) %>% 
  arrange(desc(Value), desc(Recommendation))
print(worth_it)
## # A tibble: 15 × 4
##    Airline_Name            Value Recommendation Total_Flights
##    <fct>                   <dbl>          <dbl>         <int>
##  1 China Southern Airlines  4.54           98             100
##  2 Hainan Airlines          4.54           87             100
##  3 ANA All Nippon Airways   4.11           78             100
##  4 Rex Airlines             3.98           76.5            51
##  5 BA CityFlyer             3.97           86.1            72
##  6 Garuda Indonesia         3.88           73             100
##  7 Thai Smile Airways       3.82           70             100
##  8 Cathay Dragon            3.77           82.3            62
##  9 Citilink                 3.68           75              60
## 10 Dragonair                3.66           72             100
## 11 Bangkok Airways          3.64           72             100
## 12 Air Astana               3.61           72             100
## 13 S7 Siberia Airlines      3.59           71.2            66
## 14 Lao Airlines             3.55           90.3            72
## 15 Olympic Air              3.37           71.0            93
worth_it_plot <- worth_it %>% ggplot(aes(x = Airline_Name)) +
  geom_bar(aes(y = Value, fill = "Value"), stat = "identity", position = "dodge", linewidth = 0.2) + # linewidth replaces the deprecated size argument
  geom_bar(aes(y = Recommendation, fill = "Recommendation"), stat = "identity", position = "dodge") 
print(worth_it_plot)
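
Because `Value` (an average rating) and `Recommendation` (a percentage) are drawn as two separate layers on one axis, the bars overlap rather than sit side by side, and the much larger percentages dominate the scale. The following is a sketch of one alternative, assuming the same column names as `worth_it`: reshape to long form and facet by metric so each gets its own axis. `worth_it_sample` is a hypothetical three-row excerpt copied from the table above, and `Metric` and `Score` are names introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Three rows copied from the worth_it table above
worth_it_sample <- data.frame(
  Airline_Name   = c("China Southern Airlines", "Hainan Airlines", "ANA All Nippon Airways"),
  Value          = c(4.54, 4.54, 4.11),
  Recommendation = c(98, 87, 78)
)

# One row per airline and metric, so a single geom_col() layer can draw both
worth_it_long <- worth_it_sample %>%
  pivot_longer(c(Value, Recommendation), names_to = "Metric", values_to = "Score")

# Separate panels with free x scales keep the rating readable next to the percentage
worth_it_long_plot <- worth_it_long %>%
  ggplot(aes(x = Score, y = Airline_Name, fill = Metric)) +
  geom_col() +
  facet_wrap(~ Metric, scales = "free_x")
print(worth_it_long_plot)
```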

Chapter 11: Text Analysis on Reviews

11.1. Subsetting Data for Text Mining

reviews <- airlines %>% select(Airline_Name, Review_Title, Review, Review_Date) 

11.2. Word Cloud

corpus1 <- iconv(reviews$Review_Title, to="utf-8") 
titles <- Corpus(VectorSource(corpus1)) # creating corpus of review titles

corpus2 <- iconv(reviews$Review, to = "UTF-8", sub = "byte") # replace unconvertible bytes instead of returning NA
comment <- Corpus(VectorSource(corpus2)) # creating corpus of reviews

11.2.1. Cleaning the Comments

comment <- tm_map(comment, tolower)
## Warning in tm_map.SimpleCorpus(comment, tolower): transformation drops
## documents
comment <- tm_map(comment, removePunctuation)
## Warning in tm_map.SimpleCorpus(comment, removePunctuation): transformation
## drops documents
comment <- tm_map(comment, removeNumbers)
## Warning in tm_map.SimpleCorpus(comment, removeNumbers): transformation drops
## documents
comment <- tm_map(comment, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(comment, removeWords, stopwords("english")):
## transformation drops documents
comment <- tm_map(comment, stripWhitespace)
## Warning in tm_map.SimpleCorpus(comment, stripWhitespace): transformation drops
## documents

11.2.2. Creating the Term Document Matrix

#dtm <- TermDocumentMatrix(comment)
#dtm <- as.matrix(dtm)

11.2.3. Creating the Word Frequency Table

#word_freq <- rowSums(dtm)
#word_freq <- subset(word_freq, word_freq>1000) # subsetting words having frequency more than 1000
#length(word_freq)
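
The steps above are commented out, possibly because `as.matrix()` on a term-document matrix built from this many reviews can need a lot of memory (an assumption; the report does not say). The following is a sketch that computes word frequencies without converting to a dense matrix, using `row_sums()` from the `slam` package, which provides the sparse matrix class that `tm` uses. `toy_reviews` is a made-up corpus, and the threshold of 2 stands in for the 1000 used above.

```r
library(tm)
library(slam)

# Made-up reviews standing in for the cleaned comment corpus
toy_reviews <- c("great flight great crew", "flight delayed again", "crew was friendly")
toy_corpus <- VCorpus(VectorSource(toy_reviews))

# The term-document matrix stays in sparse (simple triplet) form
toy_tdm <- TermDocumentMatrix(toy_corpus)

# Sum each term's counts across documents without calling as.matrix()
toy_freq <- sort(row_sums(toy_tdm), decreasing = TRUE)
toy_freq <- toy_freq[toy_freq >= 2]  # keep only the frequent terms
print(toy_freq)
```

The resulting named vector has the same shape as `word_freq`, so it could be passed to `barplot()` or `wordcloud()` in the same way.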

11.2.4. Word Frequency Bar Plot

#barplot(word_freq, las = 0, col = "blue", main = "Word Frequency Barplot",
#        xlab = "Words", ylab = "Frequency", cex.names = 0.7)

11.2.5. Word Cloud

#set.seed(1234)
# wordcloud(words = names(word_freq), freq = word_freq, scale = c(3, 1.5), 
#          random.order = FALSE, 
#          colors = brewer.pal(8, "Dark2"),
#          rot.per = 0.3,
#          max.words = 50)
Thank You…