Amazing Report on Airlines Reviews Dataset
2023-08-10
- Chapter 1: Data Loading
- Chapter 2: Data Cleaning
- Chapter 3: Airlines vs. Ratings
- Chapter 4: Aircraft vs Seat Type
- Chapter 5: Airlines by Service
- Chapter 6: Additional Features of Airlines
- Chapter 7: Mostly Used Aircraft
- Chapter 8: Top Routes
- Chapter 9: Trips over Years
- Chapter 10: Which Airlines are Value-for-Money & Recommended?
- Chapter 11: Text Analysis on Reviews
Chapter 1: Data Loading
1.1. Loading the Libraries
library(tidyverse) # for data manipulation
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(skimr) # for brief summary of dataset
library(ggplot2) # for visualizations (already attached by tidyverse)
library(grid) # for low-level grid graphics (textGrob, gpar)
library(gridExtra) # for arranging multiple plots (grid.arrange)
##
## Attaching package: 'gridExtra'
##
## The following object is masked from 'package:dplyr':
##
## combine
library(lubridate) # for time extraction
library(SnowballC) # for word stemming
library(wordcloud) # for word cloud
## Loading required package: RColorBrewer
library(tm) # for text mining
## Loading required package: NLP
##
## Attaching package: 'NLP'
##
## The following object is masked from 'package:ggplot2':
##
## annotate
library(Matrix) # for sparse matrices
##
## Attaching package: 'Matrix'
##
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
library(cleanrmd) # for custom css themes
1.2. Loading the Dataset
airlines <- read.csv("~/Workbooks/Airline_Reviews.csv", stringsAsFactors=TRUE)
head(airlines, 10)
## X Airline.Name Overall_Rating Review_Title
## 1 0 AB Aviation 9 "pretty decent airline"
## 2 1 AB Aviation 1 "Not a good airline"
## 3 2 AB Aviation 1 "flight was fortunately short"
## 4 3 Adria Airways 1 "I will never fly again with Adria"
## 5 4 Adria Airways 1 "it ruined our last days of holidays"
## 6 5 Adria Airways 1 "Had very bad experience"
## 7 6 Adria Airways 1 "worse than the budget airlines"
## 8 7 Adria Airways 1 "book another company"
## 9 8 Adria Airways 1 "combined two flights"
## 10 9 Adria Airways 8 "the crew was nice"
## Review.Date Verified
## 1 11th November 2019 True
## 2 25th June 2019 True
## 3 25th June 2019 True
## 4 28th September 2019 False
## 5 24th September 2019 True
## 6 17th September 2019 True
## 7 6th September 2019 False
## 8 24th August 2019 False
## 9 6th August 2019 True
## 10 12th October 2018 True
## Review
## 1 Moroni to Moheli. Turned out to be a pretty decent airline. Online booking worked well, checkin and boarding was fine and the plane looked well maintained. Its a very short flight - just 20 minutes or so so i didn't expect much but they still managed to hand our a bottle of water and some biscuits which i though was very nice. Both flights on time.
## 2 Moroni to Anjouan. It is a very small airline. My ticket advised me to turn up at 0800hrs which I did. There was confusion at this small airport. I was then directed to the office of AB Aviation which was still closed. It opened at 0900hrs and I was told that the flight had been put back to 1300hrs and that they had tried to contact me. This could not be true as they did not have my phone number. I was with a local guide and he had not been informed either. I presume that I was bumped off. The later flight did operate but as usual, there was confusion at check-in. The flight was only 30mins and there were no further problems. Not a good airline but it is the only one for Comoros.
## 3 Anjouan to Dzaoudzi. A very small airline and the only airline based in Comoros. Check-in was disorganised because of locals with big packages and disinterested staff. The flight was fortunately short (30 mins). Took off on time and landed on time. With a short flight like there was of course no in-flight entertainment nor cabin service except for biscuits and a bottle of water, which was quite nice!
## 4 Please do a favor yourself and do not fly with Adria. On the route from Munich to Pristina in July 2019 they lost my luggage and for 10 days in a row, despite numerous phone calls they were not able to locate it. 11 days later the luggage arrived at the destination completely ruined. Applying for compensation, they ignored my request. Foolishly again, I booked another flight with them (345 euros) Frankfurt - Pristina in September 2019. They cancelled the flight with no reason 24 hours before the departure. Desperate phone calls to customer service to get anything (rerouting, compensation, etc) were not responded. I will never fly again with Adria. What a disgrace! Shame on you Adria for constantly deceiving your customers.
## 5 Do not book a flight with this airline! My friend and I should have returned from Sofia to Amsterdam on September 22 and 3 days before, they sent us an SMS informing the flight was cancelled. For 3 straight days we tried to reach the airline and the web agent (e-dreams) and we did not get a solution. Finally, 18 hours before our cancelled flight time, and after 35 minutes on a call (waiting), the airline was able to get us on a flight with Lufthansa. Do not book Adria Airways, it is unreliable and in our case, it ruined our last days of holidays since we needed to be on the phones all day.
## 6 Had very bad experience with rerouted and cancelled flights last weekend with Adria airways. Original Route was Ljubljana to Sarajevo return. Two weeks before i received an email that the flight was cancelled. Offered route change was Ljubljana to Sarajevo via Munich. Flight back changed to Sarajevo-Pristina-Ljubljana. I accepted. The first flight via Munich was ok. Two hours before the return flight I got the email that the flight was cancelled. I had to rebook via hotline and had to accept a flight with Croatian to Zagreb. I reached Ljubljana 4 h later and had to organize Transport from Zagreb to Ljubljana on my own cost. Do not book flights with Adria airways. I heard that their financial situation is very very bad.
## 7 Ljubljana to Zürich. Firstly, Ljubljana airport is terrible. Badly trained staff, unfriendly. Toilets are very dirty. Flight 2 hours delayed without any information. There is no Information desk so questions aren‘t possible. Never again will use this airline. Its even worse than the budget airlines and thats difficult.
## 8 First of all, I am not complaining about a specific flight. I am a Lufthansa frequent Flyer and I normally fly the route Munich - Timisoara. This summer season Lufthansa offered the flights on this route to Adria Airways, as they are a star alliance member. I can only tell that I have the worst experiences with them. In over 90% of the cases they are late, they don't fly on time. Always they offer the same excuse: they have no slot free. This is an unacceptable excuse, as all other companies flying on similar coridors seem not to have this problem. In addition, as LH Cityline was operating these flights I have almost never heard this excuse, or had this problem. I am flying on this route for 6 years already. They started also to cancel some flights. Maybe combine 2 flights into 1 to spare some money, who knows. The cabin crew is decent - not wow, not bad - but there also also situations when the staff are really rude and not customer oriented. I would recommend anyone to fly with this company. If you have the chance, book another company!
## 9 Worst Airline ever! They combined two flights to save costs. Instead of flying Pristina - Ljubliana - Zürich we now fly Pristina - Ljubliana - München - Zürich. Now we arrive 2.5h later at our destination.
## 10 Ljubljana to Munich. The homebase airport of Adria Airways is Ljubljana and it very small, relaxing and convenient. It is surrounded by the Alps and the departure was fantastic. The airplane was modern and the crew was nice. Ontime departure. Drinks without alcohol were free and I paid 4€ for my white wine. Considering that I had a cheap ticket I cannot complain. Upgrades would have been available for 30€ - which includes the lounge.
## Aircraft Type.Of.Traveller Seat.Type
## 1 Solo Leisure Economy Class
## 2 E120 Solo Leisure Economy Class
## 3 Embraer E120 Solo Leisure Economy Class
## 4 Solo Leisure Economy Class
## 5 Couple Leisure Economy Class
## 6 CR 900 Couple Leisure Economy Class
## 7 Business Economy Class
## 8 Bombardier CRJ Solo Leisure Economy Class
## 9 Solo Leisure Economy Class
## 10 Family Leisure Economy Class
## Route Date.Flown Seat.Comfort
## 1 Moroni to Moheli November 2019 4
## 2 Moroni to Anjouan June 2019 2
## 3 Anjouan to Dzaoudzi June 2019 2
## 4 Frankfurt to Pristina September 2019 1
## 5 Sofia to Amsterdam via Ljubljana September 2019 1
## 6 Sarajevo to Ljubljana September 2019 1
## 7 Ljubljana to Zürich September 2019 1
## 8 Timisoara to Munich August 2019 1
## 9 Pristina to Zürich via Ljubliana August 2019 1
## 10 Ljubljana to Munich October 2018 4
## Cabin.Staff.Service Food...Beverages Ground.Service Inflight.Entertainment
## 1 5 4 4 NA
## 2 2 1 1 NA
## 3 1 1 1 NA
## 4 1 NA 1 NA
## 5 1 1 1 1
## 6 1 1 1 1
## 7 1 1 1 NA
## 8 1 1 1 1
## 9 2 1 1 1
## 10 4 3 5 NA
## Wifi...Connectivity Value.For.Money Recommended
## 1 NA 3 yes
## 2 NA 2 no
## 3 NA 2 no
## 4 NA 1 no
## 5 1 1 no
## 6 1 1 no
## 7 NA 1 no
## 8 1 1 no
## 9 1 1 no
## 10 NA 5 yes
Chapter 2: Data Cleaning
2.1. Renaming the Variables
glimpse(airlines)
## Rows: 23,171
## Columns: 20
## $ X <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ Airline.Name <fct> AB Aviation, AB Aviation, AB Aviation, Adria Ai…
## $ Overall_Rating <fct> 9, 1, 1, 1, 1, 1, 1, 1, 1, 8, 1, 1, 2, 2, 3, 1,…
## $ Review_Title <fct> "\"pretty decent airline\"", "\"Not a good airl…
## $ Review.Date <fct> 11th November 2019, 25th June 2019, 25th June 2…
## $ Verified <fct> True, True, True, False, True, True, False, Fal…
## $ Review <fct> " Moroni to Moheli. Turned out to be a pretty …
## $ Aircraft <fct> "", "E120", "Embraer E120 ", "", "", "CR 900", …
## $ Type.Of.Traveller <fct> Solo Leisure, Solo Leisure, Solo Leisure, Solo …
## $ Seat.Type <fct> Economy Class, Economy Class, Economy Class, Ec…
## $ Route <fct> "Moroni to Moheli", "Moroni to Anjouan", "Anjou…
## $ Date.Flown <fct> November 2019, June 2019, June 2019, September …
## $ Seat.Comfort <dbl> 4, 2, 2, 1, 1, 1, 1, 1, 1, 4, 2, 4, 3, 1, 3, 5,…
## $ Cabin.Staff.Service <dbl> 5, 2, 1, 1, 1, 1, 1, 1, 2, 4, 1, 1, 3, 2, 3, 5,…
## $ Food...Beverages <dbl> 4, 1, 1, NA, 1, 1, 1, 1, 1, 3, NA, 1, NA, 2, NA…
## $ Ground.Service <dbl> 4, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 4, 3, 2, 1, 5,…
## $ Inflight.Entertainment <dbl> NA, NA, NA, NA, 1, 1, NA, 1, 1, NA, 1, NA, NA, …
## $ Wifi...Connectivity <dbl> NA, NA, NA, NA, 1, 1, NA, 1, 1, NA, 1, NA, NA, …
## $ Value.For.Money <dbl> 3, 2, 2, 1, 1, 1, 1, 1, 1, 5, 1, 1, 2, 1, 1, 5,…
## $ Recommended <fct> yes, no, no, no, no, no, no, no, no, yes, no, n…
colnames(airlines) <- c("S_No","Airline_Name","Rating","Review_Title","Review_Date",
                        "Verified","Review","Aircraft", "Traveller_Status","Seat_Type",
                        "Route","Date_Flown","Seat_Comfort","Cabin_Service","Food_Beverages",
                        "Ground_Service","Entertainment","Wifi","Value_for_Money","Recommended")
2.2. Dealing with Missing Data in Categorical Variables
# Converting Variables into Character Strings
airlines$Aircraft <- as.character(airlines$Aircraft)
airlines$Traveller_Status <- as.character(airlines$Traveller_Status)
airlines$Seat_Type <- as.character(airlines$Seat_Type)
airlines$Route <- as.character(airlines$Route)
# Replacing blank values with "Unknown"
airlines$Aircraft[airlines$Aircraft == ""] <- "Unknown"
airlines$Traveller_Status[airlines$Traveller_Status == ""] <- "Unknown"
airlines$Seat_Type[airlines$Seat_Type == ""] <- "Unknown"
airlines$Route[airlines$Route == ""] <- "Unknown"
# Converting into Factors Again
airlines$Aircraft <- as.factor(airlines$Aircraft)
airlines$Traveller_Status <- as.factor(airlines$Traveller_Status)
airlines$Seat_Type <- as.factor(airlines$Seat_Type)
airlines$Route <- as.factor(airlines$Route)
2.3. Dealing with Data Types
# Note: Rating is a factor at this point, so as.integer() returns its level codes
airlines$Rating <- as.integer(airlines$Rating)
2.4. Checking the Summary of the Data
sum(is.na(airlines)) # counting missing (NA) values in the dataset
## [1] 52540
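The single total above does not show which columns the missing values come from. A minimal sketch of how base R's `colSums()` could break the count down by column, using a small made-up data frame (`toy` and its values are illustrative, not taken from the dataset):

```r
# Toy data frame standing in for `airlines` (values are made up for illustration)
toy <- data.frame(
  Seat_Comfort = c(4, NA, 2),
  Wifi         = c(NA, NA, 1),
  Recommended  = c("yes", "no", "no")
)

# is.na() returns a logical matrix; colSums() counts the TRUEs in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# The per-column counts add up to the overall total
stopifnot(sum(na_per_column) == sum(is.na(toy)))
```

Applied to `airlines`, the same call should reproduce the `n_missing` figures that `skim()` reports per variable.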
skim(airlines) # creating the summary of dataset
## Warning: There was 1 warning in `dplyr::summarize()`.
## ℹ In argument: `dplyr::across(tidyselect::any_of(variable_names),
## mangled_skimmers$funs)`.
## ℹ In group 0: .
## Caused by warning:
## ! There was 1 warning in `dplyr::summarize()`.
## ℹ In argument: `dplyr::across(tidyselect::any_of(variable_names),
## mangled_skimmers$funs)`.
## Caused by warning in `sorted_count()`:
## ! Variable contains value(s) of "" that have been converted to "empty".
| Name | airlines |
| Number of rows | 23171 |
| Number of columns | 20 |
| _______________________ | |
| Column type frequency: | |
| factor | 11 |
| numeric | 9 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| Airline_Name | 0 | 1 | FALSE | 497 | Aeg: 100, Aer: 100, Aer: 100, Aer: 100 |
| Review_Title | 0 | 1 | FALSE | 17219 | Onu: 84, US : 75, Ger: 74, Mer: 71 |
| Review_Date | 0 | 1 | FALSE | 4557 | 16t: 67, 21s: 55, 25t: 55, 26t: 54 |
| Verified | 0 | 1 | FALSE | 2 | Tru: 12322, Fal: 10849 |
| Review | 0 | 1 | FALSE | 23046 | A: 2, A: 2, A: 2, D: 2 |
| Aircraft | 2 | 1 | FALSE | 1048 | Unk: 16042, A32: 1041, Boe: 553, Boe: 404 |
| Traveller_Status | 0 | 1 | FALSE | 5 | Sol: 7120, Cou: 5265, Fam: 4352, Unk: 3738 |
| Seat_Type | 0 | 1 | FALSE | 5 | Eco: 19145, Bus: 2098, Unk: 1096, Pre: 646 |
| Route | 0 | 1 | FALSE | 13608 | Unk: 3828, Mel: 43, Syd: 35, Cap: 34 |
| Date_Flown | 0 | 1 | FALSE | 110 | emp: 3754, Jun: 1057, Jul: 814, May: 788 |
| Recommended | 0 | 1 | FALSE | 2 | no: 15364, yes: 7807 |
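The factor summary above still lists an "empty" level for Date_Flown and 2 missing values for Aircraft, so the column-by-column replacement in 2.2 did not reach every blank. A minimal sketch of one way to apply the same replacement to several columns at once, treating both `""` and `NA` as unknown. `replace_blank()` is a hypothetical helper and `toy` is made-up data:

```r
# Hypothetical helper: turn blank strings and NA into a single label
replace_blank <- function(x, label = "Unknown") {
  x <- as.character(x)                      # factors -> their text labels
  x[is.na(x) | trimws(x) == ""] <- label    # blanks and NA get the label
  factor(x)                                 # back to a factor
}

# Made-up stand-in for the categorical columns of `airlines`
toy <- data.frame(
  Aircraft   = factor(c("", "A320", NA)),
  Date_Flown = factor(c("June 2019", "", "May 2018"))
)

cols <- c("Aircraft", "Date_Flown")
toy[cols] <- lapply(toy[cols], replace_blank)
print(toy)
```

Listing the column names in one vector keeps the rule in a single place, so a column such as Date_Flown is less likely to be missed.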
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| S_No | 0 | 1.00 | 11585.00 | 6689.04 | 0 | 5792.5 | 11585 | 17377.5 | 23170 | ▇▇▇▇▇ |
| Rating | 0 | 1.00 | 3.39 | 3.10 | 1 | 1.0 | 1 | 6.0 | 10 | ▇▁▁▂▂ |
| Seat_Comfort | 4155 | 0.82 | 2.62 | 1.46 | 0 | 1.0 | 3 | 4.0 | 5 | ▇▃▅▅▃ |
| Cabin_Service | 4260 | 0.82 | 2.87 | 1.60 | 0 | 1.0 | 3 | 4.0 | 5 | ▇▃▃▃▆ |
| Food_Beverages | 8671 | 0.63 | 2.55 | 1.53 | 0 | 1.0 | 2 | 4.0 | 5 | ▇▃▃▃▃ |
| Ground_Service | 4793 | 0.79 | 2.35 | 1.60 | 1 | 1.0 | 1 | 4.0 | 5 | ▇▂▂▂▃ |
| Entertainment | 12342 | 0.47 | 2.18 | 1.49 | 0 | 1.0 | 2 | 3.0 | 5 | ▇▂▂▂▂ |
| Wifi | 17251 | 0.26 | 1.78 | 1.32 | 0 | 1.0 | 1 | 2.0 | 5 | ▇▁▁▁▁ |
| Value_for_Money | 1066 | 0.95 | 2.45 | 1.59 | 0 | 1.0 | 2 | 4.0 | 5 | ▇▂▂▂▃ |
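The Rating column summarised above was produced in 2.3 by calling `as.integer()` on a factor. In R that call returns the internal level codes, not the numbers shown as labels, so the two differ whenever the levels are not exactly 1, 2, 3, ... in order. A minimal sketch with a made-up factor follows. The levels of the real Overall_Rating column are not shown in this report, so this illustrates what could happen rather than what did:

```r
# Made-up ratings read in as a factor, including one non-numeric entry
rating_fct <- factor(c("9", "1", "10", "n", "2"))

# Levels are sorted as text, so "10" comes before "2"
print(levels(rating_fct))

# Level codes: positions in the sorted level list, not the ratings themselves
codes <- as.integer(rating_fct)

# Label values: convert to character first; non-numeric labels become NA
values <- suppressWarnings(as.integer(as.character(rating_fct)))

print(data.frame(label = as.character(rating_fct), codes, values))
```

If the labels are what is wanted, `as.integer(as.character(x))` is the usual idiom, and any label that is not a number then shows up as `NA` instead of being silently mapped to a code.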
Chapter 3: Airlines vs. Ratings
3.1. Top 10 Airlines by Ratings
airlines_rating <- airlines %>% select(Airline_Name,Rating) %>%
  group_by(Airline_Name) %>%
  summarise(Total_Rating = n(), Avg_Rating = mean(Rating)) %>%
  filter(Total_Rating>10) %>%
  arrange(desc(Avg_Rating))
print(airlines_rating)
## # A tibble: 334 × 3
## Airline_Name Total_Rating Avg_Rating
## <fct> <int> <dbl>
## 1 SyrianAir 23 7.87
## 2 Myanmar Airways 27 7.85
## 3 LAN Peru 42 7.31
## 4 SkyWest Airlines 38 7.21
## 5 Air Zimbabwe 24 7.17
## 6 Aerosur 11 7
## 7 Berjaya Air 12 6.92
## 8 Shaheen Air 12 6.92
## 9 Iran Air 66 6.86
## 10 Luxair 86 6.83
## # ℹ 324 more rows
3.1.1. Plot
rating_plot <- airlines_rating %>%
  head(10) %>%
  mutate(Airline_Name = fct_reorder(Airline_Name, Avg_Rating)) %>%
  ggplot(aes(Avg_Rating, Airline_Name, fill=Total_Rating)) +
  scale_fill_gradient(low = "#090d5b", high = "#5659a8") +
  geom_bar(stat="identity")
print(rating_plot)
3.2. Top 10 Airlines by Verified Ratings
airlines_rating_verified <- airlines %>%
  filter(Verified=="True") %>%
  select(Airline_Name,Rating) %>%
  group_by(Airline_Name) %>%
  summarise(Total_Rating = n(), Avg_Rating = mean(Rating)) %>%
  filter(Total_Rating>10) %>%
  arrange(desc(Avg_Rating))
print(airlines_rating_verified)
## # A tibble: 222 × 3
## Airline_Name Total_Rating Avg_Rating
## <fct> <int> <dbl>
## 1 China Southern Airlines 97 6.31
## 2 Cathay Dragon 43 6.23
## 3 BA CityFlyer 13 6.08
## 4 Lao Airlines 12 5.25
## 5 Belavia 16 5.19
## 6 QantasLink 21 5.10
## 7 Nepal Airlines 15 5.07
## 8 Citilink 34 5.03
## 9 Rossiya Airlines 12 5
## 10 Virgin Australia 76 4.93
## # ℹ 212 more rows
3.2.1. Plot
verified_rating_plot <- airlines_rating_verified %>%
  head(10) %>%
  mutate(Airline_Name = fct_reorder(Airline_Name, Avg_Rating)) %>%
  ggplot(aes(Avg_Rating, Airline_Name, fill=Total_Rating)) +
  scale_fill_gradient(low = "#5659a8", high = "#090d5b") +
  geom_bar(stat="identity")
print(verified_rating_plot)
3.3. Top 10 Airlines: Verified Ratings vs. Mixed Ratings
rating_plot_no_leg <- rating_plot + theme(legend.position = "none") # removing the legend of the 1st plot
vs_text <- textGrob("Vs.", gp = gpar(fontsize = 16, fontface = "bold")) # creating a "Vs." text block
grid.arrange(rating_plot_no_leg, vs_text, verified_rating_plot, ncol = 3, widths = c(4, 1, 4)) # arranging both plots in one frame
Chapter 4: Aircraft vs Seat Type
4.1. Average Seat Comfort by Seat Type
seat_type_comfort <- airlines %>%
  select(Seat_Type, Seat_Comfort) %>%
  filter(Seat_Type != "Unknown") %>%
  group_by(Seat_Type) %>%
  summarize(Avg_Comfort = mean(Seat_Comfort, na.rm = T)) %>%
  arrange(desc(Avg_Comfort))
print(seat_type_comfort)
## # A tibble: 4 × 2
## Seat_Type Avg_Comfort
## <fct> <dbl>
## 1 First Class 3.49
## 2 Business Class 3.36
## 3 Premium Economy 2.75
## 4 Economy Class 2.52
4.1.1. Plot
seat_type_comfort_plot <- seat_type_comfort %>%
  head(10) %>%
  mutate(Seat_Type = fct_reorder(Seat_Type, Avg_Comfort)) %>%
  ggplot(aes(Avg_Comfort, Seat_Type, fill=Avg_Comfort)) +
  scale_fill_gradient(low = "#5659a8", high = "#090d5b") +
  geom_bar(stat="identity")
print(seat_type_comfort_plot)
Chapter 5: Airlines by Service
5.1. Airlines by Ground Service
ground <- airlines %>%
  select(Airline_Name, Ground_Service) %>%
  group_by(Airline_Name) %>%
  summarise(Avg_Ground_Service = mean(Ground_Service, na.rm = T), Total_Customers = n()) %>%
  filter(Total_Customers >= 50) %>%
  arrange(desc(Avg_Ground_Service)) %>%
  head(10)
print(ground)
## # A tibble: 10 × 3
## Airline_Name Avg_Ground_Service Total_Customers
## <fct> <dbl> <int>
## 1 China Southern Airlines 4.87 100
## 2 Hainan Airlines 4.31 100
## 3 Rex Airlines 4.21 51
## 4 ANA All Nippon Airways 4.19 100
## 5 Garuda Indonesia 4.19 100
## 6 Regional Express 3.98 87
## 7 BA CityFlyer 3.97 72
## 8 Japan Airlines 3.90 100
## 9 Cathay Dragon 3.85 62
## 10 Azerbaijan Airlines 3.84 68
5.1.1. Plot
ground_plot <- ground %>%
  mutate(Airline_Name = fct_reorder(Airline_Name, Avg_Ground_Service)) %>%
  ggplot(aes(Avg_Ground_Service, Airline_Name, fill=Total_Customers)) +
  scale_fill_gradient(low = "#5659a8", high = "#090d5b") +
  geom_bar(stat="identity")
print(ground_plot)
5.2. Airlines by Cabin Service
cabin <- airlines %>%
  select(Airline_Name, Cabin_Service) %>%
  group_by(Airline_Name) %>%
  summarise(Avg_Cabin_Service = mean(Cabin_Service, na.rm = T), Total_Customers = n()) %>%
  filter(Total_Customers >= 50) %>%
  arrange(desc(Avg_Cabin_Service)) %>%
  head(10)
print(cabin)
## # A tibble: 10 × 3
## Airline_Name Avg_Cabin_Service Total_Customers
## <fct> <dbl> <int>
## 1 Hainan Airlines 4.79 100
## 2 China Southern Airlines 4.74 100
## 3 Rex Airlines 4.65 51
## 4 ANA All Nippon Airways 4.60 100
## 5 BA CityFlyer 4.37 72
## 6 Garuda Indonesia 4.26 100
## 7 Air Astana 4.25 100
## 8 Japan Airlines 4.25 100
## 9 Thai Smile Airways 4.22 100
## 10 Regional Express 4.14 87
5.2.1. Plot
cabin_plot <- cabin %>%
  mutate(Airline_Name = fct_reorder(Airline_Name, Avg_Cabin_Service)) %>%
  ggplot(aes(Avg_Cabin_Service, Airline_Name, fill=Total_Customers)) +
  scale_fill_gradient(low = "#5659a8", high = "#090d5b") +
  geom_bar(stat="identity")
print(cabin_plot)
5.3. Ground vs. Cabin Service Comparison of Top 10 Airlines
service_plot_no_legend <- ground_plot + theme(legend.position = "none")
vs_text <- textGrob("Vs.", gp = gpar(fontsize = 16, fontface = "bold"))
grid.arrange(service_plot_no_legend, vs_text, cabin_plot, ncol = 3, widths = c(4, 1, 4))
Chapter 6: Additional Features of Airlines
6.1. Top Airlines by Food Quality, Wifi & Entertainment
add_ons <- airlines %>% select(Airline_Name, Wifi, Food_Beverages, Entertainment) %>%
  group_by(Airline_Name) %>%
  summarise(Avg_Food_Rating = mean(Food_Beverages, na.rm = T),
            Avg_Wifi = mean(Wifi, na.rm = T),
            Avg_Entertainment = mean(Entertainment, na.rm = T),
            Total_Customers = n()) %>%
  filter(Total_Customers >=50)
print(add_ons)
## # A tibble: 211 × 5
## Airline_Name Avg_Food_Rating Avg_Wifi Avg_Entertainment Total_Customers
## <fct> <dbl> <dbl> <dbl> <int>
## 1 Adria Airways 2.63 1.9 1.72 91
## 2 Aegean Airlines 2.82 2.67 2.52 100
## 3 Aer Lingus 1.91 1.62 2.13 100
## 4 Aeroflot Russian … 3.03 2.44 3.02 100
## 5 Aerolineas Argent… 2.49 1.79 2.24 100
## 6 Aeromexico 1.64 1.34 1.83 100
## 7 Air Arabia 1.96 1.68 1.49 100
## 8 Air Astana 4.09 3.38 3.86 100
## 9 Air Berlin 2.65 2.31 2.69 100
## 10 Air Canada 1.94 1.52 2.32 100
## # ℹ 201 more rows
6.1.1. Plot
add_ons_plot <- add_ons %>%
  ggplot(aes(Avg_Wifi, Avg_Food_Rating, fill=Avg_Entertainment)) +
  scale_fill_gradient(low = "#5659a8", high = "#090d5b") +
  geom_bar(stat="identity")
print(add_ons_plot)
## Warning: Removed 3 rows containing missing values (`position_stack()`).
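The warning above says three rows were dropped from the plot. One likely cause, stated here as an assumption, is that `mean(x, na.rm = TRUE)` returns `NaN` when every value in a group is missing, and ggplot2 removes rows with missing coordinates. A minimal base R sketch on made-up data (`toy` is illustrative):

```r
# Airline "B" has no Wifi rating at all
toy <- data.frame(
  Airline_Name = c("A", "A", "B", "B"),
  Wifi         = c(3, NA, NA, NA)
)

# Group means with NA removed; an all-NA group has nothing left to average
avg_wifi <- tapply(toy$Wifi, toy$Airline_Name, mean, na.rm = TRUE)
print(avg_wifi)

# Groups whose average is NaN are the ones a plot would drop
no_wifi_ratings <- names(avg_wifi)[is.nan(avg_wifi)]
print(no_wifi_ratings)
```

Running the same check on `add_ons` (for example `filter(add_ons, is.nan(Avg_Wifi))`) would show which airlines are affected.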
Chapter 7: Mostly Used Aircraft
7.1. Top 20 Aircraft by Number of Responses
count_aircraft <- airlines %>%
  filter(Aircraft != "Unknown") %>%
  group_by(Aircraft) %>%
  summarise(Total_Aircraft = n()) %>%
  arrange(desc(Total_Aircraft)) %>%
  head(20)
print(count_aircraft)
## # A tibble: 20 × 2
## Aircraft Total_Aircraft
## <fct> <int>
## 1 A320 1041
## 2 Boeing 737-800 553
## 3 Boeing 737 404
## 4 A330 349
## 5 Boeing 787 349
## 6 A321 271
## 7 A319 233
## 8 Boeing 787-9 174
## 9 Boeing 777 160
## 10 A330-300 142
## 11 A320-200 123
## 12 A350 122
## 13 A330-200 102
## 14 Boeing 777-300ER 101
## 15 Boeing 777-300 92
## 16 A350-900 84
## 17 A380 81
## 18 Boeing 787-8 77
## 19 A340 72
## 20 Boeing 767 60
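The table above counts variants such as A320 and A320-200, or Boeing 777, Boeing 777-300 and Boeing 777-300ER, as separate aircraft, and the `glimpse()` output in Chapter 2 shows at least one name with trailing whitespace ("Embraer E120 "). A minimal sketch of one possible normalisation, assuming that grouping by aircraft family is what is wanted. `aircraft_family()` is a hypothetical helper, and its rule (trim, then drop everything from the first hyphen) is an illustrative simplification:

```r
# Hypothetical helper: reduce a free-text aircraft name to its family
aircraft_family <- function(x) {
  x <- trimws(as.character(x))   # remove leading/trailing whitespace
  sub("-.*$", "", x)             # "A320-200" -> "A320", "Boeing 777-300ER" -> "Boeing 777"
}

# Made-up sample of reviewer-entered aircraft names
toy <- c("A320", "A320-200", "Boeing 777-300ER", "Boeing 777", "Embraer E120 ")

family_counts <- sort(table(aircraft_family(toy)), decreasing = TRUE)
print(family_counts)
```

This rule does not merge spellings such as "E120" and "Embraer E120"; that would need a lookup table of aliases.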
7.1.1. Plot
count_aircraft_plot <- count_aircraft %>% mutate(Aircraft = fct_reorder(Aircraft, Total_Aircraft)) %>%
  ggplot(aes(x=Total_Aircraft, y=Aircraft, fill=Total_Aircraft)) +
  scale_fill_gradient(low = "#5659a8", high = "#090d5b") +
  geom_bar(stat="identity")
print(count_aircraft_plot)
Chapter 8: Top Routes
8.1. Top 10 Routes
route_most_visit <- airlines %>% select(Route) %>%
  filter(Route != "Unknown") %>%
  group_by(Route) %>%
  summarise(Trip_Count = n()) %>%
  arrange(desc(Trip_Count)) %>%
  head(10)
print(route_most_visit)
## # A tibble: 10 × 2
## Route Trip_Count
## <fct> <int>
## 1 Melbourne to Sydney 43
## 2 Sydney to Melbourne 35
## 3 Cape Town to Johannesburg 34
## 4 Cusco to Lima 30
## 5 Bangkok to Phuket 28
## 6 Johannesburg to Cape Town 27
## 7 Kuala Lumpur to Singapore 27
## 8 Bangkok to Chiang Mai 26
## 9 Johannesburg to Durban 22
## 10 Toronto to Calgary 21
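The table above treats each direction as its own route, so "Melbourne to Sydney" and "Sydney to Melbourne" are counted separately. A minimal sketch of how the two directions could be combined into one city pair, if that is the question being asked. `route_pair()` is a hypothetical helper. It assumes routes follow the "A to B" or "A to B via C" pattern seen in the data preview and ignores the stopover:

```r
# Hypothetical helper: direction-independent label for a route
route_pair <- function(route) {
  route <- as.character(route)
  route <- sub(" via .*$", "", route)              # drop any " via ..." stopover
  ends  <- strsplit(route, " to ", fixed = TRUE)   # split origin / destination
  # Sort the two endpoints so both directions give the same label
  vapply(ends, function(p) paste(sort(trimws(p)), collapse = " - "), character(1))
}

# Made-up sample of routes
toy <- c("Melbourne to Sydney", "Sydney to Melbourne",
         "Cusco to Lima", "Sofia to Amsterdam via Ljubljana")

pair_counts <- sort(table(route_pair(toy)), decreasing = TRUE)
print(pair_counts)
```

City names that themselves contain " to " would be split incorrectly by this rule, so it is a starting point rather than a complete parser.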
8.1.1. Plot
route_most_visit_plot <- route_most_visit %>% mutate(Route = fct_reorder(Route, Trip_Count)) %>%
  ggplot(aes(x=Trip_Count, y=Route, fill=Trip_Count)) +
  scale_fill_gradient(low = "#5659a8", high = "#090d5b") +
  geom_bar(stat="identity")
print(route_most_visit_plot)
Chapter 9: Trips over Years
9.1. Trips over the Years
busy_dates <- airlines %>% select(Date_Flown) %>%
  filter(Date_Flown != "Not Available") %>% # note: blanks are stored as "", so this filter removes nothing
  group_by(Date_Flown) %>%
  summarise(Total_Flights=n())
print(busy_dates)
## # A tibble: 110 × 2
## Date_Flown Total_Flights
## <fct> <int>
## 1 "" 3754
## 2 "April 2012" 1
## 3 "April 2015" 9
## 4 "April 2016" 82
## 5 "April 2017" 102
## 6 "April 2018" 103
## 7 "April 2019" 251
## 8 "April 2020" 92
## 9 "April 2021" 57
## 10 "April 2022" 205
## # ℹ 100 more rows
9.2. Creating the Time Series
busy_dates$Date_Flown <- my(busy_dates$Date_Flown)
9.3. Time Series Visualization
busy_dates_plot <- busy_dates %>% ggplot(aes(x=Date_Flown, y=Total_Flights)) +
  geom_point() +
  geom_line()
print(busy_dates_plot)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 row containing missing values (`geom_line()`).
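The output in 9.1 still contains the blank `""` level (3754 rows), because the filter compares against "Not Available" while blanks are stored as empty strings. That blank cannot be parsed as a date, which is presumably the one row the warnings above report as removed. A minimal sketch of filtering the blanks before parsing, on made-up data (`toy` is illustrative; `my()` is lubridate's month-year parser, as used in 9.2):

```r
library(dplyr)
library(lubridate)

# Made-up stand-in for the Date_Flown column, including one blank
toy <- data.frame(
  Date_Flown = factor(c("June 2019", "", "October 2018", "June 2019"))
)

busy_dates_clean <- toy %>%
  mutate(Date_Flown = as.character(Date_Flown)) %>%  # compare text, not factor codes
  filter(Date_Flown != "") %>%                       # drop blanks explicitly
  count(Date_Flown, name = "Total_Flights") %>%      # rows per month
  mutate(Date_Flown = my(Date_Flown)) %>%            # "June 2019" -> 2019-06-01
  arrange(Date_Flown)                                # chronological order

print(busy_dates_clean)
```

Each row in the dataset is a review, so Total_Flights here counts reviews that mention a given month rather than flights operated.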
9.4. Prophet Forecasting for the Next 3 Years
Chapter 10: Which Airlines are Value-for-Money & Recommended?
10.1. Airlines Having 50+ Flights, at Least a 3 Value-for-Money Rating & at Least 70% Recommendations
worth_it <- airlines %>% select(Airline_Name, Value_for_Money, Recommended) %>%
  group_by(Airline_Name) %>%
  summarise(Value = mean(Value_for_Money, na.rm = T),
            Recommendation = sum(Recommended=="yes")/n()*100,
            Total_Flights=n()) %>%
  filter(Total_Flights>=50 & Value>=3 & Recommendation >=70) %>%
  arrange(desc(Value), desc(Recommendation))
print(worth_it)
## # A tibble: 15 × 4
## Airline_Name Value Recommendation Total_Flights
## <fct> <dbl> <dbl> <int>
## 1 China Southern Airlines 4.54 98 100
## 2 Hainan Airlines 4.54 87 100
## 3 ANA All Nippon Airways 4.11 78 100
## 4 Rex Airlines 3.98 76.5 51
## 5 BA CityFlyer 3.97 86.1 72
## 6 Garuda Indonesia 3.88 73 100
## 7 Thai Smile Airways 3.82 70 100
## 8 Cathay Dragon 3.77 82.3 62
## 9 Citilink 3.68 75 60
## 10 Dragonair 3.66 72 100
## 11 Bangkok Airways 3.64 72 100
## 12 Air Astana 3.61 72 100
## 13 S7 Siberia Airlines 3.59 71.2 66
## 14 Lao Airlines 3.55 90.3 72
## 15 Olympic Air 3.37 71.0 93
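The table above holds two measures on different scales: Value is an average rating with a maximum of 5, while Recommendation is a percentage. Drawn against one shared axis, the Value bars would be dwarfed by the Recommendation bars. A minimal sketch of one alternative, reshaping to long format and giving each measure its own panel. The two rows are copied from the table above, and the object names `worth_it_long` and `worth_it_facets` are illustrative:

```r
library(tidyr)
library(ggplot2)

# Two rows copied from the table above
worth_it_toy <- data.frame(
  Airline_Name   = c("China Southern Airlines", "Hainan Airlines"),
  Value          = c(4.54, 4.54),
  Recommendation = c(98, 87)
)

# One row per airline and metric
worth_it_long <- pivot_longer(
  worth_it_toy,
  cols      = c(Value, Recommendation),
  names_to  = "Metric",
  values_to = "Score"
)

# Separate panels with independent x scales, one per metric
worth_it_facets <- ggplot(worth_it_long, aes(x = Score, y = Airline_Name, fill = Metric)) +
  geom_col() +
  facet_wrap(~ Metric, scales = "free_x")

print(worth_it_facets)
```

Putting the airline names on the y axis also keeps long names readable, which matches the horizontal bar charts used in the earlier chapters.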
worth_it_plot <- worth_it %>% ggplot(aes(x = Airline_Name)) +
  geom_bar(aes(y = Value, fill = "Value"), stat = "identity", position = "dodge", size=0.2) +
  geom_bar(aes(y = Recommendation, fill = "Recommendation"), stat = "identity", position = "dodge")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
print(worth_it_plot)
Chapter 11: Text Analysis on Reviews
11.1. Subsetting Data for Text Mining
reviews <- airlines %>% select(Airline_Name, Review_Title, Review, Review_Date)
11.2. Word Cloud
corpus1 <- iconv(reviews$Review_Title, to="utf-8")
titles <- Corpus(VectorSource(corpus1)) # creating corpus of review titles
corpus2 <- iconv(reviews$Review, to = "UTF-8", sub = "byte") # unconvertible bytes are kept as hex codes
comment <- Corpus(VectorSource(corpus2)) # creating corpus of reviews
11.2.1. Cleaning the Comments
comment <- tm_map(comment, tolower)
## Warning in tm_map.SimpleCorpus(comment, tolower): transformation drops
## documents
comment <- tm_map(comment, removePunctuation)
## Warning in tm_map.SimpleCorpus(comment, removePunctuation): transformation
## drops documents
comment <- tm_map(comment, removeNumbers)
## Warning in tm_map.SimpleCorpus(comment, removeNumbers): transformation drops
## documents
comment <- tm_map(comment, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(comment, removeWords, stopwords("english")):
## transformation drops documents
comment <- tm_map(comment, stripWhitespace)
## Warning in tm_map.SimpleCorpus(comment, stripWhitespace): transformation drops
## documents
11.2.2. Creating the Term Document Matrix
#dtm <- TermDocumentMatrix(comment)
#dtm <- as.matrix(dtm)
11.2.3. Creating the Word Frequency Table
#word_freq <- rowSums(dtm)
#word_freq <- subset(word_freq, word_freq>1000) # subsetting words having frequency more than 1000
#length(word_freq)
11.2.4. Word Frequency Bar Plot
#barplot(word_freq, las = 0, col = "blue", main = "Word Frequency Barplot",
#        xlab = "Words", ylab = "Frequency", cex.names = 0.7)
11.2.5. Word Cloud
#set.seed(1234)
# wordcloud(words = names(word_freq), freq = word_freq, scale = c(3, 1.5),
# random.order = FALSE,
# colors = brewer.pal(8, "Dark2"),
# rot.per = 0.3,
# max.words = 50)
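The commented-out steps in 11.2.2 to 11.2.5 convert the term-document matrix to a dense matrix before summing its rows. With roughly 23,000 reviews that dense matrix can become very large, which may be why the steps are disabled (an assumption; the report does not say). A minimal sketch of computing the same word frequencies directly from the sparse matrix with `slam::row_sums()` (slam is the sparse-matrix package that tm builds on), run on three made-up sentences rather than the real reviews:

```r
library(tm)

# Three made-up review snippets standing in for reviews$Review
docs <- c("the flight was late",
          "late flight and rude staff",
          "great staff")

corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# Sparse term-document matrix: terms in rows, documents in columns
tdm <- TermDocumentMatrix(corpus)

# Row sums on the sparse representation, without calling as.matrix()
word_freq <- sort(slam::row_sums(tdm), decreasing = TRUE)
print(word_freq)

# Same threshold idea as in 11.2.3, with a cutoff suited to the toy data
frequent <- word_freq[word_freq > 1]
print(names(frequent))
```

The resulting named vector has the same shape as the `word_freq` expected by the `barplot()` and `wordcloud()` calls above, so those calls could be re-enabled against it.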