Assignment 6

Author

Colleen Malloy

Skytrax Data Scraping: Aer Lingus Airlines

Do things like seat type or reason for travel influence a passenger’s overall review or rating of an airline?

I love to travel, and my family usually sticks to the same airlines when we fly because they are accountable, affordable, and accommodating to our family’s size and needs. I wonder whether passengers review an airline poorly because of aspects of their experience, like seat type or reason for flying. For example, do business class flyers rate the airline higher than economy class flyers?

I chose to scrape Aer Lingus because this is the airline my family flies with when we travel to Europe.

Data

The data was scraped from the Aer Lingus page on the Skytrax website. There are a total of 1021 reviews, and I scraped 110 of them for this project. For each review, I scraped the numerical rating (out of 10), the reviewer’s name, reason for travel, seat type, and route. I created two new columns to separate flights with layovers or stops from direct flights.

With this data, I compared ratings with aspects of the flight and travel to see whether those aspects influence the passenger’s review of the airline.

This visual displays the type of travel by seat type. There are four travel types (business, couple leisure, family leisure, and solo leisure) and three seat types (business class, premium economy, and economy class). Before creating this visual, I assumed that most family leisure travel would be in economy class because it is typically the cheapest, and that most business and solo travel would be in business or premium economy class. The visual shows that family leisure passengers are mostly in economy class, with some in business class. It makes sense that more couple leisure passengers than family leisure passengers are in business class because only two people are traveling. I am surprised to see that solo leisure is all economy class; I would have assumed that some people might buy a nicer ticket if they were traveling alone.

This is the distribution of the overall ratings of Aer Lingus by each reviewer. I am not too surprised by this. On the Skytrax website, the overall rating of Aer Lingus is 5/10. At first, I was shocked because, personally, I have had great experiences with Aer Lingus. However, if I were to review anything, I would most likely have a really strong opinion one way or the other, so my assumption is that most reviews fall at the very low end or the very high end of the scale from 1 to 10. With this idea in mind, I was still a little shocked at this visual, but if I had strong feelings against this airline, I would most likely write a review and rate it lower so that other people would not have the same experience.

This could also be because I scraped only 110 of the 1021 reviews, so there is a chance the distribution looks different across all the reviews for Aer Lingus.

Here, I wanted to see how the business and premium economy class passengers rated the airline. Most of the business class passengers rated Aer Lingus 1/10. I could look into this further by reading each individual review to find out whether there is a common theme. I am assuming these passengers expected certain things because they were in business class but either did not receive them or, in their opinion, did not receive service fast enough.

# A tibble: 11 × 3
# Groups:   via_stops [5]
   via_stops        travel_type    count
   <chr>            <chr>          <int>
 1 Alicante airport Couple Leisure     1
 2 Chicago          Solo Leisure       1
 3 Cork             Couple Leisure     1
 4 Dublin           Business           4
 5 Dublin           Couple Leisure    10
 6 Dublin           Family Leisure    13
 7 Dublin           Solo Leisure       5
 8 <NA>             Business           9
 9 <NA>             Couple Leisure    23
10 <NA>             Family Leisure    22
11 <NA>             Solo Leisure      21

I created new columns to separate flights with layovers or stops from direct flights. The “via_stops” column holds the place of the stop or layover; it is NA for a direct flight.
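One way to do this kind of split can be sketched in Python (not the R that produced the output above). This is only a sketch under assumptions: it assumes the scraped route text has the form "Origin to Destination via Stop", which is not confirmed here, and the function name `split_route` and the sample routes are made up for illustration.

```python
from typing import Optional, Tuple


def split_route(route: str) -> Tuple[str, Optional[str]]:
    """Split a route string into (direct_route, via_stops).

    Assumes routes look like "Origin to Destination via Stop"
    (an assumed format, not confirmed by the scraped data).
    via_stops is None for a direct flight.
    """
    # partition() returns an empty separator when " via " is absent
    direct, sep, stop = route.partition(" via ")
    return direct.strip(), (stop.strip() if sep else None)


# Hypothetical route strings, for illustration only.
examples = ["Dublin to Boston", "Manchester to New York via Dublin"]
for route in examples:
    print(split_route(route))
```

Returning `None` for a missing stop mirrors the NA rows in the table: a direct flight simply has no value in the stop column.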

Using these new columns, I wanted to see how many flights had stops and how many did not. The planes stopped at four places: Alicante airport, Chicago, Cork, and Dublin. It makes sense that Dublin accounts for the majority of these stops because it is Aer Lingus’ hub airport. I assumed that most flights would not have a stop, so I am not surprised at how many flights fall in the NA category. I wanted to see whether the distribution of reviews could be explained by this visual, but since most flights are direct, I would say the two are not really related.

This takes the previous visual and finds the number of each type of travel within each column (direct or layover flight). There is no clear pattern here: the proportions of flights with layovers and direct flights seem roughly equivalent across travel types. This suggests that the route of the flight does not have as much effect on the passenger’s rating of the airline as other factors.
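To put numbers on this impression, the counts printed in the tibble above can be turned into a share of reviews with a stop for each travel type. This is a minimal Python sketch (not the R that produced the table); the variable names are my own, and the counts are copied from that output, with `None` standing in for NA.

```python
from collections import defaultdict

# (via_stops, travel_type, count) copied from the summary table;
# None stands in for NA, i.e. a direct flight.
rows = [
    ("Alicante airport", "Couple Leisure", 1),
    ("Chicago", "Solo Leisure", 1),
    ("Cork", "Couple Leisure", 1),
    ("Dublin", "Business", 4),
    ("Dublin", "Couple Leisure", 10),
    ("Dublin", "Family Leisure", 13),
    ("Dublin", "Solo Leisure", 5),
    (None, "Business", 9),
    (None, "Couple Leisure", 23),
    (None, "Family Leisure", 22),
    (None, "Solo Leisure", 21),
]

with_stop = defaultdict(int)  # reviews with a stop, per travel type
total = defaultdict(int)      # all reviews, per travel type
for via_stops, travel_type, count in rows:
    total[travel_type] += count
    if via_stops is not None:
        with_stop[travel_type] += count

# Share of each travel type's reviews that involved a stop
shares = {t: with_stop[t] / total[t] for t in total}
for travel_type in sorted(shares):
    print(f"{travel_type}: {with_stop[travel_type]} of {total[travel_type]} "
          f"({shares[travel_type]:.0%})")
```

With these counts, the share of reviews involving a stop ranges from about 22% for solo leisure (6 of 27) to about 37% for family leisure (13 of 35), so the groups are similar but not identical.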