In this project, I take the role of a blogger and analyst deciding which Invistico Airlines option offers better and more efficient service, based on customer satisfaction across different customer groups (such as travel class, gender, and type of travel), as well as external factors such as delays. I have 5 million followers, and while this collaboration is sponsored by the company, my priority is to recommend a high-quality service to my audience. For background, Invistico Airlines is a Singapore-based airline with a wide range of national and international flights, serving 110 destinations worldwide. I selected this dataset for its detail and relevance to the topic, and I use R to determine which option stands out as the best. The data source is available on Kaggle: https://www.kaggle.com/datasets/sjleshrac/airlines-customer-satisfaction?resource=download

1. Read in data from a .csv file. In this step, I am simply uploading the file into RStudio to begin analyzing the data using this program.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
invistico <- read.csv(file="Invistico_Airline.csv")
glimpse(invistico)
## Rows: 129,880
## Columns: 23
## $ satisfaction                      <chr> "satisfied", "satisfied", "satisfied…
## $ Gender                            <chr> "Female", "Male", "Female", "Female"…
## $ Customer.Type                     <chr> "Loyal Customer", "Loyal Customer", …
## $ Age                               <int> 65, 47, 15, 60, 70, 30, 66, 10, 56, …
## $ Type.of.Travel                    <chr> "Personal Travel", "Personal Travel"…
## $ Class                             <chr> "Eco", "Business", "Eco", "Eco", "Ec…
## $ Flight.Distance                   <int> 265, 2464, 2138, 623, 354, 1894, 227…
## $ Seat.comfort                      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Departure.Arrival.time.convenient <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Food.and.drink                    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Gate.location                     <int> 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, …
## $ Inflight.wifi.service             <int> 2, 0, 2, 3, 4, 2, 2, 2, 5, 2, 3, 2, …
## $ Inflight.entertainment            <int> 4, 2, 0, 4, 3, 0, 5, 0, 3, 0, 3, 0, …
## $ Online.support                    <int> 2, 2, 2, 3, 4, 2, 5, 2, 5, 2, 3, 2, …
## $ Ease.of.Online.booking            <int> 3, 3, 2, 1, 2, 2, 5, 2, 4, 2, 3, 2, …
## $ On.board.service                  <int> 3, 4, 3, 1, 2, 5, 5, 3, 4, 2, 3, 3, …
## $ Leg.room.service                  <int> 0, 4, 3, 0, 0, 4, 0, 3, 0, 4, 0, 2, …
## $ Baggage.handling                  <int> 3, 4, 4, 1, 2, 5, 5, 4, 1, 5, 1, 5, …
## $ Checkin.service                   <int> 5, 2, 4, 4, 4, 5, 5, 5, 5, 3, 2, 2, …
## $ Cleanliness                       <int> 3, 3, 4, 1, 2, 4, 5, 4, 4, 4, 3, 5, …
## $ Online.boarding                   <int> 2, 2, 2, 3, 5, 2, 3, 2, 4, 2, 5, 2, …
## $ Departure.Delay.in.Minutes        <int> 0, 310, 0, 0, 0, 0, 17, 0, 0, 30, 47…
## $ Arrival.Delay.in.Minutes          <int> 0, 305, 0, 0, 0, 0, 15, 0, 0, 26, 48…

2. Illustrate the use of summary on a data frame.

In this step, I generate a summary of the dataset. It reports key statistics for each numeric column, such as the minimum, maximum, median, and mean, which help me understand how the airline is performing. For example, the average departure delay is 14.71 minutes and the median is 0, but the maximum recorded delay is 1,592 minutes, roughly one day and two and a half hours. This extreme case needs further analysis to understand how often such delays occur and how they affect customer satisfaction. The summary also shows that Arrival.Delay.in.Minutes has 393 missing values, which I handle in Step 5. Additionally, I can see how passengers rate services like in-flight Wi-Fi (mean 3.25) and entertainment (mean 3.38). With this information, I can make data-driven recommendations about the quality of these services and why customers might prefer this airline over others, along with other relevant observations.

summary(invistico)
##  satisfaction          Gender          Customer.Type           Age       
##  Length:129880      Length:129880      Length:129880      Min.   : 7.00  
##  Class :character   Class :character   Class :character   1st Qu.:27.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :40.00  
##                                                           Mean   :39.43  
##                                                           3rd Qu.:51.00  
##                                                           Max.   :85.00  
##                                                                          
##  Type.of.Travel        Class           Flight.Distance  Seat.comfort  
##  Length:129880      Length:129880      Min.   :  50    Min.   :0.000  
##  Class :character   Class :character   1st Qu.:1359    1st Qu.:2.000  
##  Mode  :character   Mode  :character   Median :1925    Median :3.000  
##                                        Mean   :1981    Mean   :2.839  
##                                        3rd Qu.:2544    3rd Qu.:4.000  
##                                        Max.   :6951    Max.   :5.000  
##                                                                       
##  Departure.Arrival.time.convenient Food.and.drink  Gate.location 
##  Min.   :0.000                     Min.   :0.000   Min.   :0.00  
##  1st Qu.:2.000                     1st Qu.:2.000   1st Qu.:2.00  
##  Median :3.000                     Median :3.000   Median :3.00  
##  Mean   :2.991                     Mean   :2.852   Mean   :2.99  
##  3rd Qu.:4.000                     3rd Qu.:4.000   3rd Qu.:4.00  
##  Max.   :5.000                     Max.   :5.000   Max.   :5.00  
##                                                                  
##  Inflight.wifi.service Inflight.entertainment Online.support
##  Min.   :0.000         Min.   :0.000          Min.   :0.00  
##  1st Qu.:2.000         1st Qu.:2.000          1st Qu.:3.00  
##  Median :3.000         Median :4.000          Median :4.00  
##  Mean   :3.249         Mean   :3.383          Mean   :3.52  
##  3rd Qu.:4.000         3rd Qu.:4.000          3rd Qu.:5.00  
##  Max.   :5.000         Max.   :5.000          Max.   :5.00  
##                                                             
##  Ease.of.Online.booking On.board.service Leg.room.service Baggage.handling
##  Min.   :0.000          Min.   :0.000    Min.   :0.000    Min.   :1.000   
##  1st Qu.:2.000          1st Qu.:3.000    1st Qu.:2.000    1st Qu.:3.000   
##  Median :4.000          Median :4.000    Median :4.000    Median :4.000   
##  Mean   :3.472          Mean   :3.465    Mean   :3.486    Mean   :3.696   
##  3rd Qu.:5.000          3rd Qu.:4.000    3rd Qu.:5.000    3rd Qu.:5.000   
##  Max.   :5.000          Max.   :5.000    Max.   :5.000    Max.   :5.000   
##                                                                           
##  Checkin.service  Cleanliness    Online.boarding Departure.Delay.in.Minutes
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :   0.00           
##  1st Qu.:3.000   1st Qu.:3.000   1st Qu.:2.000   1st Qu.:   0.00           
##  Median :3.000   Median :4.000   Median :4.000   Median :   0.00           
##  Mean   :3.341   Mean   :3.706   Mean   :3.353   Mean   :  14.71           
##  3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:  12.00           
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :1592.00           
##                                                                            
##  Arrival.Delay.in.Minutes
##  Min.   :   0.00         
##  1st Qu.:   0.00         
##  Median :   0.00         
##  Mean   :  15.09         
##  3rd Qu.:  13.00         
##  Max.   :1584.00         
##  NA's   :393
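
To put the maximum departure delay in perspective, a quick conversion helps. The sketch below is plain arithmetic on the 1,592-minute figure from the summary above; the helper name minutes_to_dhm is my own and is not part of the original analysis.

```r
# Hypothetical helper: split a delay in minutes into days, hours, and minutes.
minutes_to_dhm <- function(minutes) {
  days <- minutes %/% (24 * 60)        # whole days
  remainder <- minutes %% (24 * 60)    # minutes left after removing whole days
  c(days = days, hours = remainder %/% 60, minutes = remainder %% 60)
}

max_departure_delay <- 1592  # maximum taken from summary(invistico)
delay_parts <- minutes_to_dhm(max_departure_delay)
delay_parts
```

This gives 1 day, 2 hours, and 32 minutes, which is where the "one day and two and a half hours" reading comes from.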

3. Illustrate the use of table on an attribute of a data frame.

At this stage, I compared the travel classes offered by Invistico Airlines and the customer satisfaction levels for each. The cross-tabulation shows that Business is the most frequently used class (62,160 passengers), with approximately 71% of its customers (44,095) reporting that they were satisfied. Based on this, I can recommend Business class to my followers, especially those who frequently travel for work and may benefit from potential discounts and enhanced services. On the other hand, the Eco and Eco Plus classes have more dissatisfied than satisfied customers (about 39% and 43% satisfied, respectively). These services do not meet my personal standards for quality and customer satisfaction. Therefore, in line with my values, I cannot recommend these options. Currently, Business class is the only service that meets the expectations I uphold for my audience. Furthermore, I produced a second table to compare the overall number of satisfied and dissatisfied customers. What concerns me is that satisfied customers (71,087, about 55%) do not greatly outnumber dissatisfied ones (58,793, about 45%). This raises a red flag, as it suggests that the overall customer experience may not be consistently positive.

table(invistico$Class, invistico$satisfaction)
##           
##            dissatisfied satisfied
##   Business        18065     44095
##   Eco             35336     22973
##   Eco Plus         5392      4019
table(invistico$satisfaction)
## 
## dissatisfied    satisfied 
##        58793        71087
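
The percentages quoted above can be reproduced from the counts. This is a minimal sketch that re-enters the counts from the table output by hand (so it runs without the CSV file) and uses prop.table() to turn them into row-wise shares; the object names are my own.

```r
# Counts copied from the table(invistico$Class, invistico$satisfaction) output.
class_counts <- matrix(
  c(18065, 44095,
    35336, 22973,
     5392,  4019),
  nrow = 3, byrow = TRUE,
  dimnames = list(Class = c("Business", "Eco", "Eco Plus"),
                  satisfaction = c("dissatisfied", "satisfied"))
)

# Row-wise proportions: share of each outcome within a class.
class_shares <- prop.table(class_counts, margin = 1)
round(class_shares, 3)

# Overall share of satisfied customers across all classes.
overall_satisfied <- sum(class_counts[, "satisfied"]) / sum(class_counts)
round(overall_satisfied, 3)
```

On the full data, prop.table(table(invistico$Class, invistico$satisfaction), margin = 1) should give the same shares without retyping the counts.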

4. Output all the column names in a data frame.

In this step, I review the column names to identify the types of information contained in the CSV file. This helps me begin planning which columns are most relevant for gaining a deeper understanding of the data and making informed decisions about whether to recommend this airline. By focusing on key variables, I can determine if the service truly benefits customers and whether it is worth recommending to my audience.

colnames(invistico)
##  [1] "satisfaction"                      "Gender"                           
##  [3] "Customer.Type"                     "Age"                              
##  [5] "Type.of.Travel"                    "Class"                            
##  [7] "Flight.Distance"                   "Seat.comfort"                     
##  [9] "Departure.Arrival.time.convenient" "Food.and.drink"                   
## [11] "Gate.location"                     "Inflight.wifi.service"            
## [13] "Inflight.entertainment"            "Online.support"                   
## [15] "Ease.of.Online.booking"            "On.board.service"                 
## [17] "Leg.room.service"                  "Baggage.handling"                 
## [19] "Checkin.service"                   "Cleanliness"                      
## [21] "Online.boarding"                   "Departure.Delay.in.Minutes"       
## [23] "Arrival.Delay.in.Minutes"

5. Output the min, max, average, and standard deviation of a variable from a data frame.

In this step, I retrieved a lot of information using max, min, mean, and standard deviation. I also applied filters, handled missing data, and generated a summary table to better understand the dataset. First, I confirmed with anyNA() that Arrival.Delay.in.Minutes contains missing values. Rather than removing those rows, I passed na.rm = TRUE so that the functions ignore the missing values when computing results. Since functions like max and min only return a single number, I combined them with filter() and select() to retrieve more context, such as satisfaction and flight class. From the analysis, I found that the maximum arrival delay was 1,584 minutes, and unsurprisingly, the customer was dissatisfied. The flight was in Eco class. The minimum delay was zero, which means the flight was not late; the dataset records no negative delays, so early arrivals presumably also appear as zero. Since many flights meet this condition, I limited the output to the first five rows. All five passengers were marked as satisfied, which is consistent with punctuality mattering for satisfaction, although five rows taken from the top of the file are not a representative sample. Wanting to dive deeper into service quality, I calculated the mean of several rated services, where scores run up to 5 (with 5 being the best; the summary in Step 2 shows that a value of 0 also occurs in most rating columns). In this table, Cleanliness had the highest average rating at 3.71, followed by Online Support at 3.52 and Ease of Online Booking at 3.47. Check-in Service had the lowest score among the selected features at 3.34, still above the midpoint of the scale, although a mean alone does not tell me what share of customers were satisfied. Lastly, I calculated the standard deviation for Inflight Wi-Fi Service and Inflight Entertainment to understand the variability in passenger opinions. Both are above 1 (1.32 and 1.35), which suggests mixed experiences: some passengers were highly satisfied, while others were not.
In conclusion, this analysis allowed me to uncover key insights about customer satisfaction, punctuality, and service performance, and provided a better understanding of which areas the airline excels in and which ones may need improvement.

anyNA(invistico$Arrival.Delay.in.Minutes)
## [1] TRUE
invistico %>%
  filter(Arrival.Delay.in.Minutes == max(Arrival.Delay.in.Minutes, na.rm = TRUE)) %>%
  select(satisfaction, Class, Arrival.Delay.in.Minutes)
##   satisfaction Class Arrival.Delay.in.Minutes
## 1 dissatisfied   Eco                     1584
invistico %>%
  filter(Arrival.Delay.in.Minutes == min(Arrival.Delay.in.Minutes, na.rm = TRUE)) %>%
  select(satisfaction, Class, Arrival.Delay.in.Minutes) %>%
  head(5)
##   satisfaction Class Arrival.Delay.in.Minutes
## 1    satisfied   Eco                        0
## 2    satisfied   Eco                        0
## 3    satisfied   Eco                        0
## 4    satisfied   Eco                        0
## 5    satisfied   Eco                        0
means_table <- sapply(invistico[, c("Online.support", 
                                    "Ease.of.Online.booking", 
                                    "Checkin.service", 
                                    "Cleanliness")], 
                      mean, na.rm = TRUE)
print(means_table)
##         Online.support Ease.of.Online.booking        Checkin.service 
##               3.519703               3.472105               3.340807 
##            Cleanliness 
##               3.705759
sd(invistico$Inflight.wifi.service, na.rm = TRUE)
## [1] 1.318818
sd(invistico$Inflight.entertainment, na.rm = TRUE) 
## [1] 1.346059
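
Looking at the first five on-time rows only hints at the link between punctuality and satisfaction. A more direct check would compare the share of satisfied passengers between on-time and delayed flights. The sketch below shows one way to do that with dplyr; it runs on a small made-up data frame (toy_flights is hypothetical, not the real data), and the column names are assumed to match those in invistico.

```r
library(dplyr)

# Hypothetical toy data; the real analysis would use the invistico data frame.
toy_flights <- data.frame(
  satisfaction = c("satisfied", "satisfied", "dissatisfied",
                   "dissatisfied", "satisfied", "dissatisfied"),
  Arrival.Delay.in.Minutes = c(0, 0, 0, 45, 120, NA)
)

delay_summary <- toy_flights %>%
  filter(!is.na(Arrival.Delay.in.Minutes)) %>%         # drop rows with unknown delay
  mutate(on_time = Arrival.Delay.in.Minutes == 0) %>%  # TRUE when no delay was recorded
  group_by(on_time) %>%
  summarise(
    passengers = n(),
    share_satisfied = mean(satisfaction == "satisfied")
  )
delay_summary
```

Replacing toy_flights with invistico would give the comparison for all passengers instead of a five-row sample.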

6. Illustrate how you can select columns of a data frame into a new data frame. Show your result by executing a summary or glimpse of the new data frame.

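At this point, I first created a new column called loyalty_score, the average of the online boarding, check-in service, and ease of online booking ratings, to assess how satisfied the clients were with these services. Then, using select(), I created a new data frame with the specific columns I want to examine in more detail, and used glimpse() to display it.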

invistico$loyalty_score <- (invistico$Online.boarding + invistico$Checkin.service + invistico$Ease.of.Online.booking) / 3

newinvistico <- select(invistico, Class, Type.of.Travel, satisfaction, Flight.Distance, Customer.Type, loyalty_score)

glimpse(newinvistico)
## Rows: 129,880
## Columns: 6
## $ Class           <chr> "Eco", "Business", "Eco", "Eco", "Eco", "Eco", "Eco", …
## $ Type.of.Travel  <chr> "Personal Travel", "Personal Travel", "Personal Travel…
## $ satisfaction    <chr> "satisfied", "satisfied", "satisfied", "satisfied", "s…
## $ Flight.Distance <int> 265, 2464, 2138, 623, 354, 1894, 227, 1812, 73, 1556, …
## $ Customer.Type   <chr> "Loyal Customer", "Loyal Customer", "Loyal Customer", …
## $ loyalty_score   <dbl> 3.333333, 2.333333, 2.666667, 2.666667, 3.666667, 3.00…
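
As a sanity check on the loyalty_score formula, the first two rows can be recomputed by hand. This sketch uses the first two values of each rating column as shown in the glimpse() output in Step 1, and rowMeans() as an alternative to adding the columns and dividing by 3; the toy data frame and the score_cols vector are my own names.

```r
# First two rows of the three rating columns, copied from the glimpse() output.
toy <- data.frame(
  Online.boarding        = c(2, 2),
  Checkin.service        = c(5, 2),
  Ease.of.Online.booking = c(3, 3)
)

# rowMeans() averages across the selected columns for each row.
score_cols <- c("Online.boarding", "Checkin.service", "Ease.of.Online.booking")
toy$loyalty_score <- rowMeans(toy[, score_cols])
toy$loyalty_score
```

The results, 3.333333 and 2.333333, match the first two loyalty_score values in the output above.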

7. In cleaning data, often you may wish to rename a column in a data frame. Illustrate renaming a column.

In this step, I renamed a column whose name was long and awkward because of how it was formatted in the CSV file. The original header included special characters and spaces, which R replaced with dots when the file was read in, producing Departure.Arrival.time.convenient. To simplify my workflow and improve readability, I renamed the column to time_convenience. This change makes the column easier to reference in data transformations and visualizations. To confirm that the renaming was successful, I used the glimpse() function, which shows that time_convenience now appears in place of the old name.

invistico <- invistico %>%
  rename(time_convenience = Departure.Arrival.time.convenient)
glimpse(invistico)
## Rows: 129,880
## Columns: 24
## $ satisfaction               <chr> "satisfied", "satisfied", "satisfied", "sat…
## $ Gender                     <chr> "Female", "Male", "Female", "Female", "Fema…
## $ Customer.Type              <chr> "Loyal Customer", "Loyal Customer", "Loyal …
## $ Age                        <int> 65, 47, 15, 60, 70, 30, 66, 10, 56, 22, 58,…
## $ Type.of.Travel             <chr> "Personal Travel", "Personal Travel", "Pers…
## $ Class                      <chr> "Eco", "Business", "Eco", "Eco", "Eco", "Ec…
## $ Flight.Distance            <int> 265, 2464, 2138, 623, 354, 1894, 227, 1812,…
## $ Seat.comfort               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ time_convenience           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1…
## $ Food.and.drink             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Gate.location              <int> 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 1, 1…
## $ Inflight.wifi.service      <int> 2, 0, 2, 3, 4, 2, 2, 2, 5, 2, 3, 2, 5, 4, 5…
## $ Inflight.entertainment     <int> 4, 2, 0, 4, 3, 0, 5, 0, 3, 0, 3, 0, 0, 0, 2…
## $ Online.support             <int> 2, 2, 2, 3, 4, 2, 5, 2, 5, 2, 3, 2, 5, 4, 1…
## $ Ease.of.Online.booking     <int> 3, 3, 2, 1, 2, 2, 5, 2, 4, 2, 3, 2, 5, 4, 5…
## $ On.board.service           <int> 3, 4, 3, 1, 2, 5, 5, 3, 4, 2, 3, 3, 1, 3, 5…
## $ Leg.room.service           <int> 0, 4, 3, 0, 0, 4, 0, 3, 0, 4, 0, 2, 3, 5, 0…
## $ Baggage.handling           <int> 3, 4, 4, 1, 2, 5, 5, 4, 1, 5, 1, 5, 2, 2, 5…
## $ Checkin.service            <int> 5, 2, 4, 4, 4, 5, 5, 5, 5, 3, 2, 2, 2, 3, 2…
## $ Cleanliness                <int> 3, 3, 4, 1, 2, 4, 5, 4, 4, 4, 3, 5, 4, 2, 5…
## $ Online.boarding            <int> 2, 2, 2, 3, 5, 2, 3, 2, 4, 2, 5, 2, 5, 4, 2…
## $ Departure.Delay.in.Minutes <int> 0, 310, 0, 0, 0, 0, 17, 0, 0, 30, 47, 0, 0,…
## $ Arrival.Delay.in.Minutes   <int> 0, 305, 0, 0, 0, 0, 15, 0, 0, 26, 48, 0, 0,…
## $ loyalty_score              <dbl> 3.333333, 2.333333, 2.666667, 2.666667, 3.6…
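
For readers wondering where the dots in the original column name come from: read.csv() cleans headers with make.names() by default. The sketch below illustrates this on a header string that I am assuming resembles the one in the CSV file; I have not confirmed the exact header text.

```r
# Assumed original header text (not confirmed against the CSV file).
raw_header <- "Departure/Arrival time convenient"

# make.names() replaces characters that are invalid in R names with dots.
clean_header <- make.names(raw_header)
clean_header
```

Passing check.names = FALSE to read.csv() would keep the headers as written in the file, at the cost of needing backticks to reference them.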

8. Illustrate the use of filter (from tidyverse) on a data frame.

At this point, I would first like to mention that the tidyverse library was loaded at the beginning of the script. Here, I used the filter function to keep only passengers who traveled in Business class and reported being satisfied, and then tabulated them by gender and type of travel. I chose to focus on this group because, as shown in the first table of Step 3, Business class was the only category where the number of satisfied customers exceeded the number of dissatisfied ones. The results show that satisfied Business class passengers include slightly more females (22,529) than males (21,566), and that nearly all of them were traveling for business. Therefore, I would recommend this class especially for women traveling for business or those seeking greater comfort during personal trips. However, the difference between genders is small, so it remains a suitable and appealing option for both women and men.

invisticofilterdata <- filter(invistico, Class == "Business", satisfaction == "satisfied")
table(invisticofilterdata$Gender, invisticofilterdata$Type.of.Travel)
##         
##          Business travel Personal Travel
##   Female           21448            1081
##   Male             21405             161
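
To see how small the gender difference is, the counts can be turned into shares. This is a minimal sketch that re-enters the table output above by hand; the object names are my own.

```r
# Counts copied from the table(Gender, Type.of.Travel) output for
# satisfied Business class passengers.
gender_travel <- matrix(
  c(21448, 1081,
    21405,  161),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  Type.of.Travel = c("Business travel", "Personal Travel"))
)

gender_totals <- rowSums(gender_travel)            # passengers per gender
gender_share <- gender_totals / sum(gender_totals) # share of the filtered group
round(gender_share, 3)
```

Females make up about 51% of this group and males about 49%, which supports treating Business class as an option for both.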

9. Illustrate the use of arrange on a data frame. Use any column of your choosing, but sort from greatest to least.

In this step, I used arrange() to sort the data by Flight.Distance in descending order. My goal was to examine the fifteen longest flights. From the data, I observed that most of these long-distance records belong to loyal customers (13 of 15), with loyalty scores ranging from 1.33 to 4.67 and 3.0 being the most common value. Additionally, eight of the fifteen were in Business class, making it the most frequently used class on the longest routes. However, satisfaction is split: 6 of the 15 passengers were satisfied and 9 were dissatisfied, and even within Business class 3 of the 8 were dissatisfied. This inconsistency suggests that the company may not be providing a consistently high-quality service, especially for premium passengers. Therefore, while some indicators, such as loyalty and preference for Business class, are positive, the mixed satisfaction levels raise concerns. Based on this analysis, I feel uncertain about recommending Invistico Airlines. In this sample, the picture is roughly 40% positive and 60% negative. I would personally consider trying a flight in Business class to assess the service quality firsthand before confidently recommending the company to others. Improving customer service may help increase satisfaction and enhance loyalty.

sorted_newinvistico <- arrange(newinvistico, desc(Flight.Distance))
top15_newinvistico <- sorted_newinvistico[1:15, ]
print(top15_newinvistico)
##       Class  Type.of.Travel satisfaction Flight.Distance     Customer.Type
## 1  Business Business travel dissatisfied            6951 disloyal Customer
## 2  Business Business travel    satisfied            6950    Loyal Customer
## 3  Business Business travel    satisfied            6948    Loyal Customer
## 4       Eco Personal Travel dissatisfied            6924    Loyal Customer
## 5       Eco Personal Travel dissatisfied            6907    Loyal Customer
## 6  Business Business travel    satisfied            6907    Loyal Customer
## 7  Eco Plus Personal Travel dissatisfied            6889    Loyal Customer
## 8       Eco Personal Travel dissatisfied            6882    Loyal Customer
## 9  Business Business travel    satisfied            6868    Loyal Customer
## 10 Business Personal Travel dissatisfied            6865    Loyal Customer
## 11 Business Business travel dissatisfied            6837 disloyal Customer
## 12 Business Business travel    satisfied            6828    Loyal Customer
## 13      Eco Business travel    satisfied            6816    Loyal Customer
## 14      Eco Personal Travel dissatisfied            6813    Loyal Customer
## 15      Eco Personal Travel dissatisfied            6811    Loyal Customer
##    loyalty_score
## 1       1.666667
## 2       3.000000
## 3       1.666667
## 4       3.000000
## 5       3.333333
## 6       3.666667
## 7       3.333333
## 8       2.666667
## 9       4.000000
## 10      4.666667
## 11      3.000000
## 12      3.666667
## 13      3.666667
## 14      1.333333
## 15      3.000000
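
Rather than counting rows by eye, the top-15 output can be tallied with table(). The sketch below re-enters the Class and satisfaction columns from the printed output above so that it runs on its own; with the real object, table(top15_newinvistico$Class, top15_newinvistico$satisfaction) should produce the same counts.

```r
# Class and satisfaction for the 15 longest flights, copied from the output above.
top15 <- data.frame(
  Class = c("Business", "Business", "Business", "Eco", "Eco",
            "Business", "Eco Plus", "Eco", "Business", "Business",
            "Business", "Business", "Eco", "Eco", "Eco"),
  satisfaction = c("dissatisfied", "satisfied", "satisfied", "dissatisfied",
                   "dissatisfied", "satisfied", "dissatisfied", "dissatisfied",
                   "satisfied", "dissatisfied", "dissatisfied", "satisfied",
                   "satisfied", "dissatisfied", "dissatisfied")
)

# Cross-tabulate class against satisfaction for these 15 rows.
top15_counts <- table(top15$Class, top15$satisfaction)
top15_counts
```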

10. Use slice_max to output the top 4 rows of a data frame. Use any column of your choosing.

Among the four longest Business class flights, all were for business purposes, and three of the four passengers were loyal customers. However, one of the four was dissatisfied: the passenger on the longest flight, who was also the only disloyal customer. Four rows are too few to draw firm conclusions, but together with the top-15 results in Step 9 this may indicate weaknesses in long-distance service quality, even for premium customers. Some customers report dissatisfaction despite flying in a premium class, which suggests that service quality is not always meeting expectations. To improve the overall experience for Business class travelers, I recommend that Invistico Airlines focus on enhancing in-flight comfort, personalized service, and post-flight support. These high-paying customers expect a premium experience, especially on long-distance routes. By addressing areas of dissatisfaction, such as food quality, seat comfort, or staff attentiveness, the company can strengthen customer loyalty and justify the value of its Business class offering.

newinvistico %>%
  filter(Class == "Business") %>%
  slice_max(Flight.Distance, n = 4)
##      Class  Type.of.Travel satisfaction Flight.Distance     Customer.Type
## 1 Business Business travel dissatisfied            6951 disloyal Customer
## 2 Business Business travel    satisfied            6950    Loyal Customer
## 3 Business Business travel    satisfied            6948    Loyal Customer
## 4 Business Business travel    satisfied            6907    Loyal Customer
##   loyalty_score
## 1      1.666667
## 2      3.000000
## 3      1.666667
## 4      3.666667
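
One detail of slice_max() worth knowing: by default it keeps ties, so it can return more rows than requested when several rows share the cutoff value. This matters here because the top-15 list in Step 9 contains two flights with the same distance (6907). The sketch below shows the behavior on a small made-up data frame (toy is hypothetical).

```r
library(dplyr)

# Hypothetical toy data with three rows tied at the second-highest distance.
toy <- data.frame(
  id = 1:5,
  Flight.Distance = c(6951, 6907, 6907, 6907, 6800)
)

# Default with_ties = TRUE: all rows tied at the cutoff are kept.
with_ties <- slice_max(toy, Flight.Distance, n = 2)

# with_ties = FALSE: exactly n rows are returned.
without_ties <- slice_max(toy, Flight.Distance, n = 2, with_ties = FALSE)

nrow(with_ties)
nrow(without_ties)
```

Here the default returns four rows and the second call returns two, so with_ties = FALSE is the safer choice when exactly four rows are needed.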

11. Illustrate the use of a pipe operation (%>%)

Based on the mean and variance of loyalty_score, I observed that Business class customers' ratings of the online boarding, check-in, and online booking services tend to be neutral to positive. The mean of 3.56 indicates a generally favorable impression, but the variance of 0.82 (a standard deviation of about 0.91) reveals a moderate level of dispersion in the data. This means that, while there are satisfied clients, there are also some whose experiences were less positive. Given this information, I feel less confident in recommending the Business class service for travel, and I would likely decide not to collaborate with the airline at this time. However, I would suggest that the company focus on improving the quality of its services and enhancing the overall passenger experience. Offering occasional discounts could also be beneficial, as well as implementing targeted programs for specific customer segments, such as women, who appear slightly more often among satisfied Business class passengers in Step 8. These strategies could help increase customer satisfaction and loyalty over time.

newinvistico %>%
  filter(Class == "Business") %>%
  summarise(
    mean_loyalty = mean(loyalty_score, na.rm = TRUE),
    var_loyalty = var(loyalty_score, na.rm = TRUE)
  )
##   mean_loyalty var_loyalty
## 1      3.55665   0.8203276
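
Variance is in squared units, so the standard deviation is easier to read on the rating scale. The sketch below takes the square root of the variance reported above and shows the range one standard deviation either side of the mean; it uses only the two numbers from the output.

```r
# Values copied from the summarise() output above.
mean_loyalty <- 3.55665
var_loyalty  <- 0.8203276

# Standard deviation is the square root of the variance.
sd_loyalty <- sqrt(var_loyalty)
round(sd_loyalty, 3)

# Range covering one standard deviation either side of the mean.
loyalty_range <- c(lower = mean_loyalty - sd_loyalty,
                   upper = mean_loyalty + sd_loyalty)
round(loyalty_range, 2)
```

A standard deviation of about 0.91 means typical Business class scores fall between roughly 2.65 and 4.46, which is what I mean by moderate dispersion.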

12. Use ggplot to create 2 different visualizations of your data

First Plot: Class Count

In this graphic, I show how many passengers chose each class. Business is the most used (62,160 passengers), followed closely by Eco (58,309), while Eco Plus is by far the least used (9,411).

ggplot(newinvistico, aes(x = Class)) +
  geom_bar(fill = "lightblue") +
  labs(title = "Passenger Count by Class", x = "Class", y = "Count")

Second Plot: Satisfaction by Class

In this second graphic, I added the satisfaction results to determine which class I should recommend. As previously mentioned, Business is the only class with a higher number of satisfied passengers. The other classes show more dissatisfaction. Therefore, I would not recommend services that do not offer good customer experiences. I want my recommendations to be authentic, which is why I focused on the Business class. If I were to work with this company, it would have to be based on the quality of the Business class. Otherwise, I would not agree to support it.

ggplot(newinvistico, aes(x = Class, fill = satisfaction)) +
  geom_bar(position = "dodge") +
  labs(title = "Satisfaction by Class", x = "Class", y = "Count")
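
Because Eco Plus has far fewer passengers than the other classes, raw counts make the classes hard to compare. A possible variation is to plot shares instead of counts. The sketch below rebuilds the counts from the table in Step 3 by hand and uses geom_col(position = "fill"); the data frame and plot names are my own. With the full data, geom_bar(position = "fill") on newinvistico should give the same picture.

```r
library(ggplot2)

# Counts copied from the Class by satisfaction table in Step 3.
class_counts <- data.frame(
  Class = rep(c("Business", "Eco", "Eco Plus"), each = 2),
  satisfaction = rep(c("dissatisfied", "satisfied"), times = 3),
  n = c(18065, 44095, 35336, 22973, 5392, 4019)
)

# position = "fill" rescales each bar to 100%, so classes are comparable.
p_share <- ggplot(class_counts, aes(x = Class, y = n, fill = satisfaction)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = function(x) paste0(round(100 * x), "%")) +
  labs(title = "Share of Satisfied Passengers by Class",
       x = "Class", y = "Share of passengers")
p_share
```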

Third Plot: Average Loyalty Score by Type of Travel (Line)

In this graphic, I used the column I created in Step 6 called “loyalty_score.” This column averages the ratings for online boarding, check-in service, and online booking. Here I group the data by type of travel rather than by class and plot the mean loyalty score for each group, to see whether business travelers and personal travelers rate these services differently. Because this plot does not break the data down by class, it cannot by itself confirm which class has the highest loyalty; the class-level view is in the fourth plot. For now, my position remains that if I’m going to recommend a service, it will be Business class, or none at all.

newinvistico %>%
  group_by(Type.of.Travel) %>%
  summarise(loyalty_score = mean(loyalty_score, na.rm = TRUE)) %>%
  ggplot(aes(x = Type.of.Travel, y = loyalty_score, group = 1)) +
  geom_line(color = "steelblue", linewidth = 1) +
  labs(
    title = "Average Loyalty Score by Type of Travel",
    x = "Type of Travel",
    y = "Loyalty Score"
  )

Fourth Plot: Loyalty Score by Class (Points)

After reviewing the loyalty scores by class, I find myself genuinely conflicted. In the graph, Eco Plus passengers appear to have many scores toward the top of the scale, highlighted in warm tones like orange, which suggests a positive experience overall. However, Business class, which should ideally represent the most premium offering, shows a wide range of loyalty scores, including many lower values, which raises red flags. Because loyalty_score takes only a limited set of values, many points overlap, so this plot shows the range of scores in each class better than it shows how common each score is. Based on this, I can’t confidently recommend Business class, at least not without seeing improvements in consistency or satisfaction. Yet I’m also not entirely convinced by Eco or Eco Plus, especially considering the earlier analysis that showed more dissatisfied than satisfied customers in both. As a blogger who strives to be transparent and fair, I have to admit that neither class fully earns my trust yet. The data leaves me unsure. Something is lacking, and I’d love to see more effort from the airline before making a solid recommendation to my readers.

ggplot(newinvistico, aes(x = Class, y = loyalty_score, fill = loyalty_score)) +
  geom_point(color = "black", pch = 25) +
  scale_fill_gradient(low = "blue", high = "orange") +
  labs(
    title = "Loyalty Score by Class",
    x = "Class",
    y = "Loyalty Score"
  )
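
Since many passengers share the same loyalty score, the points in this plot sit on top of each other. One possible alternative is geom_count(), which sizes each point by how many observations fall on it. The sketch below uses a small made-up data frame (toy_scores is hypothetical and its values are invented for illustration); the real plot would use newinvistico instead.

```r
library(ggplot2)

# Hypothetical toy data; values are invented and not taken from the dataset.
toy_scores <- data.frame(
  Class = c("Business", "Business", "Business", "Eco", "Eco", "Eco Plus"),
  loyalty_score = c(3, 3, 4, 3, 2, 3)
)

# geom_count() draws one point per (Class, loyalty_score) pair,
# with point size proportional to the number of rows at that position.
p_count <- ggplot(toy_scores, aes(x = Class, y = loyalty_score)) +
  geom_count(color = "steelblue") +
  labs(title = "Loyalty Score by Class",
       x = "Class", y = "Loyalty Score", size = "Passengers")
p_count
```

A box plot with geom_boxplot() would be another reasonable choice if the goal is to compare the typical score and spread across classes.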