Dinn Ri Data Visualisation Assignment

Author

Catherine Quirke

Installing, setting up packages, and setting the working directory.

The following packages were installed to meet the requirements of the project. The library() calls run every time the script runs. Each call loads a package into the session so that its functions can be used throughout the analysis.

The packages themselves only need to be installed once, and the install commands are kept in the R script for reference. These packages add extra functionality. For example, the “memer” package is used to create the data analytics meme at the end of this file.
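
Because packages only need to be installed once, one common pattern is to install a package only when it is missing. The base R sketch below shows that check. The package list is a placeholder, and "notARealPackageXYZ" is a deliberately fake, hypothetical name. For GitHub packages such as memer, the install step would be devtools::install_github() rather than install.packages().

```r
# Placeholder package list; "notARealPackageXYZ" is a hypothetical, non-existent package
needed <- c("stats", "utils", "notARealPackageXYZ")

# requireNamespace() returns TRUE if a package can be loaded and FALSE otherwise
is_available <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Only the packages that are missing would need installing
to_install <- needed[!is_available]
print(to_install)

# if (length(to_install) > 0) install.packages(to_install)
```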

##Setting up packages and setting the working directory.##

devtools::install_github("sctyner/memer")
devtools::install_github("AndreaCirilloAC/paletter")

library(memer)
library(tidyverse)
library(gghighlight)
library(kableExtra)
library(devtools)
library(knitr)
library(ggplot2)
library(dplyr)

hotel <- read_csv("hotel_satisfaction.csv")

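# message = FALSE and warning = FALSE are R Markdown chunk options. They belong in the chunk
# header, e.g. {r, message = FALSE, warning = FALSE}, not as code in the chunk body.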

Introduction

Hello, my name is Catherine Quirke, a fourth-year Digital Marketing and Analytics student at SETU, Carlow campus. This analysis is carried out with the Carlow hotel group Dinn Ri in mind. It uses the colours of the Dinn Ri logo, extracted with the paletter package from GitHub, while the data itself comes from hotel_satisfaction.csv.

The variables used can be broken down into one of the following 4 categories:

Satisfaction variables
Information regarding the customer’s stay
Customer’s spending habits
Customer segmentation

Paletter

The value of these packages becomes clear in the output below, which shows the Dinn Ri colour palette generated by “AndreaCirilloAC/paletter”. It provides the hex codes derived from the Dinn Ri logo, and these codes are used throughout this analysis.

paletter::create_palette(image_path = "E:\\SETU\\Data Analytics\\EDA with R Assignment Files\\The_Dinn_Ri_Logo.jpg",
               number_of_colors =6,
               type_of_variable = "categorical")

[1] "#615555" "#615E70" "#AC994D" "#696260" "#A8A382" "#9B8956"

Output 2: Bargraph

This bar graph was made using the ‘custID’ and ‘eliteSegment’ variables and is ordered by the number of customers in each segment. The largest customer segment does not belong to an elite category at all: 544 customers have no status. The smallest elite segment is Platinum.

The largest segment is highlighted in black and labelled so that readers can pick it out at a glance, without needing to look at the underlying data.

The data consists of a count (a numerical value) and a category, so a bar graph was a natural fit in this instance.
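
The counting and ordering behind this bar graph can be sketched in base R with table() and sort(). These steps mirror the group_by(), summarise(n()) and reorder() steps in the plotting code. The segment values below are hypothetical sample data, not the hotel_satisfaction.csv data.

```r
# Hypothetical elite segments for eight customers (illustrative only)
segments <- c("NoStatus", "Gold", "NoStatus", "Silver",
              "NoStatus", "Platinum", "Gold", "NoStatus")

# Count customers per segment and order from largest to smallest
segment_counts <- sort(table(segments), decreasing = TRUE)
print(segment_counts)

# The first entry is the largest segment, the one highlighted in the graph
largest <- names(segment_counts)[1]
print(largest)
```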

##Output 2: Bargraph - This will show the number of customers in each elite segment of the Dinn Ri##

hotel %>%
  group_by(eliteSegment) %>%
  summarise(custID = n()) %>%   # custID now holds the number of customers per segment
  ggplot() +
  geom_col(mapping = aes(x = custID, y = reorder(eliteSegment, custID), fill = eliteSegment),
           show.legend = FALSE) +
  labs(title = "Number of Customers by Elite Segment",
       subtitle = "Most customers have no elite status.",
       x = "Number of Customers",
       y = NULL) +
  geom_text(aes(x = custID, y = reorder(eliteSegment, custID), label = custID),
            hjust = 1.5, color = "#FFFFFF") +
  gghighlight(eliteSegment == "NoStatus") +   # grey out every segment except NoStatus
  scale_fill_manual(values = c("#000000"))

Output 3: Area Chart

This area chart was made using the ‘nightsStayed’, ‘distanceTraveled’ and ‘visitPurpose’ variables. It shows that the most popular reason for staying at the hotel is a vacation. It also shows that visitors have travelled as far as 5000 km to attend a conference, which is the least popular reason for staying. The data also suggests that customers who travel between 1000 and 1500 km tend to stay between 20 and 60 nights, with stays getting longer as the distance increases. There are some outliers, but there appears to be a positive correlation between the distance travelled and the number of nights spent at the hotel.

An area chart was chosen for this data because it allows the visit purposes to be compared. It shows how each purpose affects both the distance customers are willing to travel and, in turn, the number of nights they tend to stay.
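
The area chart gives a visual impression of the relationship between distance and nights stayed. One way to check that impression numerically is to compute a correlation for each visit purpose. The base R sketch below uses Spearman's rank correlation on hypothetical sample data. Spearman is a swapped-in choice here, picked because it is less sensitive to outliers than Pearson's correlation.

```r
# Hypothetical stays (illustrative only, not the hotel_satisfaction.csv data)
stays <- data.frame(
  visitPurpose     = rep(c("Vacation", "Conference"), each = 4),
  nightsStayed     = c(2, 5, 10, 20, 1, 2, 3, 4),
  distanceTraveled = c(100, 400, 900, 1500, 5000, 200, 3000, 1000)
)

# Spearman rank correlation between nights stayed and distance, per purpose
cor_by_purpose <- sapply(split(stays, stays$visitPurpose), function(d)
  cor(d$nightsStayed, d$distanceTraveled, method = "spearman"))
print(round(cor_by_purpose, 2))
```

A value near 1 indicates that longer stays go with longer journeys. A value near 0 or below indicates no such pattern for that purpose.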

##Output 3: Area Chart - This graph examines how many nights customers stayed compared with the distance they travelled, broken down by the purpose of their visit.##

hotel %>% 
  ggplot(aes(x = nightsStayed, y = distanceTraveled, group = visitPurpose, fill = visitPurpose)) +
  geom_area() +
  ggtitle("Time Spent in the Dinn Ri vs the Distance Travelled based on the Trip Purpose") +
  facet_wrap(~visitPurpose, scales = "free_y") +
  scale_fill_manual(values = c("#393645", "#E0D9CC", "#CEBA72", "#69604D", "#4A9CA5", "#3d3234")) +
  labs(x = "Nights Stayed at the Dinn Ri", y = "Distance Travelled to the Dinn Ri") +
  theme_minimal() +
  # theme() must come after theme_minimal(); a complete theme added later overwrites these settings
  theme(legend.position = "none",
        panel.spacing = unit(0.1, "lines"),
        strip.text.x = element_text(size = 8))

Output 4: Boxplot

This box plot was made using the ‘satOverall’ and ‘eliteSegment’ variables. A box plot like this one makes it easy to compare overall satisfaction with the hotel across customer segments. For instance, the data shows that the most satisfied customers are those in the Platinum segment, despite some highly satisfied outliers in the Gold and NoStatus segments.
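
Each box summarises the median (centre line), the interquartile range (the box) and any outliers. ggplot2 computes the hinges with quantile() and by default flags points more than 1.5 times the IQR beyond the box. The base R sketch below reproduces those numbers on hypothetical ratings. It assumes a 1 to 5 satisfaction scale, which may not match the real data.

```r
# Hypothetical overall ratings on an assumed 1-5 scale (illustrative only)
sat <- data.frame(
  eliteSegment = rep(c("Gold", "Platinum"), each = 6),
  satOverall   = c(2, 3, 3, 3, 4, 5,   4, 4, 5, 5, 5, 5)
)

# Hinges, median and outlier count per segment (1.5 x IQR rule)
box_stats <- t(sapply(split(sat$satOverall, sat$eliteSegment), function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  c(lower = q[1], median = q[2], upper = q[3],
    n_outliers = sum(x < q[1] - 1.5 * iqr | x > q[3] + 1.5 * iqr))
}))
print(box_stats)
```

In this made-up sample, Platinum has the higher median, while Gold's single rating of 5 is drawn as an outlier.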

##Output 4: Boxplot - This compares the overall satisfaction of Dinn Ri customers across elite segments, including customers with no status.##

ggplot(data = hotel) +
  geom_boxplot(mapping = aes(x = eliteSegment, y = satOverall, fill = eliteSegment),
               outlier.shape = 1, outlier.size = 2,
               outlier.colour = "black") +
  # Fill colours follow the alphabetical order of the levels: Gold, NoStatus, Platinum, Silver
  scale_fill_manual(values = c("#DAA520", "#000000", "#E5E4E2", "#CAC9C4")) +
  labs(title = "Overall Satisfaction of the Dinn Ri Based on Customer Segment",
       x = "Customer Elite Segment",
       y = "Overall Customer Satisfaction")

Variable Creation

To gain better insight into customer satisfaction across all areas of the hotel, I created a new variable, satStaff. It averages each customer's ratings of the front desk, dining, housekeeping and valet staff. I then used this variable in a scatter plot.
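
Averaging, rather than adding, the four staff ratings keeps satStaff on the same scale as each individual rating. This makes satStaff directly comparable with satOverall. A minimal base R sketch with rowMeans() on hypothetical ratings follows. It assumes a 1 to 5 scale, which may not match the real data.

```r
# Hypothetical staff ratings for three customers (assumed 1-5 scale, illustrative only)
staff <- data.frame(
  satFrontStaff  = c(5, 4, 3),
  satDiningStaff = c(4, 4, 2),
  satHouseStaff  = c(5, 3, 3),
  satValetStaff  = c(4, 5, 2)
)

# Row-wise average of the four staff ratings
staff$satStaff <- rowMeans(staff[, c("satFrontStaff", "satDiningStaff",
                                     "satHouseStaff", "satValetStaff")])
print(staff$satStaff)
```

rowMeans() also accepts na.rm = TRUE, which averages over whichever ratings a customer did give if some are missing.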

Output 5: Scatterplot

“satStaff” and “satOverall” are both numerical variables, so a scatter plot made sense. It is particularly suited to identifying any correlation between satisfaction with the hotel staff and satisfaction with the overall experience.

The data did not show the staff having as much impact on overall satisfaction as one might expect. Customers rated the staff highly even when they were not satisfied with their stay.
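
A weak relationship like this can be quantified with a correlation coefficient and a significance test. The base R sketch below runs cor.test() on hypothetical values. The made-up data mimics the pattern described above: uniformly high staff ratings alongside varied overall ratings.

```r
# Hypothetical data (illustrative only): staff ratings are all high,
# while overall ratings vary
satStaff   <- c(4.00, 4.25, 4.50, 4.75, 5.00)
satOverall <- c(2, 4, 3, 4, 2)

# Pearson correlation with a test of whether it differs from zero
result <- cor.test(satStaff, satOverall, method = "pearson")
print(unname(result$estimate))  # correlation coefficient r
print(result$p.value)
```

An r close to 0 together with a large p-value is consistent with staff satisfaction not explaining overall satisfaction. The same test could be run separately for each elite segment.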

##This code creates a new variable, satStaff, that averages the four staff satisfaction ratings in the hotel satisfaction dataset##

# Average (not sum) of the front, dining, housekeeping and valet staff ratings
hotel <- mutate(hotel, satStaff = (satFrontStaff + satDiningStaff + satHouseStaff + satValetStaff) / 4)

##Output 5: Scatterplot - Using the newly created variable satStaff, this plot analyses how satisfaction with the staff relates to overall satisfaction within each customer segment.##

ggplot(data = hotel) +
  geom_point(mapping = aes(x = satStaff, y = satOverall, colour = eliteSegment),
             shape = 8, size = 1.5) +
  scale_colour_manual(values = c("#DAA520", "#000000", "#CAC9C4", "#C0C0C0")) +
  facet_wrap(~eliteSegment, nrow = 2) +
  labs(title = "Correlation Between Staff Satisfaction and Overall Satisfaction",
       x = "Satisfaction with the Dinn Ri Staff",
       y = "Overall Satisfaction of the Dinn Ri")

Output 6: Histogram

This histogram is right-skewed: most hotel guests did not travel far for their visit. A long tail to the right shows the wide range of distances travelled, from 0 km to well over 6000 km.
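
The direction of skew can be checked numerically. A positive moment-based skewness, or a mean above the median, indicates a long right tail. A minimal base R sketch on hypothetical distances (not the real data):

```r
# Hypothetical distances in km (illustrative only): most guests travel
# a short way, a few travel very far
distances <- c(5, 10, 15, 20, 25, 30, 40, 60, 150, 6000)

# Moment-based sample skewness: mean cubed deviation divided by the
# cubed (population) standard deviation
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

skew <- skewness(distances)
print(skew)                                 # positive => right-skewed
print(mean(distances) > median(distances))  # the long right tail pulls the mean up
```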

##Output 6: Histogram - This graph examines the distribution of the distance travelled by customers.##
ggplot(data = hotel) +
  geom_histogram(mapping = aes(x = distanceTraveled), binwidth = 25) +
  labs(title = "Distance Travelled to the Dinn Ri, Carlow",
       x = "Distance (km)",
       y = "Number of Individuals")

Output 7: Table

This table compares, for each elite segment, customers' average satisfaction with the price of the room, dining, wifi and parking, rounded to the nearest whole number.
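
Every value in the table rounds to 4 or 5, so small differences between segments are hidden. The base R sketch below computes the same kind of per-segment means with aggregate() and keeps one decimal place instead. The ratings are hypothetical and only two of the four price columns are shown.

```r
# Hypothetical price-satisfaction ratings (illustrative only)
prices <- data.frame(
  eliteSegment = c("Gold", "Gold", "Platinum", "Platinum"),
  satRoomPrice = c(4, 5, 4, 4),
  satWifiPrice = c(3, 4, 5, 4)
)

# Mean rating per segment, kept to one decimal place
avg <- aggregate(cbind(satRoomPrice, satWifiPrice) ~ eliteSegment,
                 data = prices, FUN = mean)
avg[-1] <- round(avg[-1], 1)
print(avg)

# Rounding to whole numbers would show Gold's 4.5 room rating as 4,
# because R's round() rounds halves to the even digit
print(round(4.5))
```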

##Output 7: Table - This table shows the average rating customers gave to the price of each hotel service, by elite segment.##
hotel %>%
  group_by(eliteSegment) %>%
  summarise(
    avgSatRoomPrice    = round(mean(satRoomPrice, na.rm = TRUE), 0),
    avgSatDiningPrice  = round(mean(satDiningPrice, na.rm = TRUE), 0),
    avgSatWifiPrice    = round(mean(satWifiPrice, na.rm = TRUE), 0),
    avgSatParkingPrice = round(mean(satParkingPrice, na.rm = TRUE), 0)) %>%
  setNames(c("Elite Segment", "Avg. Room Price Satisfaction", "Avg. Dining Price Satisfaction",
             "Avg. Wifi Price Satisfaction", "Avg. Parking Price Satisfaction")) %>%
  kable()

Elite Segment	Avg. Room Price Satisfaction	Avg. Dining Price Satisfaction	Avg. Wifi Price Satisfaction	Avg. Parking Price Satisfaction
Gold	4	4	4	4
NoStatus	4	4	4	4
Platinum	4	4	5	5
Silver	4	4	4	4

##This code creates two new variables: spendHabit, the total average nightly spend on the room, food and wifi, and satSpend, the combined satisfaction with the room, dining, wifi and parking prices.##

hotel <- mutate(hotel, spendHabit = avgRoomSpendPerNight + avgFoodSpendPerNight + avgWifiSpendPerNight)
hotel <- mutate(hotel, satSpend = satRoomPrice + satDiningPrice + satWifiPrice + satParkingPrice)

Output 8: Area Chart

Once again, an area chart is used. This time it shows customers' spending and how it compares with their satisfaction with the prices paid. Customers visiting for sports events, vacations or other reasons appear to be the most satisfied with prices when their spending averages around 200 euro.

##Output 8: Area Chart - This graph compares customers' spending with their satisfaction with prices, by the purpose of their visit.##
hotel %>% 
  ggplot(aes(x = spendHabit, y = satSpend, group = visitPurpose, fill = visitPurpose)) +
  geom_area() +
  ggtitle("Spending Habits vs Satisfaction with Prices based on the Trip Purpose") +
  facet_wrap(~visitPurpose, scales = "free_y") +
  scale_fill_manual(values = c("#393645", "#E0D9CC", "#CEBA72", "#69604D", "#4A9CA5", "#3d3234")) +
  labs(x = "Spending Habits", y = "Satisfaction with money spent") +
  theme_minimal() +
  # theme() must come after theme_minimal(); a complete theme added later overwrites these settings
  theme(legend.position = "none",
        panel.spacing = unit(0.1, "lines"),
        strip.text.x = element_text(size = 8))

Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
collapsing to unique 'x' values

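The "collapsing to unique 'x' values" warnings most likely appear because several customers share the same spendHabit value within one visit purpose. geom_area() aligns the stacked series by interpolation, and repeated x values are collapsed during that step. One option, sketched below in base R on hypothetical data, is to average satSpend for each spending level before passing the data to the same ggplot() call.

```r
# Hypothetical rows (illustrative only); two Vacation customers share
# the same spendHabit value of 200
spend <- data.frame(
  visitPurpose = c("Vacation", "Vacation", "Vacation", "Sports"),
  spendHabit   = c(200, 200, 250, 180),
  satSpend     = c(16, 18, 15, 17)
)

# Average satSpend per visitPurpose/spendHabit pair, so each x value
# appears only once per group
spend_avg <- aggregate(satSpend ~ visitPurpose + spendHabit, data = spend, FUN = mean)
spend_avg <- spend_avg[order(spend_avg$visitPurpose, spend_avg$spendHabit), ]
print(spend_avg)
```
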
##Meme about Data Analytics over the years.##

meme_get("ExpandingBrain") %>% 
  meme_text_brain("Year 1 of DMA: Starting data analytics tools", 
                  "Year 2 of DMA: Expanding knowledge of data analysis", 
                  "Year 3 of DMA: Learning SPSS + Tableau", 
                  "Year 4 of DMA: Making memes in R Studio", 
                  size = 16)