As a junior data analyst for a hotel booking company, you have been
creating visualizations in R with the ggplot2
package to share insights about your data with stakeholders. You have
already built a series of visualizations using ggplot(),
ggplot2 aesthetics, and filters. Now your stakeholder asks you
to add annotations to your visualizations to help explain your findings
in a presentation. Luckily, ggplot2 has annotation
functions built in.
hotel_bookings <- read.csv("hotel_bookings.csv")
head(hotel_bookings)
##          hotel is_canceled lead_time arrival_date_year arrival_date_month
## 1 Resort Hotel           0       342              2015               July
## 2 Resort Hotel           0       737              2015               July
## 3 Resort Hotel           0         7              2015               July
## 4 Resort Hotel           0        13              2015               July
## 5 Resort Hotel           0        14              2015               July
## 6 Resort Hotel           0        14              2015               July
##   arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights
## 1                       27                         1                       0
## 2                       27                         1                       0
## 3                       27                         1                       0
## 4                       27                         1                       0
## 5                       27                         1                       0
## 6                       27                         1                       0
##   stays_in_week_nights adults children babies meal country market_segment
## 1                    0      2        0      0   BB     PRT         Direct
## 2                    0      2        0      0   BB     PRT         Direct
## 3                    1      1        0      0   BB     GBR         Direct
## 4                    1      1        0      0   BB     GBR      Corporate
## 5                    2      2        0      0   BB     GBR      Online TA
## 6                    2      2        0      0   BB     GBR      Online TA
##   distribution_channel is_repeated_guest previous_cancellations
## 1               Direct                 0                      0
## 2               Direct                 0                      0
## 3               Direct                 0                      0
## 4            Corporate                 0                      0
## 5                TA/TO                 0                      0
## 6                TA/TO                 0                      0
##   previous_bookings_not_canceled reserved_room_type assigned_room_type
## 1                              0                  C                  C
## 2                              0                  C                  C
## 3                              0                  A                  C
## 4                              0                  A                  A
## 5                              0                  A                  A
## 6                              0                  A                  A
##   booking_changes deposit_type agent company days_in_waiting_list customer_type
## 1               3   No Deposit  NULL    NULL                    0     Transient
## 2               4   No Deposit  NULL    NULL                    0     Transient
## 3               0   No Deposit  NULL    NULL                    0     Transient
## 4               0   No Deposit   304    NULL                    0     Transient
## 5               0   No Deposit   240    NULL                    0     Transient
## 6               0   No Deposit   240    NULL                    0     Transient
##   adr required_car_parking_spaces total_of_special_requests reservation_status
## 1   0                           0                         0          Check-Out
## 2   0                           0                         0          Check-Out
## 3  75                           0                         0          Check-Out
## 4  75                           0                         0          Check-Out
## 5  98                           0                         1          Check-Out
## 6  98                           0                         1          Check-Out
##   reservation_status_date
## 1              2015-07-01
## 2              2015-07-01
## 3              2015-07-02
## 4              2015-07-02
## 5              2015-07-03
## 6              2015-07-03
colnames(hotel_bookings)
##  [1] "hotel"                          "is_canceled"
##  [3] "lead_time"                      "arrival_date_year"
##  [5] "arrival_date_month"             "arrival_date_week_number"
##  [7] "arrival_date_day_of_month"      "stays_in_weekend_nights"
##  [9] "stays_in_week_nights"           "adults"
## [11] "children"                       "babies"
## [13] "meal"                           "country"
## [15] "market_segment"                 "distribution_channel"
## [17] "is_repeated_guest"              "previous_cancellations"
## [19] "previous_bookings_not_canceled" "reserved_room_type"
## [21] "assigned_room_type"             "booking_changes"
## [23] "deposit_type"                   "agent"
## [25] "company"                        "days_in_waiting_list"
## [27] "customer_type"                  "adr"
## [29] "required_car_parking_spaces"    "total_of_special_requests"
## [31] "reservation_status"             "reservation_status_date"
install.packages('tidyverse')
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.1 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Your stakeholder tells you that they would like you to create a visualization that compares market segments between city hotels and resort hotels. This will help inform how the company targets promotions in the future. They ask you to create a cleaned and labeled version and save it as a .png file so they can include it in a presentation.
As a refresher, here is a chart similar to the one you created in a previous activity:
ggplot(data = hotel_bookings) +
geom_bar(mapping = aes(x = market_segment)) +
facet_wrap(~hotel)
To give this chart a title, add the labs() function with a title= argument:
ggplot(data = hotel_bookings) +
geom_bar(mapping = aes(x = market_segment)) +
facet_wrap(~hotel) +
labs(title="Comparison of market segments by hotel type for hotel bookings")
This code chunk will generate the same chart as before, but now it includes a title to explain the data visualization more clearly to your audience.
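If you want to try the title step without the hotel bookings file, the sketch below applies the same pattern to mpg, a small dataset that ships with ggplot2 (a stand-in chosen for illustration; the plot names p_base and p_titled are hypothetical). It also stores the chart in a variable, so further annotations can be layered on with + instead of retyping the whole ggplot() call:

```r
library(ggplot2)

# Stand-in data: ggplot2's built-in mpg dataset, not the hotel bookings file.
# Bar chart of cars per vehicle class, with one panel per drive type (drv).
p_base <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class)) +
  facet_wrap(~drv)

# Adding labs() returns a new plot object that includes a title;
# p_base itself is not modified.
p_titled <- p_base +
  labs(title = "Comparison of vehicle classes by drive type")

p_titled  # printing the object draws the chart
```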
You also want to add another detail about what time period this data covers. To do this, you need to find out when the data is from.
You realize you can use the min() function on the year
column in the data:
min(hotel_bookings$arrival_date_year)
## [1] 2015
And the max() function:
max(hotel_bookings$arrival_date_year)
## [1] 2017
To use these values easily in your labels, save them as variables. The following code chunk creates the two variables:
mindate <- min(hotel_bookings$arrival_date_year)
maxdate <- max(hotel_bookings$arrival_date_year)
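As an alternative, base R's range() returns the minimum and the maximum in a single call. This sketch assumes ggplot2's built-in mpg dataset as a stand-in, using its year column in place of arrival_date_year; the na.rm = TRUE argument is an added safeguard, because min(), max(), and range() return NA if the column contains any missing values:

```r
library(ggplot2)  # provides the built-in mpg dataset used as a stand-in

# range() returns c(minimum, maximum); na.rm = TRUE drops missing values first
years <- range(mpg$year, na.rm = TRUE)
mindate <- years[1]
maxdate <- years[2]

# For mpg, mindate is 1999 and maxdate is 2008
c(mindate, maxdate)
```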
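Now, add a subtitle using subtitle= in the
labs() function. You can use the
paste0() function to insert your newly created variables into
your labels. This is handy because if the data is updated with
more recent bookings, you don’t have to change the code below:
the variables are computed from the data rather than typed by hand.
The chunk also adds theme(axis.text.x = element_text(angle = 45)),
which rotates the x-axis labels by 45 degrees so the long market
segment names are less likely to overlap: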
ggplot(data = hotel_bookings) +
geom_bar(mapping = aes(x = market_segment)) +
facet_wrap(~hotel) +
theme(axis.text.x = element_text(angle = 45)) +
labs(title="Comparison of market segments by hotel type for hotel bookings",
subtitle=paste0("Data from: ", mindate, " to ", maxdate))
This code chunk will add the subtitle ‘Data from: 2015 to 2017’ underneath the title you added earlier to the chart.
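To see what paste0() builds on its own, here is a small sketch that fills in the years printed earlier by hand (an assumption made so it runs without the data file). paste0() joins its arguments with no separator and converts numbers to text, so the spaces have to be written inside the quoted strings; paste() with its default separator of a single space produces the same label:

```r
# Years typed in by hand; in the activity they come from min() and max()
mindate <- 2015
maxdate <- 2017

# paste0() adds no separator, so the spaces live inside the strings
label_paste0 <- paste0("Data from: ", mindate, " to ", maxdate)

# paste() inserts a single space between arguments by default
label_paste <- paste("Data from:", mindate, "to", maxdate)

label_paste0  # "Data from: 2015 to 2017"
```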
You realize that this chart displays the technical details a
little too prominently. You don’t want the date range to be the second thing
people notice during the presentation, so you decide to switch the
subtitle to a caption, which appears in the
bottom-right corner instead:
ggplot(data = hotel_bookings) +
geom_bar(mapping = aes(x = market_segment)) +
facet_wrap(~hotel) +
theme(axis.text.x = element_text(angle = 45)) +
labs(title="Comparison of market segments by hotel type for hotel bookings",
caption=paste0("Data from: ", mindate, " to ", maxdate))
This code chunk makes a slight change to the visualization you created in the last chunk: the text ‘Data from: 2015 to 2017’ now appears as a caption in the bottom-right corner instead of as a subtitle.
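The caption's position comes from the plot theme: ggplot2's default theme right-aligns plot.caption (hjust = 1). If you would rather have the caption at the bottom left, a sketch along these lines should work. It uses the built-in mpg data as a stand-in, and the plot name p_caption is hypothetical:

```r
library(ggplot2)

p_caption <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class)) +
  facet_wrap(~drv) +
  labs(title = "Comparison of vehicle classes by drive type",
       caption = paste0("Data from: ", min(mpg$year), " to ", max(mpg$year))) +
  # hjust = 0 left-aligns the caption; the default theme uses hjust = 1
  theme(plot.caption = element_text(hjust = 0))

p_caption
```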
Now you want to clean up the x- and y-axis labels to make sure they
are really clear. To do that, you can add the x= and y= arguments to the labs()
function. Feel free to
change the text of the labels and play around with them:
ggplot(data = hotel_bookings) +
geom_bar(mapping = aes(x = market_segment)) +
facet_wrap(~hotel) +
theme(axis.text.x = element_text(angle = 45)) +
labs(title="Comparison of market segments by hotel type for hotel bookings",
caption=paste0("Data from: ", mindate, " to ", maxdate),
x="Market Segment",
y="Number of Bookings")
The chart is the same as before, except that the x- and y-axis labels have changed from ‘market_segment’ and ‘count’ to ‘Market Segment’ and ‘Number of Bookings’, which makes the chart clearer.
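With angle = 45 alone, each rotated label is centered on its tick mark, which can leave long category names sitting awkwardly over the axis. A common refinement, shown here as an optional sketch on the mpg stand-in data (the plot name p_axes and the label text are hypothetical), is to add hjust = 1 so each rotated label ends at its tick:

```r
library(ggplot2)

p_axes <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class)) +
  facet_wrap(~drv) +
  # Rotate the x-axis text 45 degrees and right-justify it against each tick
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Comparison of vehicle classes by drive type",
       x = "Vehicle Class",
       y = "Number of Cars")

p_axes
```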
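Now it’s time to save what you just created so you can easily share it with stakeholders.
You can use the ggsave() function to do just that! It
saves your plot to the file name you provide, which makes it simple to
export your plots from R. If you don’t supply width and height, ggsave()
uses the size of the current graphics device; the chunk below sets them
explicitly, in inches (the default unit).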
The ggsave() function in the code chunk below saves
the last plot that was generated, so if you have run any other plotting code since
the code chunk above, run that chunk again first.
Then run the following code chunk to save that plot as a .png file
named hotel_booking_chart.png, a name that makes it clear to your
stakeholders what the file contains. You should then be able to find
this file in the ‘Files’ tab in the bottom-right pane of RStudio. Check
it out!
ggsave('hotel_booking_chart.png',
width=16,
height=8)
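Because ggsave() falls back to last_plot() when no plot is given, a slightly more defensive sketch stores the chart in a variable and passes it with plot =, so running other code in between cannot change what gets saved. The mpg stand-in chart, the file name example_chart.png, and the temporary directory are illustrative assumptions; units = "in" and dpi = 300 match ggsave()'s defaults and are written out only for clarity:

```r
library(ggplot2)

# Stand-in chart built from the built-in mpg data
chart <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class)) +
  facet_wrap(~drv) +
  labs(title = "Comparison of vehicle classes by drive type")

# Hypothetical output path in a temporary directory
out_path <- file.path(tempdir(), "example_chart.png")

# The .png extension selects the PNG device; the plot is passed explicitly
ggsave(out_path, plot = chart, width = 16, height = 8, units = "in", dpi = 300)

file.exists(out_path)
```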
Now that you have finished creating and exporting a data visualization
with annotations in ggplot2, you can share what you created
with key stakeholders to give them insight into your findings.
These skills will allow you to create, annotate, and share your data
visualizations directly from R. You can
practice these skills by modifying the code chunks in the .Rmd file, or
use this code as a starting point in your own project. You will
continue learning more about ggplot2 functions in this
course, but with the skills you have already been practicing, you will
be able to generate plots, use aesthetic functions, apply filters,
and create annotations to explain your data.