This time, I use another dataset called CheetahRegion. It contains a company's sales figures (in USD) for 10 consecutive years, broken down by sales region (Eastern and Western), plus the total for the company. The data table has one variable each for the year, the Eastern figures, the Western figures and the Total.
This is how the dataset looks.
library(xlsx)
## Warning: package 'xlsx' was built under R version 4.2.2
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.2
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.0 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
# Note: the "xlsx" package is needed to import .xlsx files into R. Install it once, then load it as above.
cheet <- read.xlsx("C:/Users/petemaur/Teaching/Data/CheetahRegion.xlsx", 1)
view(cheet)
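Spreadsheet imports can bring along empty rows, and the charts further down emit warnings about removed rows with missing values. A quick check after the import shows whether that is the case. The following is a minimal sketch in base R; `toy` is a hypothetical stand-in for `cheet` with invented values, and the assumption that the missing values sit in empty trailing rows is not confirmed by the source.

```r
# Hypothetical stand-in for cheet with two empty trailing rows (values invented)
toy <- data.frame(
  Year    = c(1, 2, 3, NA, NA),
  Eastern = c(10, 12, 15, NA, NA),
  Western = c(8, 9, 11, NA, NA),
  Total   = c(18, 21, 26, NA, NA)
)

# Count the missing values in each column
na_counts <- colSums(is.na(toy))
na_counts

# Keep only the rows that have no missing value in any column
toy_clean <- toy[complete.cases(toy), ]
nrow(toy_clean)
```

If the same check on `cheet` shows rows that are missing throughout, plotting the cleaned data frame instead should avoid the warnings.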
Now, we want a chart that compares the development of the sales figures for the two regions and the total. For this, we can use a line chart. The lines show the development of sales in USD for each region and for total sales. Since the figures for each region are stored in separate variables, we must build the chart layer by layer. That means we add several line layers, one for each region and one for total sales.
ggplot(cheet, aes(Year)) +
  geom_line(aes(y = Eastern)) +
  geom_line(aes(y = Western)) +
  geom_line(aes(y = Total)) +
  labs(y = "Sales in Million $") +
  theme_classic()
## Warning: Removed 4 row(s) containing missing values (geom_path).
## Removed 4 row(s) containing missing values (geom_path).
## Removed 4 row(s) containing missing values (geom_path).
We have used three geom_line() layers, one each for the Eastern region, the Western region and the Total, because these sales figures are stored in three different variables. Since they all come from the same dataset, we specify the data only once, and since the variable mapped to the x-axis is always the same one (Year), we also map it only once.
Note that the only things we’ve changed from the defaults are the y-axis label and the usual classic theme.
In the next steps, we will make the chart look nicer and more professional by tweaking the axes and grid lines, formatting the text and playing with color. We also construct a legend “by hand”: each layer maps color to a constant label inside aes(), and scale_color_manual() assigns the actual colors to those labels.
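The layer-by-layer approach follows from the wide layout of the data. As an alternative, the table can be reshaped into long format so that a single geom_line() layer draws all lines. This is a minimal sketch of that idea, not the method used in this tutorial; `toy`, `toy_long`, `Region` and `Sales` are hypothetical names, and the values are invented.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for cheet: the values are invented for illustration only
toy <- data.frame(
  Year    = 1:4,
  Eastern = c(10, 12, 15, 14),
  Western = c(8, 9, 11, 13)
)
toy$Total <- toy$Eastern + toy$Western

# Reshape from wide to long: one row per combination of Year and Region
toy_long <- pivot_longer(
  toy,
  cols      = c(Eastern, Western, Total),
  names_to  = "Region",
  values_to = "Sales"
)

# A single geom_line() layer is enough now; mapping color to Region
# separates the lines and creates the legend automatically
p <- ggplot(toy_long, aes(x = Year, y = Sales, color = Region)) +
  geom_line() +
  labs(y = "Sales in Million $") +
  theme_classic()
p
```

The trade-off is one extra reshaping step in exchange for a shorter plot call and an automatic legend.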
ggplot(cheet, aes(Year)) +
  geom_line(aes(y = Eastern, color = "Eastern"), size = 1.3, linetype = "solid") +
  geom_line(aes(y = Western, color = "Western"), size = 1.3, linetype = "dashed") +
  geom_line(aes(y = Total, color = "Total"), size = 1.3, linetype = "dotted") +
  scale_x_continuous(n.breaks = 10) +
  scale_y_continuous(n.breaks = 10, limits = c(0, NA)) +
  scale_color_manual(values = c("Eastern" = "red", "Western" = "chocolate", "Total" = "orange"), name = "Region") +
  labs(y = "Sales in Million $") +
  theme_classic() +
  theme(axis.text.x = element_text(face = "italic", size = 12, color = "firebrick"),
        axis.title.x = element_text(size = 12.5, face = "italic")) +
  theme(axis.text.y = element_text(face = "bold", size = 12, color = "purple"),
        axis.title.y = element_text(size = 12.5, face = "italic")) +
  theme(panel.grid.major.y = element_line(color = "grey50", linetype = "dashed", size = 0.5)) +
  theme(plot.title = element_text(face = "bold", color = "coral3", size = 15, hjust = 0.5)) +
  ggtitle("Development of Sales")
## Warning: Removed 4 row(s) containing missing values (geom_path).
## Removed 4 row(s) containing missing values (geom_path).
## Removed 4 row(s) containing missing values (geom_path).
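The hand-built legend lists its entries in an order chosen by ggplot2. If a specific order is wanted, for example both regions first and the total last, the breaks argument of scale_color_manual() can set it. This is a minimal sketch with invented values; `toy` is a hypothetical stand-in for `cheet`, and `region_scale` is a name introduced here for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for cheet: the values are invented for illustration only
toy <- data.frame(
  Year    = 1:4,
  Eastern = c(10, 12, 15, 14),
  Western = c(8, 9, 11, 13)
)
toy$Total <- toy$Eastern + toy$Western

# Same named colors as in the chart above; breaks fixes the order of the legend entries
region_scale <- scale_color_manual(
  values = c("Eastern" = "red", "Western" = "chocolate", "Total" = "orange"),
  breaks = c("Eastern", "Western", "Total"),
  name   = "Region"
)

# Each layer maps color to a constant label, which the scale turns into a legend entry
p <- ggplot(toy, aes(Year)) +
  geom_line(aes(y = Eastern, color = "Eastern")) +
  geom_line(aes(y = Western, color = "Western")) +
  geom_line(aes(y = Total, color = "Total")) +
  region_scale +
  labs(y = "Sales in Million $") +
  theme_classic()
p
```

Storing the scale in its own object also makes it easy to reuse the same colors and legend order across several charts.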