This is just an experiment out of my curiosity, as a beginner in R programming. I just want to apply some code I found on the internet for making time series heatmap in the form of calendar to the data of my Google Course’s Capstone Project, “How does a bike-share navigate speedy success?”.
This calendar indicates rental rate each day over a year from July 1st,2021 to June 30th, 2022. This is not the whole project analysis, just experiment with a form of graphic presentation that’s not only neat and attractive, but also precise and clear at once.
library("tidyverse")
library("lubridate")
library("knitr")
library("zoo")
library("viridis")
The same data used in Google Course Capstone Project, but I just use a fraction of all these data in this experiment. The data recorded everything about transaction of bike-rental company over a period from July 1st, 2021 to June 30th, 2022. Each file contain data of transaction for each month.
df_202107 <- read.csv("C:/Users/Family/Downloads/202107-divvy-tripdata/202107-divvy-tripdata.csv")
df_202108 <- read.csv("C:/Users/Family/Downloads/202108-divvy-tripdata/202108-divvy-tripdata.csv")
df_202109 <- read.csv("C:/Users/Family/Downloads/202109-divvy-tripdata/202109-divvy-tripdata.csv")
df_202110 <- read.csv("C:/Users/Family/Downloads/202110-divvy-tripdata/202110-divvy-tripdata.csv")
df_202111 <- read.csv("C:/Users/Family/Downloads/202111-divvy-tripdata/202111-divvy-tripdata.csv")
df_202112 <- read.csv("C:/Users/Family/Downloads/202112-divvy-tripdata/202112-divvy-tripdata.csv")
df_202201 <- read.csv("C:/Users/Family/Downloads/202201-divvy-tripdata/202201-divvy-tripdata.csv")
df_202202 <- read.csv("C:/Users/Family/Downloads/202202-divvy-tripdata/202202-divvy-tripdata.csv")
df_202203 <- read.csv("C:/Users/Family/Downloads/202203-divvy-tripdata/202203-divvy-tripdata.csv")
df_202204 <- read.csv("C:/Users/Family/Downloads/202204-divvy-tripdata/202204-divvy-tripdata.csv")
df_202205 <- read.csv("C:/Users/Family/Downloads/202205-divvy-tripdata/202205-divvy-tripdata.csv")
df_202206 <- read.csv("C:/Users/Family/Downloads/202206-divvy-tripdata/202206-divvy-tripdata.csv")
Then combine all the tables into one year data frame.
union <- rbind(df_202107, df_202108, df_202109, df_202110, df_202111, df_202112, df_202201, df_202202, df_202203, df_202204, df_202205, df_202206)
Only the whole year data frame is what we need, so it’s not necessary to keep all the imported tables in this project anymore. I have to remove all these unnecessary file in order to save some space.
rm(df_202107, df_202108, df_202109, df_202110, df_202111, df_202112, df_202201, df_202202, df_202203, df_202204, df_202205, df_202206)
sum(table(union$ride_id)-1)
## [1] 0
We need only 2 columns from original data frames, transaction ID(ride_id) and rental time(started_at), so we remove all other unnecessary column to make our table more space economical. Then we rearrange all the rental time into order, and add more column for extracted month, date, week, YMD, day, year and year_month(month and year come together, in this case we need zoo package) from started_at.
union_1 <- union %>%
drop_na(c(start_lat, start_lng, end_lat, end_lng)) %>%
select(-c(rideable_type,ended_at,start_station_name,
start_station_id,end_station_name,
end_station_id,start_lat,start_lng,
end_lat,end_lng,member_casual)) %>%
arrange(started_at) %>% mutate(weekday = wday(started_at, label = T, week_start = 1),
month = month(started_at, label = T),
date = yday(started_at),
week = isoweek(started_at),
YMD = date(started_at),
day = day(started_at),
year = year(started_at),
year_month = zoo::as.yearmon(started_at)
)
Creating another table for plotting heatmap. First, select all the column necessary for plotting from union_1 table, then grouped them up by date and count all the rental transactions each day as “numtrips”.
calendar_heatmap_data <- union_1 %>%
select(
date,
weekday,
day,
month,
week,
YMD,
year_month,
year
) %>%
group_by(
date
) %>%
mutate(
numtrips = n()
) %>%
distinct(
date,
.keep_all = TRUE
)
For the case that some days in last week of the year are in the following year, those days are supposed to be in the first week of the next year but the program may recognize them as in the 52nd or 53rd week of the year. So we need another chunk to correct that.
calendar_heatmap_data <- mutate(calendar_heatmap_data,
week = case_when(month == "Dec" & week == 1 ~ 53,
month == "Jan" & week %in% 52:53 ~ 0,
TRUE ~ week))
Arrange year_month in order. The reason I need this column (which also need zoo package) because the code from which I modified are written for 1 year from January 1st to December 31st, not overlapping 2 years like in my case. So I need to attach year after each month as my calendar starts from July to June of the next year.
calendar_heatmap_data$year_month <- ordered(
calendar_heatmap_data$year_month,
levels = c(
"Jul 2021", "Aug 2021", "Sep 2021", "Oct 2021",
"Nov 2021", "Dec 2021", "Jan 2022", "Feb 2022",
"Mar 2022", "Apr 2022", "May 2022", "Jun 2022"
)
)
I creating this chart using geom_tile, with scale_fill_viridis to apply specific color according to the number of transactions (numtrips) in each day. I also add some customized theme for this chart by using the theme() function. The lay out I use is facet wrap with 4 x 3 facet distribution.
calendar_heatmap_data %>%
ggplot(aes(weekday,-week, fill = numtrips)) +
geom_tile(colour = "white") +
geom_text(aes(label = day), size = 2.5, color = "black") +
theme(aspect.ratio = 1/2,
legend.position = "top",
legend.key.width = unit(3, "cm"),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank(),
panel.background = element_blank(),
legend.title.align = 0.5,
strip.background = element_blank(),
strip.text = element_text(face = "bold", size = 15),
panel.border = element_rect(colour = "grey", fill=NA, size=1),
plot.title = element_text(hjust = 0.5, size = 21, face = "bold",
margin = margin(0,0,0.5,0, unit = "cm"))) +
scale_fill_viridis(option = "H",
name = "Number of trips",
guide = guide_colorbar(title.position = "top",
direction = "horizontal")) +
facet_wrap(~year_month, nrow = 4, ncol = 3, scales = "free") +
labs(title = "Cyclistic Heatmap 2021 - 2022")
## Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
The first time I saw after running my R Script, it seemed very messy at first, but I learned later that I had to zoom in order to see what it really looked like. I can’t find any button to zoom my chart after running my R Markdown. So I don’t know what to expect from running this R Markdown on Kaggle. As I write this paragraph, I still don’t know what would come out, but I can tell it come out fine on my R Script.