We need the dplyr library because it makes it convenient to slice, dice and aggregate data frames, and the ggplot2 library for the plot.
library(dplyr)
library(ggplot2)
First, let's generate a synthetic dataset of 100,000 samples: each one gets a random minute between midnight and 5am, a random thing (cat, dog or bear), and a random state, on (1) or off (0).
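set.seed(42)  # make the random draws reproducible
n_samples <- 100000
# one time point per minute, from midnight to 5am inclusive
times <- seq(as.POSIXct("2001-01-01 00:00:00"), as.POSIXct("2001-01-01 05:00:00"), by = "min")
sample_times <- sample(times, n_samples, replace = TRUE)
things <- c("cat", "dog", "bear")
sample_things <- sample(things, n_samples, replace = TRUE)
# 1 means on, 0 means off
sample_on <- sample(c(0, 1), n_samples, replace = TRUE)
df <- data.frame(time = sample_times, thing = sample_things, on = sample_on)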
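To get a feel for the sizes involved before plotting, here is a small sketch of what the minute sequence contains and roughly what the bars should look like. It passes `tz = "UTC"`, an assumption added here so the count does not depend on local daylight-saving rules; the variable names are illustrative.

```r
# Every minute from midnight to 5am, both endpoints included.
# tz = "UTC" is an assumption made here to avoid daylight-saving surprises.
times <- seq(as.POSIXct("2001-01-01 00:00:00", tz = "UTC"),
             as.POSIXct("2001-01-01 05:00:00", tz = "UTC"),
             by = "min")
n_times <- length(times)            # 5 * 60 + 1 = 301 minute marks

# Rough expectations for 100,000 uniform draws:
per_minute <- 100000 / n_times      # about 332 samples per minute
on_per_minute <- per_minute / 2     # about 166 of them on: the typical stacked bar height
on_per_thing <- on_per_minute / 3   # about 55 on per thing per minute

print(c(n_times = n_times, per_minute = per_minute,
        on_per_minute = on_per_minute, on_per_thing = on_per_thing))
```

Since the draws are random, individual bars will scatter around these averages rather than hit them exactly.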
Next, we use dplyr's group_by to group the rows by time and thing, and summarize to apply a function (sum) to each group. Since on is either 1 or 0, the sum is the number of samples in that group that were on.
sum_df <- df %>%
  group_by(time, thing) %>%
  summarize(count_on = sum(on), .groups = "drop")  # one row per (time, thing) pair, ungrouped
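To make the grouping concrete, here is a minimal sketch on a hypothetical five-row data frame, small enough to check by hand (`toy` and `toy_sum` are illustrative names, not part of the original code). The `.groups = "drop"` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical five-row example: two minutes, two things
toy <- data.frame(
  time  = as.POSIXct(c("2001-01-01 00:00:00", "2001-01-01 00:00:00",
                       "2001-01-01 00:00:00", "2001-01-01 00:01:00",
                       "2001-01-01 00:01:00"), tz = "UTC"),
  thing = c("cat", "cat", "dog", "cat", "cat"),
  on    = c(1, 1, 0, 0, 1)
)

# Summing the 0/1 column counts the "on" rows in each (time, thing) group
toy_sum <- toy %>%
  group_by(time, thing) %>%
  summarize(count_on = sum(on), .groups = "drop")

print(toy_sum)  # rows: 00:00 cat 2, 00:00 dog 0, 00:01 cat 1
```

The (00:00, dog) group appears with count_on = 0 rather than being dropped, because summing a 0/1 column counts the 1s in every group that exists, including groups with none.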
Finally, ggplot gives us a quick way to build the plot, and it handles a continuous time column (POSIXct) on the x axis without any extra scale setup.
p <- ggplot(sum_df, aes(x = time, y = count_on, fill = thing)) +
  geom_col() +  # same as geom_bar(stat = "identity"): bar heights come straight from count_on
  labs(x = "Time", y = "Number on", fill = "Thing")
print(p)
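The bars for each thing at a given minute are stacked, so a bar's total height is the number of samples on at that minute across all things. A minimal sketch on a hypothetical three-row table shaped like sum_df (`toy_sum` and its values are made up for illustration) uses ggplot2's `layer_data()` to read back the stacked heights. `geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# Hypothetical summary table shaped like sum_df: two minutes, two things
toy_sum <- data.frame(
  time = as.POSIXct(c("2001-01-01 00:00:00", "2001-01-01 00:00:00",
                      "2001-01-01 00:01:00"), tz = "UTC"),
  thing = c("cat", "dog", "cat"),
  count_on = c(2, 3, 1)
)

# Bars sharing an x value are stacked by default
p_toy <- ggplot(toy_sum, aes(x = time, y = count_on, fill = thing)) +
  geom_col() +
  labs(x = "Time", y = "Number on", fill = "Thing")

# layer_data() returns the computed layer; ymax is the top of each stacked segment
ld <- layer_data(p_toy)
stack_tops <- tapply(ld$ymax, ld$x, max)  # tallest segment top per minute
print(unname(stack_tops))  # total on per minute: 5 at 00:00, 1 at 00:01
```

In the computed layer the POSIXct times have already been mapped to numbers (seconds), which is how ggplot treats them as a continuous axis.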