A tibble, or tbl_df, is a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not.
In this work we are going to see tibble-s. For more information on this new version of data.frame please see:
https://r4ds.had.co.nz/tibbles.html https://tibble.tidyverse.org/
library(tidyverse)
head(iris,10)
as_tibble(iris)
x.tbl<-tibble(
x1 = seq(1,10),
x2 = seq(11,20),
x3 = x1 ^ 2 + x2*0.5
)
x.tbl
Libraries we will need are:
library(tidyverse)
library(lubridate)
library(nycflights13)
For this information you can use function: today() or now():
today()
[1] "2021-01-07"
now()
[1] "2021-01-07 18:35:48 CET"
Otherwise, there are three ways you may use to create a date/time: •From a string. •From individual date-time components. •From an existing date/time object.
Use library(lubridate), to identify the order in which year, month, and day appear in your dates, then arrange “y”, “m”, and “d” in the same order. That gives you the name of the lubridate function that will parse your date.
ymd("2020-12-31")
[1] "2020-12-31"
mdy("December 31st, 2020")
[1] "2020-12-31"
dmy("1-Jan-2020")
[1] "2020-01-01"
Greenwich Mean Time (GMT) is often interchanged or confused with Coordinated Universal Time (UTC) which is a time standard that is the basis for civil time and time zones worldwide. Adding information about hour, minute and second use :“h”, “m”, and “s” to the function. Example: ymd_hms()
# will give as output the date and the hour/min/sec
ymd_hms("2020-12-31 23:58:59")
[1] "2020-12-31 23:58:59 UTC"
# will give as output the date and the hour/min
mdy_hm("12/31/2020 11:58")
[1] "2020-12-31 11:58:00 UTC"
# You can also force the creation of a date-time from a date by supplying a timezone:
ymd(20201231, tz = "UTC")
[1] "2020-12-31 UTC"
Import a dataframe of Covid new cases and deaths per day published by:
Ref:https://www.ecdc.europa.eu/en/publications-data/data-national-14-day-notification-rate-covid-19
covid.data <- read.csv("C:/Users/user/Desktop/Covid data-12 october 2020.csv")
View(covid.data)
head(covid.data,10)
To create a date/time from this sort of input, use make_date() for dates, or make_datetime() for date-times: the function mutate() adds a variable to your dataframe.
covid.data[,2:4] %>% mutate(information_time = make_datetime(year, month, day))
You may want to switch between a date-time and a date. That’s the job of as_datetime() and as_date():
today()
[1] "2021-01-07"
as_datetime(today())
[1] "2021-01-07 UTC"
now()
[1] "2021-01-07 18:48:18 CET"
as_date(now())
[1] "2021-01-07"
Applying in the Covid dataset:
head(covid.data[,1],10)
[1] "2020-10-13T00:00:00Z" "2020-10-12T00:00:00Z" "2020-10-11T00:00:00Z"
[4] "2020-10-10T00:00:00Z" "2020-10-09T00:00:00Z" "2020-10-08T00:00:00Z"
[7] "2020-10-07T00:00:00Z" "2020-10-06T00:00:00Z" "2020-10-05T00:00:00Z"
[10] "2020-10-04T00:00:00Z"
head(as_datetime(covid.data[,1]),10)
[1] "2020-10-13 UTC" "2020-10-12 UTC" "2020-10-11 UTC" "2020-10-10 UTC"
[5] "2020-10-09 UTC" "2020-10-08 UTC" "2020-10-07 UTC" "2020-10-06 UTC"
[9] "2020-10-05 UTC" "2020-10-04 UTC"
head(as_date(covid.data[,1]),10)
[1] "2020-10-13" "2020-10-12" "2020-10-11" "2020-10-10" "2020-10-09"
[6] "2020-10-08" "2020-10-07" "2020-10-06" "2020-10-05" "2020-10-04"
Example Use the above function to parse the data below:
d1 <- "January 1, 2020"
d1
[1] "January 1, 2020"
d1.t<-mdy(d1)
d1.t
[1] "2020-01-01"
d2 <- "2020-Mar-14"
ymd(d2)
[1] "2020-03-14"
d3 <- "06-Jun-2013"
dmy(d3)
[1] "2013-06-06"
d4 <- c("August 6 (2013)", "July 10 (2016)")
mdy(d4)
[1] "2013-08-06" "2016-07-10"
d5 <- "12/31/20" # Dec 31, 2020
mdy(d5)
[1] "2020-12-31"
We can pull out individual parts of the date with the accessor functions: year(), month(), mday() (day of the month), yday() (day of the year), wday() (day of the week), hour(), minute(), second()
Let use some of these functions at the covid dataset. We have used the head() function to display just a few of the data.
head(covid.data[,1],10)
[1] "2020-10-13T00:00:00Z" "2020-10-12T00:00:00Z" "2020-10-11T00:00:00Z"
[4] "2020-10-10T00:00:00Z" "2020-10-09T00:00:00Z" "2020-10-08T00:00:00Z"
[7] "2020-10-07T00:00:00Z" "2020-10-06T00:00:00Z" "2020-10-05T00:00:00Z"
[10] "2020-10-04T00:00:00Z"
head(year(covid.data[,1]),10)
[1] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020
head(month(covid.data[,1]),10)
[1] 10 10 10 10 10 10 10 10 10 10
head(day(covid.data[,1]),10)
[1] 13 12 11 10 9 8 7 6 5 4
For month() and wday() you can set argument label = TRUE to return the abbreviated name of the month or day of the week. Set argument abbr = FALSE to return the full name. (Keep in mind that we can use TRUE or T, and FALSE or F)
head(month(covid.data[,1],label=TRUE),10)
[1] Oct Oct Oct Oct Oct Oct Oct Oct Oct Oct
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
head(month(covid.data[,1],label=TRUE,abbr=F),10)
[1] October October October October October October October October October
[10] October
12 Levels: January < February < March < April < May < June < ... < December
head(wday(covid.data[,1],label=TRUE),10)
[1] Tue Mon Sun Sat Fri Thu Wed Tue Mon Sun
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
head(wday(covid.data[,1],label=T,abbr=F),10)
[1] Tuesday Monday Sunday Saturday Friday Thursday Wednesday
[8] Tuesday Monday Sunday
7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
An alternative approach to plotting individual components is to round the date to a nearby unit of time, with: floor_date() round_date() and ceiling_date().
round_date() takes a date-time object and time unit, and rounds it to the nearest value of the specified time unit.
floor_date() takes a date-time object and rounds it down to the nearest boundary of the specified time unit.
ceiling_date() takes a date-time object and rounds it up to the nearest boundary of the specified time unit.
Each function takes a vector of dates to adjust and then the name of the unit round down (floor), round up (ceiling), or round to.
x <- ymd_hms("2021-01-06 12:01:58.36")
x
[1] "2021-01-06 12:01:58 UTC"
round_date(x, ".5s")
[1] "2021-01-06 12:01:58 UTC"
round_date(x, "sec")
[1] "2021-01-06 12:01:58 UTC"
round_date(x, "second")
[1] "2021-01-06 12:01:58 UTC"
round_date(x, "minute")
[1] "2021-01-06 12:02:00 UTC"
round_date(x, "10 mins")
[1] "2021-01-06 12:00:00 UTC"
round_date(x, "hour")
[1] "2021-01-06 12:00:00 UTC"
round_date(x, "2 hours")
[1] "2021-01-06 12:00:00 UTC"
round_date(x, "5 hours")
[1] "2021-01-06 10:00:00 UTC"
round_date(x, "day")
[1] "2021-01-07 UTC"
round_date(x, "week")
[1] "2021-01-10 UTC"
round_date(x, "month")
[1] "2021-01-01 UTC"
round_date(x, "bimonth")
[1] "2021-01-01 UTC"
round_date(x, "quarter") # same as: round_date(x, "3 months")
[1] "2021-01-01 UTC"
round_date(x, "halfyear")
[1] "2021-01-01 UTC"
round_date(x, "year")
[1] "2021-01-01 UTC"
x <- ymd_hms("2021-05-06 12:01:58.36")
x
[1] "2021-05-06 12:01:58 UTC"
floor_date(x, ".1s")
[1] "2021-05-06 12:01:58 UTC"
floor_date(x, "second")
[1] "2021-05-06 12:01:58 UTC"
floor_date(x, " 5 minutes")
[1] "2021-05-06 12:00:00 UTC"
floor_date(x, "hour")
[1] "2021-05-06 12:00:00 UTC"
floor_date(x, "day")
[1] "2021-05-06 UTC"
floor_date(x, "week")
[1] "2021-05-02 UTC"
floor_date(x, "month")
[1] "2021-05-01 UTC"
floor_date(x, "bimonth")
[1] "2021-05-01 UTC"
floor_date(x, "quarter")
[1] "2021-04-01 UTC"
floor_date(x, "season")
[1] "2021-03-01 UTC"
floor_date(x, "halfyear")
[1] "2021-01-01 UTC"
floor_date(x, "year")
[1] "2021-01-01 UTC"
x <- ymd_hms("2021-01-06 12:01:58.36")
x
[1] "2021-01-06 12:01:58 UTC"
ceiling_date(x, ".2 sec")
[1] "2021-01-06 12:01:58 UTC"
ceiling_date(x, "second")
[1] "2021-01-06 12:01:59 UTC"
ceiling_date(x, "minute")
[1] "2021-01-06 12:02:00 UTC"
ceiling_date(x, "5 mins")
[1] "2021-01-06 12:05:00 UTC"
ceiling_date(x, "hour")
[1] "2021-01-06 13:00:00 UTC"
ceiling_date(x, "day")
[1] "2021-01-07 UTC"
ceiling_date(x, "week")
[1] "2021-01-10 UTC"
ceiling_date(x, "month")
[1] "2021-02-01 UTC"
ceiling_date(x, "bimonth") # same as ceiling_date(x, "2 months")
[1] "2021-03-01 UTC"
ceiling_date(x, "quarter")
[1] "2021-04-01 UTC"
ceiling_date(x, "season")
[1] "2021-03-01 UTC"
ceiling_date(x, "halfyear")
[1] "2021-07-01 UTC"
ceiling_date(x, "year")
[1] "2022-01-01 UTC"
We can use each accessor function to set the components of a date/time. Let’s suppose we need to correct some values in: year, month, day, hour, minute, second and time zone as well.
d.time_2 <- ymd_hms("2021-01-08 16:45:52")
d.time_2
[1] "2021-01-08 16:45:52 UTC"
year(d.time_2) <- 2022 # correct the year from 2021 to 2022
d.time_2
[1] "2022-01-08 16:45:52 UTC"
month(d.time_2) <- 03 # correct the month from 01 (january) to 03 (march)
d.time_2
[1] "2022-03-08 16:45:52 UTC"
mday(d.time_2) <- 13 # correct the date from 08 to 13
d.time_2
[1] "2022-03-13 16:45:52 UTC"
hour(d.time_2)<-06 # correct the hour
d.time_2
[1] "2022-03-13 06:45:52 UTC"
minute(d.time_2)<-05 # correct the minute
d.time_2
[1] "2022-03-13 06:05:52 UTC"
second(d.time_2)<-25 # correct the second
d.time_2
[1] "2022-03-13 06:05:25 UTC"
Otherwise we can use update() function to correct the above details.
update(d.time_2, year =2022 , month =03 , mday =13 , hour =06,minute=05, second=25 )
[1] "2022-03-13 06:05:25 UTC"
update(d.time_2, year =2025 , mday =15 )
[1] "2025-03-15 06:05:25 UTC"
Have you thought how arithmetic with dates works, including subtraction, addition, and division. These are three important classes that represent time spans:
-> durations, which represent an exact number of seconds. -> periods, which represent human units like weeks and months. -> intervals, which represent a starting and ending point.
Example How old am I?
eralda.age <- today() - ymd(19830114) # year 1983, month 01, date 14
eralda.age
Time difference of 13873 days
print(paste (" Eralda age is",eralda.age,"days"))
[1] " Eralda age is 13873 days"
eralda.years<-as.duration(eralda.age)
eralda.years
[1] "1198627200s (~37.98 years)"
print(paste (" Eralda age is",eralda.years))
[1] " Eralda age is 1198627200s (~37.98 years)"
dfunction(value) is another used function which takes a value x (which can be in units of second, minutes, hours, days, weeks, years) and returns it’s duration in seconds. Example
dseconds(10)# duration in seconds of 10 sec
[1] "10s"
dminutes(5)# duration in seconds of 5 minutes
[1] "300s (~5 minutes)"
dhours(c(2, 10))# duration in seconds of 2 hours and 10 hours
[1] "7200s (~2 hours)" "36000s (~10 hours)"
ddays(0:7)# duration in seconds of 0 days, 1 day, 2 days up to 7 days
[1] "0s" "86400s (~1 days)" "172800s (~2 days)"
[4] "259200s (~3 days)" "345600s (~4 days)" "432000s (~5 days)"
[7] "518400s (~6 days)" "604800s (~1 weeks)"
dweeks(2)# duration in seconds of 2 weeks
[1] "1209600s (~2 weeks)"
dyears(c(1,2))#duration in seconds of 1 year and 2 years
[1] "31557600s (~1 years)" "63115200s (~2 years)"
You can add and multiply durations:
10 * dyears(1)
[1] "315576000s (~10 years)"
dyears(10)
[1] "315576000s (~10 years)"
dyears(1) + dweeks(12) + dhours(15)
[1] "38869200s (~1.23 years)"
You can add and subtract durations to and from days:
tomorrow <- today() + ddays(1)
tomorrow
[1] "2021-01-08"
dbefore.2<-today() + ddays(-2)
dbefore.2
[1] "2021-01-05"
two_year_before <- today() - dyears(2)
two_year_before
[1] "2019-01-07 12:00:00 UTC"
Periods are time spans but don’t have a fixed length in seconds, instead they work with “human” times, like days and months. That allows them to work in a more intuitive way. So you can use: • seconds() • minutes() • hours() • days() • months() • weeks() • years()
new.year <- ymd_hms("2021-01-01 00:00:01")
new.year
[1] "2021-01-01 00:00:01 UTC"
new.year + days(1)
[1] "2021-01-02 00:00:01 UTC"
new.year + days(-1)
[1] "2020-12-31 00:00:01 UTC"
new.year + days(31)
[1] "2021-02-01 00:00:01 UTC"
new.year + days(31)+ hours(12)
[1] "2021-02-01 12:00:01 UTC"
new.year + days(31)+ hours(12)+minutes(15)
[1] "2021-02-01 12:15:01 UTC"
new.year + days(31)+ hours(12)+minutes(15)+seconds(25)
[1] "2021-02-01 12:15:26 UTC"
You can add and multiply periods:
10 * (months(2) + days(1))
[1] "20m 10d 0H 0M 0S"
days(61) + hours(25) + minutes(5)
[1] "61d 25H 5M 0S"
dyears(1)
[1] "31557600s (~1 years)"
ymd("2020-01-01") + dyears(1)
[1] "2020-12-31 06:00:00 UTC"
ymd("2020-01-01") + years(1)
[1] "2021-01-01"
new.year + ddays(1)
[1] "2021-01-02 00:00:01 UTC"
new.year + days(1)
[1] "2021-01-02 00:00:01 UTC"
Often with time series you want to aggregate your dataset to a less granular period. An example of this might be moving from a daily series to a weekly or monthly series to look at broader trends in your data. as_period() allows you to do exactly this.
The period argument in as_period() for specifying the transformation you want is a character with a general format of “frequency period” where frequency is a number like 1 or 2, and period is an interval like weekly or yearly. There must be a space between the two.
library(lubridate)
library(tibbletime)
Energy.t<-as_tbl_time(energy.df,index =dateRep)# transform the dataframe into tibble
head(Energy.t[,1:7])
# A time tibble: 6 x 7
# Index: dateRep
dateRep Outflow.V.2010. Outflow.K..2010. Outflow.F..2010.
<dttm> <dbl> <dbl> <dbl>
1 2010-01-01 00:00:00 242. 406. 343.
2 2010-01-02 00:00:00 257. 469. 252.
3 2010-01-03 00:00:00 446. 391. 349.
4 2010-01-04 00:00:00 571. 428. 443.
5 2010-01-05 00:00:00 554. 508. 448.
6 2010-01-06 00:00:00 543. 504. 406.
# ... with 3 more variables: Inflow.V..2010. <dbl>, Inflow..F..2010. <dbl>,
# Inflow.K..2010. <dbl>
# transform the data to monthly from daily
Energy.monthly<-as_period(Energy.t, '1 month')
head(Energy.monthly[,1:7])
# A time tibble: 6 x 7
# Index: dateRep
dateRep Outflow.V.2010. Outflow.K..2010. Outflow.F..2010.
<dttm> <dbl> <dbl> <dbl>
1 2010-01-01 00:00:00 242. 406. 343.
2 2010-02-01 00:00:00 476. 516. 361.
3 2010-03-01 00:00:00 544. 411. 317.
4 2010-04-01 00:00:00 448. 494. 337.
5 2010-05-01 00:00:00 433. 383. 400.
6 2010-06-01 00:00:00 441. 446. 384.
# ... with 3 more variables: Inflow.V..2010. <dbl>, Inflow..F..2010. <dbl>,
# Inflow.K..2010. <dbl>
# Create monthly time series from daily (sum of energy produced daily from each HPP)
library(xts)
energy.xts.f <- xts(energy.df$Production.Fierze..2010., energy.df$dateRep) # create a xts object
head(energy.xts.f)
timezone of object (UTC) is different than current timezone ().
[,1]
2010-01-01 6100
2010-01-02 8500
2010-01-03 10800
2010-01-04 10800
2010-01-05 9800
2010-01-06 8300
# total of monthly energy produced in F
energy.f.m<-apply.monthly(energy.xts.f, sum)
head(energy.f.m)
timezone of object (UTC) is different than current timezone ().
[,1]
2010-01-31 298600
2010-02-28 228200
2010-03-31 304013
2010-04-30 292220
2010-05-31 281860
2010-06-30 188400
# total of weekly energy produced in F
energy.f.w<-apply.weekly(energy.xts.f,sum)
head(energy.f.w)
timezone of object (UTC) is different than current timezone ().
[,1]
2010-01-03 25400
2010-01-10 69600
2010-01-17 69400
2010-01-24 65400
2010-01-31 68800
2010-02-07 57600
# total of yearly energy produced in F
energy.f.y<-apply.yearly(energy.xts.f,sum,na.rm=T)
head(energy.f.y)
timezone of object (UTC) is different than current timezone ().
[,1]
2010-12-31 2630580
2011-12-31 1214409
2012-12-31 1087102
2013-12-31 2024624
2014-12-31 1045315
2015-12-31 1615998
library(tibbletime)
library(dplyr)
data(FB)
head(FB)
FB <- as_tbl_time(FB, index = date)
head(FB)
# A time tibble: 6 x 8
# Index: date
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1
6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
data(FANG)
head(FANG,30)
FANG <- as_tbl_time(FANG, index = date)# create a tibble
FANG<- FANG %>% group_by(symbol)# grouping by symbol
FANG
# A time tibble: 4,032 x 8
# Index: date
# Groups: symbol [4]
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1
6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
7 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400 31.3
8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
9 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
10 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
# ... with 4,022 more rows
Let us now transform the daily FB data set to monthly data using the function as_period() and as argument “1 month”
as_period(FB, '1 month')
# A time tibble: 48 x 8
# Index: date
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
2 FB 2013-02-01 31.0 31.0 29.6 29.7 85856700 29.7
3 FB 2013-03-01 27.0 28.1 26.8 27.8 54064800 27.8
4 FB 2013-04-01 25.6 25.9 25.3 25.5 22249300 25.5
5 FB 2013-05-01 27.8 27.9 27.3 27.4 64567600 27.4
6 FB 2013-06-03 24.3 24.3 23.7 23.8 35733800 23.8
7 FB 2013-07-01 25.0 25.1 24.6 24.8 20582200 24.8
8 FB 2013-08-01 37.3 38.3 36.9 37.5 106066500 37.5
9 FB 2013-09-03 41.8 42.2 41.5 41.9 48774900 41.9
10 FB 2013-10-01 50.0 51.0 49.5 50.4 98114000 50.4
# ... with 38 more rows
You aren’t restricted to only 1 month periods. Maybe you wanted every 2 months? Or, 6 months?
as_period(FB, '2 m')
# A time tibble: 24 x 8
# Index: date
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
2 FB 2013-03-01 27.0 28.1 26.8 27.8 54064800 27.8
3 FB 2013-05-01 27.8 27.9 27.3 27.4 64567600 27.4
4 FB 2013-07-01 25.0 25.1 24.6 24.8 20582200 24.8
5 FB 2013-09-03 41.8 42.2 41.5 41.9 48774900 41.9
6 FB 2013-11-01 50.8 52.1 49.7 49.8 95033000 49.8
7 FB 2014-01-02 54.8 55.2 54.2 54.7 43195500 54.7
8 FB 2014-03-03 67.0 68.1 66.5 67.4 56824100 67.4
9 FB 2014-05-01 60.4 62.3 60.2 61.2 82429000 61.2
10 FB 2014-07-01 67.6 68.4 67.4 68.1 33243000 68.1
# ... with 14 more rows
as_period(FB, '6 m')
# A time tibble: 8 x 8
# Index: date
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
2 FB 2013-07-01 25.0 25.1 24.6 24.8 20582200 24.8
3 FB 2014-01-02 54.8 55.2 54.2 54.7 43195500 54.7
4 FB 2014-07-01 67.6 68.4 67.4 68.1 33243000 68.1
5 FB 2015-01-02 78.6 78.9 77.7 78.4 18177500 78.4
6 FB 2015-07-01 86.8 87.9 86.5 86.9 25260000 86.9
7 FB 2016-01-04 102. 102. 99.8 102. 37912400 102.
8 FB 2016-07-01 114. 115. 114. 114. 14980000 114.
Or maybe every 15 days? Note that the dates do not line up exactly with a difference of 25 days. This is due to the data set not being completely regular (there are gaps due to weekends and holidays). as_period() chooses the first date it can find in the period specified.
as_period(FB, '15 d')
# A time tibble: 98 x 8
# Index: date
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
2 FB 2013-01-16 30.2 30.4 29.5 29.8 75332700 29.8
3 FB 2013-01-31 29.2 31.5 28.7 31.0 190744900 31.0
4 FB 2013-02-15 28.5 28.8 28.1 28.3 33109300 28.3
5 FB 2013-03-04 27.8 28.1 27.4 27.7 32400700 27.7
6 FB 2013-03-18 26.4 26.8 25.8 26.5 26653700 26.5
7 FB 2013-04-01 25.6 25.9 25.3 25.5 22249300 25.5
8 FB 2013-04-16 26.8 27.1 26.4 26.9 27365900 26.9
9 FB 2013-05-01 27.8 27.9 27.3 27.4 64567600 27.4
10 FB 2013-05-16 26.5 26.5 25.9 26.1 35499100 26.1
# ... with 88 more rows
as_period(FB, '10 d')
# A time tibble: 146 x 8
# Index: date
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
2 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
3 FB 2013-01-22 29.8 30.9 29.7 30.7 55243300 30.7
4 FB 2013-01-31 29.2 31.5 28.7 31.0 190744900 31.0
5 FB 2013-02-11 28.6 28.7 28.0 28.3 37361800 28.3
6 FB 2013-02-20 28.9 29.0 28.3 28.5 42098200 28.5
7 FB 2013-03-04 27.8 28.1 27.4 27.7 32400700 27.7
8 FB 2013-03-12 28.1 28.3 27.6 27.8 27569600 27.8
9 FB 2013-03-22 25.8 26.0 25.6 25.7 18456300 25.7
10 FB 2013-04-01 25.6 25.9 25.3 25.5 22249300 25.5
# ... with 136 more rows
as_period(FB, '10 d', start_date = "2013-01-01")
# A time tibble: 146 x 8
# Index: date
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
2 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
3 FB 2013-01-22 29.8 30.9 29.7 30.7 55243300 30.7
4 FB 2013-01-31 29.2 31.5 28.7 31.0 190744900 31.0
5 FB 2013-02-11 28.6 28.7 28.0 28.3 37361800 28.3
6 FB 2013-02-20 28.9 29.0 28.3 28.5 42098200 28.5
7 FB 2013-03-04 27.8 28.1 27.4 27.7 32400700 27.7
8 FB 2013-03-12 28.1 28.3 27.6 27.8 27569600 27.8
9 FB 2013-03-22 25.8 26.0 25.6 25.7 18456300 25.7
10 FB 2013-04-01 25.6 25.9 25.3 25.5 22249300 25.5
# ... with 136 more rows
By default, the first date per period is returned. If you want the end of each period instead, specify the side = “end” argument.
as_period(FB, 'y')
# A time tibble: 4 x 8
# Index: date
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
2 FB 2014-01-02 54.8 55.2 54.2 54.7 43195500 54.7
3 FB 2015-01-02 78.6 78.9 77.7 78.4 18177500 78.4
4 FB 2016-01-04 102. 102. 99.8 102. 37912400 102.
as_period(FB, 'y', side = "end")
# A time tibble: 4 x 8
# Index: date
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-12-31 54.1 54.9 53.9 54.7 43076200 54.7
2 FB 2014-12-31 79.5 79.8 77.9 78.0 19935400 78.0
3 FB 2015-12-31 106 106. 105. 105. 18298700 105.
4 FB 2016-12-30 117. 117. 115. 115. 18600100 115.
One of the neat things about working in the tidyverse is that these functions can also work with grouped datasets.
Here we transform the daily series of the 4 FANG stocks to a periodicity of every 6 months and 1 year. Try to change the periodicity to 4 months and 2 years.
FANG %>% as_period('6 m')
# A time tibble: 32 x 8
# Index: date
# Groups: symbol [4]
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
2 FB 2013-07-01 25.0 25.1 24.6 24.8 20582200 24.8
3 FB 2014-01-02 54.8 55.2 54.2 54.7 43195500 54.7
4 FB 2014-07-01 67.6 68.4 67.4 68.1 33243000 68.1
5 FB 2015-01-02 78.6 78.9 77.7 78.4 18177500 78.4
6 FB 2015-07-01 86.8 87.9 86.5 86.9 25260000 86.9
7 FB 2016-01-04 102. 102. 99.8 102. 37912400 102.
8 FB 2016-07-01 114. 115. 114. 114. 14980000 114.
9 AMZN 2013-01-02 256. 258. 253. 257. 3271000 257.
10 AMZN 2013-07-01 279 283. 277. 282. 2888200 282.
# ... with 22 more rows
FANG %>% as_period('1 y')
# A time tibble: 16 x 8
# Index: date
# Groups: symbol [4]
symbol date open high low close volume
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-01-02 27.4 28.2 27.4 28 6.98e7
2 FB 2014-01-02 54.8 55.2 54.2 54.7 4.32e7
3 FB 2015-01-02 78.6 78.9 77.7 78.4 1.82e7
4 FB 2016-01-04 102. 102. 99.8 102. 3.79e7
5 AMZN 2013-01-02 256. 258. 253. 257. 3.27e6
6 AMZN 2014-01-02 399. 399. 394. 398. 2.14e6
7 AMZN 2015-01-02 313. 315. 307. 309. 2.78e6
8 AMZN 2016-01-04 656. 658. 628. 637. 9.31e6
9 NFLX 2013-01-02 95.2 95.8 90.7 92.0 1.94e7
10 NFLX 2014-01-02 367. 368. 361. 363. 1.23e7
11 NFLX 2015-01-02 344. 352. 341. 349. 1.35e7
12 NFLX 2016-01-04 109 110 105. 110. 2.08e7
13 GOOG 2013-01-02 719. 727. 717. 723. 5.10e6
14 GOOG 2014-01-02 1115. 1118. 1108. 1113. 3.66e6
15 GOOG 2015-01-02 529. 531. 524. 525. 1.45e6
16 GOOG 2016-01-04 743 744. 731. 742. 3.27e6
# ... with 1 more variable: adjusted <dbl>
Create a tbl_time object with a sequence of regularly spaced dates
options("digits.secs" = 4)
options("digits" = 18)
milli$date
[1] "2013-01-01 00:00:00.0 UTC" "2013-01-01 00:00:00.0 UTC"
[3] "2013-01-01 00:00:00.2 UTC" "2013-01-01 00:00:00.2 UTC"
[5] "2013-01-01 00:00:00.4 UTC" "2013-01-01 00:00:00.5 UTC"
[7] "2013-01-01 00:00:00.5 UTC" "2013-01-01 00:00:00.7 UTC"
[9] "2013-01-01 00:00:00.7 UTC" "2013-01-01 00:00:00.9 UTC"
[11] "2013-01-01 00:00:01.0 UTC"
as.numeric(milli$date)
[1] 1356998400.0000000 1356998400.0999999 1356998400.2000000
[4] 1356998400.3000000 1356998400.4000001 1356998400.5000000
[7] 1356998400.5999999 1356998400.7000000 1356998400.8000000
[10] 1356998400.9000001 1356998401.0000000
filter_time(.tbl_time, time_formula) The time_formula is specified using the format from ~ to.
Each side of the time_formula is specified as the character: ‘YYYY-MM-DD HH:MM:SS’, but powerful shorthand is available.
Some examples are: • Year: ‘2013’ ~ ‘2015’ • Month: ‘2013-01’ ~ ‘2016-06’ • Day: ‘2013-01-05’ ~ ‘2016-06-04’ • Second: ‘2013-01-05 10:22:15’ ~ ‘2018-06-03 12:14:22’ • Variations: ‘2013’ ~ ‘2016-06’
The time_formula can also use a one sided formula. • Only dates in 2015: ~‘2015’ • Only dates March 2015: ~‘2015-03’
The time_formula can also use ‘start’ and ‘end’ as keywords for your filter. • Start of the series to end of 2015: ‘start’ ~ ‘2015’
filter_time(FANG, ~ '2013')
# A time tibble: 1,008 x 8
# Index: date
# Groups: symbol [4]
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1
6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
7 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400 31.3
8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
9 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
10 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
# ... with 998 more rows
#create_series('2013' ~ '2014', '1 d')
filter_time(FANG, '2013' ~ '2014')
# A time tibble: 2,016 x 8
# Index: date
# Groups: symbol [4]
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1
6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
7 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400 31.3
8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
9 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
10 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
# ... with 2,006 more rows
# 2013-05-25 to 2014-06-04
filter_time(FANG, '2013-05-25' ~ '2014-06-04')
# A time tibble: 1,032 x 8
# Index: date
# Groups: symbol [4]
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-05-28 24.5 24.5 23.9 24.1 50079700 24.1
2 FB 2013-05-29 23.8 23.8 23.3 23.3 64237800 23.3
3 FB 2013-05-30 24.1 24.8 23.9 24.5 60733200 24.5
4 FB 2013-05-31 24.6 25.0 24.3 24.4 35925000 24.4
5 FB 2013-06-03 24.3 24.3 23.7 23.8 35733800 23.8
6 FB 2013-06-04 23.9 23.9 23.3 23.5 34760800 23.5
7 FB 2013-06-05 23.4 23.7 22.8 22.9 53819700 22.9
8 FB 2013-06-06 23.0 23.1 22.7 23.0 31260700 23.0
9 FB 2013-06-07 23.0 23.4 22.9 23.3 38699200 23.3
10 FB 2013-06-10 24.1 24.6 24.0 24.3 58393000 24.3
# ... with 1,022 more rows
# we can also use the pipe operator here and obtain the same result
FANG %>% filter_time('2013-05-25' ~ '2014-06-04')
# A time tibble: 1,032 x 8
# Index: date
# Groups: symbol [4]
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-05-28 24.5 24.5 23.9 24.1 50079700 24.1
2 FB 2013-05-29 23.8 23.8 23.3 23.3 64237800 23.3
3 FB 2013-05-30 24.1 24.8 23.9 24.5 60733200 24.5
4 FB 2013-05-31 24.6 25.0 24.3 24.4 35925000 24.4
5 FB 2013-06-03 24.3 24.3 23.7 23.8 35733800 23.8
6 FB 2013-06-04 23.9 23.9 23.3 23.5 34760800 23.5
7 FB 2013-06-05 23.4 23.7 22.8 22.9 53819700 22.9
8 FB 2013-06-06 23.0 23.1 22.7 23.0 31260700 23.0
9 FB 2013-06-07 23.0 23.4 22.9 23.3 38699200 23.3
10 FB 2013-06-10 24.1 24.6 24.0 24.3 58393000 24.3
# ... with 1,022 more rows
Let suppose we want to select every date in February, 2014:
filter_time(FANG,~'2014-02')
# A time tibble: 76 x 8
# Index: date
# Groups: symbol [4]
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2014-02-03 63.0 63.8 60.7 61.5 74866600 61.5
2 FB 2014-02-04 62.0 63.1 61.8 62.8 45985500 62.8
3 FB 2014-02-05 62.7 63.2 61.3 62.2 51685100 62.2
4 FB 2014-02-06 61.5 62.8 61.5 62.2 42086500 62.2
5 FB 2014-02-07 62.3 64.6 62.2 64.3 60704300 64.3
6 FB 2014-02-10 64.3 64.5 63.5 63.5 43666100 63.5
7 FB 2014-02-11 63.8 65 63.3 64.8 45675600 64.8
8 FB 2014-02-12 64.9 65.1 64.1 64.4 47282100 64.4
9 FB 2014-02-13 64.2 67.3 64.1 67.3 61911700 67.3
10 FB 2014-02-14 67.5 67.6 66.7 67.1 36694900 67.1
# ... with 66 more rows
For interactive use, to get an even quicker look at a dataset you can use the traditional extraction operator [ with the formula syntax.
FANG[~'2013']
# A time tibble: 1,008 x 8
# Index: date
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1
6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
7 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400 31.3
8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
9 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
10 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
# ... with 998 more rows
FANG['2014'~'2014-02', c(1,2,4,5)]
# A time tibble: 160 x 4
# Index: date
symbol date high low
<chr> <date> <dbl> <dbl>
1 FB 2014-01-02 55.2 54.2
2 FB 2014-01-03 55.7 54.5
3 FB 2014-01-06 57.3 54.0
4 FB 2014-01-07 58.5 57.2
5 FB 2014-01-08 58.4 57.2
6 FB 2014-01-09 59.0 56.7
7 FB 2014-01-10 58.3 57.1
8 FB 2014-01-13 58.2 55.4
9 FB 2014-01-14 57.8 56.1
10 FB 2014-01-15 58.6 57.3
# ... with 150 more rows
FANG['2014'~'2014-02', c(1,2,3:6)]
# A time tibble: 160 x 6
# Index: date
symbol date open high low close
<chr> <date> <dbl> <dbl> <dbl> <dbl>
1 FB 2014-01-02 54.8 55.2 54.2 54.7
2 FB 2014-01-03 55.0 55.7 54.5 54.6
3 FB 2014-01-06 54.4 57.3 54.0 57.2
4 FB 2014-01-07 57.7 58.5 57.2 57.9
5 FB 2014-01-08 57.6 58.4 57.2 58.2
6 FB 2014-01-09 58.7 59.0 56.7 57.2
7 FB 2014-01-10 57.1 58.3 57.1 57.9
8 FB 2014-01-13 57.9 58.2 55.4 55.9
9 FB 2014-01-14 56.5 57.8 56.1 57.7
10 FB 2014-01-15 58.0 58.6 57.3 57.6
# ... with 150 more rows
You can also cretae a date and use filter_time with that date as a start (end)
end.date <- as.Date("2015-01-01")
filter_time(FANG, 'start' ~ end.date)
# A time tibble: 2,016 x 8
# Index: date
# Groups: symbol [4]
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1
6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
7 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400 31.3
8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
9 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
10 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
# ... with 2,006 more rows
https://lubridate.tidyverse.org/
https://rpubs.com/Sergio_Garcia/working_with_dates_and_times
https://cran.r-project.org/web/packages/tibbletime/vignettes/TT-02-changing-time-periods.html
https://cran.r-project.org/web/packages/tibbletime/tibbletime.pdf
https://stackoverflow.com/questions/33105900/sum-daily-values-into-monthly-values