A Quick Tutorial on How to Filter Out Dates and Times from a Timestamp Through Separating and Conditionals

Let’s first load up the tidyverse library because Hadley Wickham is a homie.

library(tidyverse)






Let’s begin with a sample dataset of five datapoints of 5 people with different timestamps.

#Sample dataset

start_data <- tibble(Name = c("Sean_Ontime",
                              "Chips",
                              "Amy",
                              "Sean_Late",
                              "Sean_BarelyOntime"),
                     Time_Stamp = c("2020/01/04 07:32:21 AM PST",
                                    "2020/02/05 05:32:21 PM PST",
                                    "2020/03/04 12:09:11 AM PST",
                                    "2020/01/04 06:32:21 PM PST",
                                    "2020/01/04 09:00:00 AM PST"))

head(start_data)
## # A tibble: 5 x 2
##   Name              Time_Stamp                
##   <chr>             <chr>                     
## 1 Sean_Ontime       2020/01/04 07:32:21 AM PST
## 2 Chips             2020/02/05 05:32:21 PM PST
## 3 Amy               2020/03/04 12:09:11 AM PST
## 4 Sean_Late         2020/01/04 06:32:21 PM PST
## 5 Sean_BarelyOntime 2020/01/04 09:00:00 AM PST




Separating Dates

We will do two rounds of separation.

First separate the timestamp


In this case, we have the structure YYYY/MM/DD HH:MM:SS AM/PM PST
We will separate() the Timestamp column by everything separated by a space. This will result in 4 columns:

  1. Year
  2. Time
  3. AM/PM
  4. PST


#Separate timestamp out by spaces

DataTimeDate <- start_data %>%
  separate(Time_Stamp, sep = " ", into = c("Date", "Time", "DayNight", "PST")) %>%
  #Remove unnecessary columns
  select(-PST)

head(DataTimeDate)
## # A tibble: 5 x 4
##   Name              Date       Time     DayNight
##   <chr>             <chr>      <chr>    <chr>   
## 1 Sean_Ontime       2020/01/04 07:32:21 AM      
## 2 Chips             2020/02/05 05:32:21 PM      
## 3 Amy               2020/03/04 12:09:11 AM      
## 4 Sean_Late         2020/01/04 06:32:21 PM      
## 5 Sean_BarelyOntime 2020/01/04 09:00:00 AM




Further Separate the Date

We will now further separate the date using the numbers separated by a slash “/”. Since slash is a special character, in R we must indicate we are introducing a special character by proceeding it with two backslashes “\”.

For example, our code would look like… separate(Date, sep = “\/”)

This will separate the Date columns into three new columns:

  1. Year
  2. Month
  3. Day

Lastly, we will change the Year, Month, and Day column into numeric vectors (they are currently charcter vectors). This will allow us to use numerical filtering such as greater than or less than.

#Separate date into year, month, day

DataYearMonthDay <- DataTimeDate %>%
  separate(Date, sep = "\\/", into = c("Year", "Month", "Day")) %>%
  #Change dates into numerics for filtering
  mutate_at(c("Year", "Month", "Day"), as.numeric)

head(DataYearMonthDay)
## # A tibble: 5 x 6
##   Name               Year Month   Day Time     DayNight
##   <chr>             <dbl> <dbl> <dbl> <chr>    <chr>   
## 1 Sean_Ontime        2020     1     4 07:32:21 AM      
## 2 Chips              2020     2     5 05:32:21 PM      
## 3 Amy                2020     3     4 12:09:11 AM      
## 4 Sean_Late          2020     1     4 06:32:21 PM      
## 5 Sean_BarelyOntime  2020     1     4 09:00:00 AM




Filtering

If at this point you want to filter for a specific month and day, you can do so using the filter() function on your dataframe.

How would be filter for all records that have a date before February?

#Give me dates before February
DataYearMonthDay %>%
  filter(Month < 2)
## # A tibble: 3 x 6
##   Name               Year Month   Day Time     DayNight
##   <chr>             <dbl> <dbl> <dbl> <chr>    <chr>   
## 1 Sean_Ontime        2020     1     4 07:32:21 AM      
## 2 Sean_Late          2020     1     4 06:32:21 PM      
## 3 Sean_BarelyOntime  2020     1     4 09:00:00 AM




Two arguments

Now we are leveling up.

You may find that you need more than one argument to satisfy your query. Sometimes it’s easier separating your question into multiple arguments. This makes it easier to do a query. For example let’s say “We want all dates between February 2-10”.

We can separate this into two specific arguments:

  1. Filter days in February
  2. Filter days between 2 and 10

We need our query to match BOTH these arguments. That means this argument should be stated “Give records that match arugment 1 AND arugment two”. In R, the AND operator is “&”.

#Conditional: Give me dates between feburary 2 and 10
DataYearMonthDay %>%
  #Give me only records that are in February AND have day between 2 and 10 
  filter(Month == 2 & Day %in% 2:10)
## # A tibble: 1 x 6
##   Name   Year Month   Day Time     DayNight
##   <chr> <dbl> <dbl> <dbl> <chr>    <chr>   
## 1 Chips  2020     2     5 05:32:21 PM




Separate Time

Now that we have our date separated, let’s also deal with the time.

Currently our time is in the format “HH:MM:SS”. We see that hours, minutes, and seconds are separated by colons. Colons are not a special character, so we can simply separate them like… separate(Data, sep = “:”)

We can break up our time column into three new columns:

  1. Hours
  2. Minutes
  3. Seconds

We will then transform them in numeric vectors since we saw it made things easier for the date!

#Separate time into Hour, minutes seconds
DataHourMinutesSeconds <- DataYearMonthDay %>%
  separate(Time, sep = ":", into = c("Hours", "Minutes", "Seconds")) %>%
  #Change these columns into numeric vectors
  mutate_at(c("Hours", "Minutes", "Seconds"), as.numeric)

head(DataHourMinutesSeconds)
## # A tibble: 5 x 8
##   Name               Year Month   Day Hours Minutes Seconds DayNight
##   <chr>             <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl> <chr>   
## 1 Sean_Ontime        2020     1     4     7      32      21 AM      
## 2 Chips              2020     2     5     5      32      21 PM      
## 3 Amy                2020     3     4    12       9      11 AM      
## 4 Sean_Late          2020     1     4     6      32      21 PM      
## 5 Sean_BarelyOntime  2020     1     4     9       0       0 AM




Conditional filters for time

Time is bit tricker to deal with than the date. It may be easier to use military time for example since you don’t have to worry about AM or PM.

I’m lazy, so I’m not going to convert things to military times, but use many conditional statement to get what I want! It’s not ideal but it lets us play with conditionals.

Let’s say we want to get all records on January 4 before 5pm.

We can break this into four arguments:

  1. Month must be January
  2. Day must be 4
  3. Must be in the AM hours
  4. If not in the AM hours, must be before 5pm

Argument 1 and 2 are clearly an AND statment “&” since both conditions must be met.

Argument 3 and 4 can are an OR statment. The condition is satified simply if the hour is AM OR if the leftover PM hours are before 5pm.

In R the operator for OR is “|”.

DataHourMinutesSeconds %>%
  #Filter records on January 4
  filter(Month == 1 & Day == 4) %>%
  #Must be either in the AM hours OR before 5pm
  filter(DayNight == "AM" | Hours <5)
## # A tibble: 2 x 8
##   Name               Year Month   Day Hours Minutes Seconds DayNight
##   <chr>             <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl> <chr>   
## 1 Sean_Ontime        2020     1     4     7      32      21 AM      
## 2 Sean_BarelyOntime  2020     1     4     9       0       0 AM




Creating New Columns Based On Meeting a Condition

What if we don’t want to just query the results that match our criteria, but want to flag it for later. Perhaps a bad graduate student, Sean, is often late, or barely ontime. We may want to track this in his records….

In this situation we can create a new column called “Lateness” that has a different entry dependent on whether or not it means the condition we are interested in.

Let’s say we want to mark anyone who shows up between 7 and 9am as ontime, and everyone else late. We can create a new column by using the mutate() function. We make the new column “Lateness” equal to a conditional statement that populates the column with one of two possible states “OnTime” or “Late”.

  1. If the DayNight is AM AND 2. The Hours is between 7-8 OR
  2. If the DayNight is AM AND 4. The Minutes is 0

Then we will mark the entry as ontime. Otherwise, we will mark it as late. We code this up as follows..

#Maybe you want to mark everybody as late or ontime depending upon if they show up between 7 and 9AM
NewColumnLATES <- DataHourMinutesSeconds %>%
  mutate(Lateness = if_else(DayNight == "AM" & Hours %in% 7:8 | 
                              DayNight == "AM" & Hours == 9 & Minutes == 0, 
                            #If all coniditions are met, we fill the corresponding element with OnTime
                            "OnTime",
                            #If the conditions are not met, we fill the element with Late
                            "Late"))

head(NewColumnLATES)
## # A tibble: 5 x 9
##   Name               Year Month   Day Hours Minutes Seconds DayNight Lateness
##   <chr>             <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl> <chr>    <chr>   
## 1 Sean_Ontime        2020     1     4     7      32      21 AM       OnTime  
## 2 Chips              2020     2     5     5      32      21 PM       Late    
## 3 Amy                2020     3     4    12       9      11 AM       Late    
## 4 Sean_Late          2020     1     4     6      32      21 PM       Late    
## 5 Sean_BarelyOntime  2020     1     4     9       0       0 AM       OnTime




Hope this was helpful!

-Sean Perez PhD Candidate @ UC Berkeley