Mindy Fang
Jan 08, 2019
In its most basic form, R can be used as a simple calculator. Consider the following arithmetic operators:
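Addition: +
Subtraction: -
Multiplication: *
Division: /
Exponentiation: ^
Modulo: %%
The last two might need some explaining:
The ^ operator raises the number to its left to the power of the number to its right: for example 3^2 is 9.
The %% operator (modulo) returns the remainder of dividing the number to its left by the number to its right: for example 5 %% 3 is 2.
An addition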
## [1] 10
A subtraction
## [1] 0
A multiplication
## [1] 15
## [1] 32
## [1] 4
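The commands behind the five results above are not shown in this text. A minimal sketch that reproduces the same values follows; the operands are assumptions, chosen only so that the results match the printed output.

```r
# Operands are assumed; only the results appear in the original output.
addition       <- 5 + 5
subtraction    <- 5 - 5
multiplication <- 3 * 5
exponentiation <- 2 ^ 5
modulo         <- 28 %% 6   # remainder of 28 divided by 6

print(c(addition, subtraction, multiplication, exponentiation, modulo))
```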
A basic concept in (statistical) programming is called a variable.
A variable allows you to store a value (e.g. 4) or an object (e.g. a function description) in R. You can then later use this variable’s name to easily access the value or the object that is stored within this variable.
You can assign the value 4 to a variable my_var with the command my_var <- 4.
## [1] 42
You have 5 apples and 6 oranges, and you want to calculate how many pieces of fruit you have in total.
## [1] 11
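The fruit calculation can be sketched with variables as follows. The names my_apples, my_oranges and my_fruit are assumptions for illustration; the original code is not shown.

```r
# Assign a value to a variable, then print it by typing its name.
my_var <- 4
my_var

# Hypothetical variable names for the fruit example.
my_apples  <- 5
my_oranges <- 6
my_fruit   <- my_apples + my_oranges
my_fruit
```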
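R works with numerous data types. Some of the most basic types to get started are numerics (decimal values such as 4.5), characters (text such as "some text"), and logicals (TRUE or FALSE).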
Change my_numeric to be 42
You can check the data type of a variable with the class() function.
Declare variables of different types
Check class of my_numeric
## [1] "numeric"
## [1] "character"
## [1] "logical"
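The declarations that led to the three class() results above are missing from this text. A sketch under the assumption that one variable of each basic type was declared (the values of my_character and my_logical are made up here):

```r
# Values are assumed; only the class() results appear in the original output.
my_numeric   <- 42
my_character <- "universe"
my_logical   <- FALSE

class(my_numeric)
class(my_character)
class(my_logical)
```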
Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data. In other words, a vector is a simple tool to store data. In R, you create a vector with the combine function c(). You place the vector elements separated by a comma between the parentheses. For example, c(1, 2, 3) creates a numeric vector with three elements.
You can give a name to the elements of a vector with the names() function.
This code first creates a vector some_vector and then gives the two elements a name. The first element is assigned the name Name, while the second element is labeled Gender. Printing the contents to the console yields the following output:
## Name Gender
## "mindy fang" "female"
## Name
## "mindy fang"
## Gender
## "female"
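The code described above did not survive in this text. A sketch reconstructed from the description and the printed output (selecting by name with square brackets is an assumption about how the last two results were obtained):

```r
# Create the vector, then name its two elements.
some_vector <- c("mindy fang", "female")
names(some_vector) <- c("Name", "Gender")
some_vector

# Select single elements by name.
some_vector["Name"]
some_vector["Gender"]
```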
It is important to know that if you add two vectors in R, the sum is taken element-wise. For example, the following three statements are completely equivalent:
## [1] 5 7 9
## [1] 5 7 9
## [1] 5 7 9
You can also do the calculations with variables that represent vectors:
## [1] 5 7 9
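The statements themselves are missing here. One set of vectors that is consistent with the output 5 7 9 is shown below; the particular values, and the names a, b and total, are assumptions.

```r
# Three equivalent ways to arrive at the same vector.
c(1, 2, 3) + c(4, 5, 6)
c(1 + 4, 2 + 5, 3 + 6)
c(5, 7, 9)

# The same calculation with variables that hold vectors.
a <- c(1, 2, 3)
b <- c(4, 5, 6)
total <- a + b
total
```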
We can select the elements from a vector directly:
Time spent (minutes) on housework and kids by me and my husband
days_vector <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
me_housework_time <- c(120.5, 110, 130, 110, 100, 230, 300)
husband_housework_time <- c(0, 0, 10, 10.5, 0, 300, 360)
names(me_housework_time) <- days_vector
names(husband_housework_time) <- days_vector
How many minutes have I worked on Thursday and Friday?
## Thu Fri
## 110 100
How many minutes has my husband worked on Thursday and Friday?
## Thu Fri
## 10.5 0.0
The (logical) comparison operators known to R are:
< for less than
> for greater than
<= for less than or equal to
>= for greater than or equal to
== for equal to each other
!= for not equal to each other
## Mon Tue Wed Thu Fri Sat Sun
## TRUE TRUE TRUE TRUE TRUE FALSE FALSE
## Mon Tue Wed Thu Fri
## 120.5 110.0 130.0 110.0 100.0
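The selection commands for the outputs above are not shown. A sketch that would produce them, assuming selection by name, by position, and by a logical comparison (the names me_thu_fri, husband_thu_fri, i_did_more and me_when_more are hypothetical):

```r
days_vector <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
me_housework_time <- c(120.5, 110, 130, 110, 100, 230, 300)
husband_housework_time <- c(0, 0, 10, 10.5, 0, 300, 360)
names(me_housework_time) <- days_vector
names(husband_housework_time) <- days_vector

# Select elements by name or by position.
me_thu_fri <- me_housework_time[c("Thu", "Fri")]
husband_thu_fri <- husband_housework_time[c(4, 5)]

# A comparison returns a logical vector, which can be used as a selector.
i_did_more <- me_housework_time > husband_housework_time
me_when_more <- me_housework_time[i_did_more]
me_when_more
```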
In R, a matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns. Since you are only working with rows and columns, a matrix is called two-dimensional. You can construct a matrix in R with the matrix() function.
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
In the matrix() function, the first argument is the collection of elements that R will arrange into the rows and columns of the matrix. Here, we use 1:9 which is a shortcut for c(1, 2, 3, 4, 5, 6, 7, 8, 9). The argument byrow indicates that the matrix is filled by the rows. If we want the matrix to be filled by the columns, we just place byrow = FALSE. The third argument nrow indicates that the matrix should have three rows.
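The two matrices printed above can be reproduced from the arguments just described. The names by_row and by_col are introduced here only for illustration.

```r
# Fill a 3-row matrix with 1:9 row by row, then column by column.
by_row <- matrix(1:9, byrow = TRUE, nrow = 3)
by_col <- matrix(1:9, byrow = FALSE, nrow = 3)

by_row
by_col
```

Filling by column is the default, so matrix(1:9, nrow = 3) gives the same result as by_col.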
We can now combine the two vectors, me_housework_time and husband_housework_time, into a matrix.
matrix_housework_time <- matrix(cbind(me_housework_time,
husband_housework_time),ncol=2)
colnames(matrix_housework_time) <- c("Me", "Husband")
rownames(matrix_housework_time) <- days_vector
matrix_housework_time
## Me Husband
## Mon 120.5 0.0
## Tue 110.0 0.0
## Wed 130.0 10.0
## Thu 110.0 10.5
## Fri 100.0 0.0
## Sat 230.0 300.0
## Sun 300.0 360.0
sum_housework_time <- matrix_housework_time[,"Me"] +
matrix_housework_time[,"Husband"]
sum_housework_time
## Mon Tue Wed Thu Fri Sat Sun
## 120.5 110.0 140.0 120.5 100.0 530.0 660.0
## Mon Tue Wed Thu Fri Sat Sun
## 120.5 110.0 140.0 120.5 100.0 530.0 660.0
Add a new column to the above matrix
newmatrix1_housework_time <- cbind(matrix_housework_time,
sum_housework_time)
newmatrix1_housework_time
## Me Husband sum_housework_time
## Mon 120.5 0.0 120.5
## Tue 110.0 0.0 110.0
## Wed 130.0 10.0 140.0
## Thu 110.0 10.5 120.5
## Fri 100.0 0.0 100.0
## Sat 230.0 300.0 530.0
## Sun 300.0 360.0 660.0
Add a new row to the matrix
newmatrix2_housework_time <- rbind(matrix_housework_time,
colSums(matrix_housework_time))
newmatrix2_housework_time
## Me Husband
## Mon 120.5 0.0
## Tue 110.0 0.0
## Wed 130.0 10.0
## Thu 110.0 10.5
## Fri 100.0 0.0
## Sat 230.0 300.0
## Sun 300.0 360.0
## 1100.5 680.5
Similar to vectors, you can use the square brackets [ ] to select one or multiple elements from a matrix. Whereas vectors have one dimension, matrices have two dimensions. You should therefore use a comma to separate the rows you want to select from the columns. For example:
my_matrix[1,2] selects the element at the first row and second column.
my_matrix[1:3,2:4] results in a matrix with the data on the rows 1, 2, 3 and columns 2, 3, 4.
If you want to select all elements of a row or a column, no number is needed before or after the comma, respectively:
my_matrix[,1] selects all elements of the first column.
my_matrix[1,] selects all elements of the first row.
## Me Husband
## Mon 120.5 0.0
## Tue 110.0 0.0
## Wed 130.0 10.0
## Thu 110.0 10.5
## Fri 100.0 0.0
The standard operators like +, -, /, *, etc. work in an element-wise way on matrices in R. Transform the housework time matrix from minutes to hours (and round to 1 decimal place):
## Me Husband
## Mon 2.0 0.0
## Tue 1.8 0.0
## Wed 2.2 0.2
## Thu 1.8 0.2
## Fri 1.7 0.0
## Sat 3.8 5.0
## Sun 5.0 6.0
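The conversion command is not shown in this text. A sketch that yields the table above, assuming the matrix is divided by 60 and rounded with round(); the matrix is rebuilt here so the block runs on its own, and the name matrix_housework_hours is hypothetical.

```r
days_vector <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
matrix_housework_time <- cbind(
  Me      = c(120.5, 110, 130, 110, 100, 230, 300),
  Husband = c(0, 0, 10, 10.5, 0, 300, 360)
)
rownames(matrix_housework_time) <- days_vector

# Division applies to every element; round(x, 1) keeps one decimal place.
matrix_housework_hours <- round(matrix_housework_time / 60, 1)
matrix_housework_hours
```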
Note that my_matrix1 * my_matrix2 creates a matrix where each element is the product of the corresponding elements in my_matrix1 and my_matrix2, whilst my_matrix1 %*% my_matrix2 gives the matrix algebra multiplication.
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
## [,1] [,2]
## [1,] 1 9
## [2,] 4 16
## [,1] [,2]
## [1,] 7 15
## [2,] 10 22
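The four matrices printed above are consistent with the following sketch, which assumes both operands are the same 2 x 2 matrix built from 1:4 (the original definitions are not shown, and the names elementwise and matrix_product are hypothetical).

```r
my_matrix1 <- matrix(1:4, nrow = 2)
my_matrix2 <- matrix(1:4, nrow = 2)

# Element-wise product: each entry is multiplied by its counterpart.
elementwise <- my_matrix1 * my_matrix2

# Matrix algebra product: rows of the first times columns of the second.
matrix_product <- my_matrix1 %*% my_matrix2

elementwise
matrix_product
```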
The term factor refers to a statistical data type used to store categorical variables. The difference between a categorical variable and a continuous variable is that a categorical variable can belong to only a limited number of categories, whereas a continuous variable can take an infinite number of values. To create factors in R, you use the function factor(). The first thing you have to do is create a vector that contains all the observations that belong to a limited number of categories. For example, sex_vector contains the sex of 5 different individuals:
## [1] "Female" "Male"
## Length Class Mode
## 5 character character
## Female Male
## 2 3
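The definition of sex_vector is missing from this text. The sketch below assumes five observations with two "Female" and three "Male" entries, which matches the counts printed above; the order of the entries is a guess.

```r
# Assumed observations: 2 Female, 3 Male (order is hypothetical).
sex_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_sex_vector <- factor(sex_vector)

levels(factor_sex_vector)    # the distinct categories
summary(sex_vector)          # a character vector: length, class, mode
summary(factor_sex_vector)   # a factor: counts per category
```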
Sometimes you will deal with factors that have a natural ordering between their categories. If this is the case, we have to make sure that we pass this information to R. For example, speed_vector should be converted to an ordinal factor since its categories have a natural ordering. By default, the function factor() transforms speed_vector into an unordered factor. To create an ordered factor, you have to add two additional arguments: ordered and levels.
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
levels = c("slow", "medium", "fast"))
factor_speed_vector
## [1] medium slow slow medium fast
## Levels: slow < medium < fast
The fact that factor_speed_vector is now ordered enables us to compare different elements.
## [1] FALSE
## [1] TRUE
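The comparisons that returned FALSE and TRUE above are not shown. One pair of comparisons that gives the same results is sketched below; which elements were actually compared is an assumption.

```r
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
                              levels = c("slow", "medium", "fast"))

# Element 1 is "medium", element 5 is "fast", element 2 is "slow".
medium_vs_fast <- factor_speed_vector[1] > factor_speed_vector[5]
medium_vs_slow <- factor_speed_vector[1] > factor_speed_vector[2]

medium_vs_fast
medium_vs_slow
```

With an unordered factor, the same comparison is not meaningful: R returns NA and issues a warning.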
All the elements that you put in a matrix should be of the same type. You will often find yourself working with data sets that contain different data types instead of only one. A data frame has the variables of a data set as columns and the observations as rows. A data frame can contain variables of different data types. Create a data frame:
dataframe_housework_time <- data.frame(days_vector,
matrix_housework_time,
c("Weekday", "Weekday",
"Weekday", "Weekday",
"Weekday", "Weekend",
"Weekend"))
colnames(dataframe_housework_time) <- c("Day", "Me", "Husband",
"Weekday")
dataframe_housework_time[1,]
## Day Me Husband Weekday
## Mon Mon 120.5 0 Weekday
We can select the elements of the data frame as we did with matrices.
## Me Husband
## Mon 120.5 0
## Tue 110.0 0
We can also select the columns by the shortcut:
## [1] 120.5 110.0 130.0 110.0 100.0 230.0 300.0
## Day Me Husband Weekday
## Mon Mon 120.5 0 Weekday
## Tue Tue 110.0 0 Weekday
## Fri Fri 100.0 0 Weekday
## Day Me Husband Weekday
## Sat Sat 230 300 Weekend
## Sun Sun 300 360 Weekend
## Day Me Husband Weekday
## Sat Sat 230 300 Weekend
## Sun Sun 300 360 Weekend
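The selection commands for the outputs above are not shown. The sketch below rebuilds the data frame so that it runs on its own and uses selections that would produce those outputs; the exact calls, including the use of subset(), and the names no_help, weekend and weekend_subset are assumptions.

```r
days_vector <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
dataframe_housework_time <- data.frame(
  Day     = days_vector,
  Me      = c(120.5, 110, 130, 110, 100, 230, 300),
  Husband = c(0, 0, 10, 10.5, 0, 300, 360),
  Weekday = c(rep("Weekday", 5), rep("Weekend", 2)),
  row.names = days_vector
)

# Rows and columns, as with matrices.
dataframe_housework_time[1:2, c("Me", "Husband")]

# The $ shortcut selects a whole column.
dataframe_housework_time$Me

# Rows that satisfy a condition.
no_help <- dataframe_housework_time[dataframe_housework_time$Husband == 0, ]
weekend <- dataframe_housework_time[dataframe_housework_time$Weekday == "Weekend", ]

# subset() expresses the same condition without repeating the data frame name.
weekend_subset <- subset(dataframe_housework_time, Weekday == "Weekend")
```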
You can sort your data according to a certain variable in the data set. In R, this is done with the help of the function order(). When applied to a variable such as a vector, order() returns the positions of its elements in ascending order of their values. For example, for a <- c(100, 10, 1000), order(a) gives:
## [1] 2 1 3
10, which is the second element in a, is the smallest element, so 2 comes first in the output of order(a). 100, which is the first element in a, is the second smallest element, so 1 comes second in the output of order(a). This means we can use the output of order(a) to reshuffle a, as in a[order(a)]:
## [1] 10 100 1000
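The vector a described above can be checked directly. Its values follow from the explanation and the two printed outputs.

```r
a <- c(100, 10, 1000)

# Positions of the elements from smallest to largest.
order(a)

# Using those positions as an index sorts the vector.
a[order(a)]
```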
You can rearrange a data frame according to a particular column. For example, let us rearrange the housework time data frame according to “Me”.
## Day Me Husband Weekday
## Fri Fri 100.0 0.0 Weekday
## Tue Tue 110.0 0.0 Weekday
## Thu Thu 110.0 10.5 Weekday
## Mon Mon 120.5 0.0 Weekday
## Wed Wed 130.0 10.0 Weekday
## Sat Sat 230.0 300.0 Weekend
## Sun Sun 300.0 360.0 Weekend
## [1] Fri Tue Thu Mon Wed Sat Sun
## Levels: Fri Mon Sat Sun Thu Tue Wed
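The call that produced the sorted output above is not shown. A sketch that gives the same row order, with the data frame rebuilt so the block is self-contained (the names positions and sorted_housework_time are hypothetical):

```r
days_vector <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
dataframe_housework_time <- data.frame(
  Day     = days_vector,
  Me      = c(120.5, 110, 130, 110, 100, 230, 300),
  Husband = c(0, 0, 10, 10.5, 0, 300, 360),
  Weekday = c(rep("Weekday", 5), rep("Weekend", 2)),
  row.names = days_vector
)

# order() gives the row positions sorted by the "Me" column;
# ties (Tue and Thu, both 110) keep their original relative order.
positions <- order(dataframe_housework_time$Me)
sorted_housework_time <- dataframe_housework_time[positions, ]
sorted_housework_time
sorted_housework_time$Day
```

The Levels line in the output above shows that Day was stored as a factor. In R 4.0 and later, data.frame() keeps strings as character vectors unless stringsAsFactors = TRUE is passed, so this sketch prints Day without levels.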
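Why do we need lists? Let us do a quick recap of what we have covered so far: vectors hold elements of a single data type in one dimension, matrices hold elements of a single data type in two dimensions, and data frames hold columns that may differ in type but must have the same length.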
A list in R allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists, etc. It is not even required that these objects are related to each other in any way.
To construct a list you use the function list():
## [[1]]
## [1] 1 2
##
## [[2]]
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
The arguments to the list function are the list components. These components can be matrices, vectors, other lists, etc. You can also name the components in your list:
## $name1
## [1] 1 2
##
## $name2
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
One way to select a component is using the numbered position of that component.
## [1] 1 2
You can also refer to the names of the components, with [[ ]] or with the $ sign.
## [1] 1 2
## [1] 1 2
You can add more components to an existing list with c().
## $name1
## [1] 1 2
##
## $name2
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
##
## [[3]]
## [1] "new contents"
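The list commands for this section are missing from the text. The sketch below reproduces the printed outputs; the component values and the names name1 and name2 come from the output, while the variable names my_list, my_named_list and extended_list are assumptions.

```r
# An unnamed list and a named list with the same components.
my_list <- list(c(1, 2), matrix(1:4, nrow = 2))
my_named_list <- list(name1 = c(1, 2), name2 = matrix(1:4, nrow = 2))

# Select a component by position, by name, or with $.
my_list[[1]]
my_named_list[["name1"]]
my_named_list$name1

# c() appends a new, unnamed component to the list.
extended_list <- c(my_named_list, "new contents")
extended_list
```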
The following statements all evaluate to TRUE. Notice from the last expression that R is case sensitive: M is not equal to m.
## [1] TRUE
## [1] TRUE
## [1] TRUE
More comparisons
## [1] FALSE
## [1] FALSE
## [1] TRUE
## [1] TRUE
You can also add an equal sign to express less than or equal to or greater than or equal to, respectively. Have a look at the following R expressions, which all evaluate to FALSE:
## [1] FALSE
## [1] FALSE
More comparisons
## [1] TRUE
## [1] TRUE
## [1] TRUE
Without having to change anything about the syntax, R’s relational operators also work on vectors.
## [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE
## [1] TRUE TRUE FALSE FALSE TRUE FALSE FALSE
## [1] 3
## Me Husband
## Mon FALSE FALSE
## Tue FALSE FALSE
## Wed FALSE FALSE
## Thu FALSE FALSE
## Fri FALSE FALSE
## Sat FALSE FALSE
## Sun FALSE TRUE
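The expressions behind the four outputs above are not shown. A sketch with comparisons that give the same results, assuming the housework vectors and matrix defined earlier (rebuilt here so the block runs on its own; the name days_without_help is hypothetical):

```r
days_vector <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
me_housework_time <- c(120.5, 110, 130, 110, 100, 230, 300)
husband_housework_time <- c(0, 0, 10, 10.5, 0, 300, 360)
matrix_housework_time <- cbind(Me = me_housework_time,
                               Husband = husband_housework_time)
rownames(matrix_housework_time) <- days_vector

# Element-wise comparison of two vectors.
me_housework_time > husband_housework_time

# Comparison against a single value; TRUE counts as 1 in sum().
husband_housework_time == 0
days_without_help <- sum(husband_housework_time == 0)
days_without_help

# The same operators work on every element of a matrix.
matrix_housework_time > 300
```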
Like relational operators, logical operators work perfectly fine with vectors and matrices.
dataframe_housework_time$Me > dataframe_housework_time$Husband &
dataframe_housework_time$Weekday == "Weekday"
## [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
We can reverse the result by using !.
## [1] FALSE
## [1] FALSE
for(i in 1:7){
if(dataframe_housework_time$Husband[i]==0){
print(paste("Wife is upset on", dataframe_housework_time$Day[i]))
}
}
## [1] "Wife is upset on Mon"
## [1] "Wife is upset on Tue"
## [1] "Wife is upset on Fri"
We can also add an else statement:
for(i in 1:7){
if(dataframe_housework_time$Husband[i]==0){
print(paste("Wife is upset on", dataframe_housework_time$Day[i],
"because husband doesn't help"))
}
else{
print(paste("Husband has helped on", dataframe_housework_time$Day[i]))
}
}
## [1] "Wife is upset on Mon because husband doesn't help"
## [1] "Wife is upset on Tue because husband doesn't help"
## [1] "Husband has helped on Wed"
## [1] "Husband has helped on Thu"
## [1] "Wife is upset on Fri because husband doesn't help"
## [1] "Husband has helped on Sat"
## [1] "Husband has helped on Sun"
You can add as many else if statements as you like.
for(i in 1:7){
if(dataframe_housework_time$Husband[i]==0){
print(paste("Wife is upset on", dataframe_housework_time$Day[i],
"because husband doesn't help"))
}
else if(dataframe_housework_time$Husband[i] < 30){
print(paste("Husband has helped very little on",
dataframe_housework_time$Day[i]))
}
else {
print(paste("Husband has helped on",
dataframe_housework_time$Day[i]))
}
}
## [1] "Wife is upset on Mon because husband doesn't help"
## [1] "Wife is upset on Tue because husband doesn't help"
## [1] "Husband has helped very little on Wed"
## [1] "Husband has helped very little on Thu"
## [1] "Wife is upset on Fri because husband doesn't help"
## [1] "Husband has helped on Sat"
## [1] "Husband has helped on Sun"
## [1] "Mon"
## [1] "Tue"
## [1] "Wed"
## [1] "Thu"
## [1] "Fri"
## [1] "Sat"
## [1] "Sun"
for(i in 1:7){
print(dataframe_housework_time$Husband[i])
if(dataframe_housework_time$Husband[i] > 10) break
}
## [1] 0
## [1] 0
## [1] 10
## [1] 10.5
Let us simulate the interaction between a driver and a driver’s assistant: when the speed is too high, “Slow down!” is printed to the console and the speed decreases. The initial speed is 64.
speed <- 64
while (speed > 30) {
  print(paste("Your speed is", speed))
  if (speed > 48) {
    print("Slow down big time!")
    speed <- speed - 11
  } else {
    print("Slow down!")
    speed <- speed - 6
  }
}
## [1] "Your speed is 64"
## [1] "Slow down big time!"
## [1] "Your speed is 53"
## [1] "Slow down big time!"
## [1] "Your speed is 42"
## [1] "Slow down!"
## [1] "Your speed is 36"
## [1] "Slow down!"
All the relevant details such as a description, usage, and arguments for a function can be found in the documentation. For example, you can use one of the following R commands:
help(mean)
?mean
You can also inspect the arguments of the mean() function by
args(mean)
Use the function
## [1] 157.2143
## [1] 680.5
## [1] 120.5
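The function calls behind the three results above are not shown. The values match the mean of my housework time, the sum of my husband's, and the median of mine, so a plausible sketch (the choice of functions is inferred from the numbers, not confirmed by the source) is:

```r
me_housework_time <- c(120.5, 110, 130, 110, 100, 230, 300)
husband_housework_time <- c(0, 0, 10, 10.5, 0, 300, 360)

mean(me_housework_time)       # arithmetic mean
sum(husband_housework_time)   # total
median(me_housework_time)     # middle value after sorting
```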
Here is a function template:
my_fun <- function(arg1, arg2, ...) {
body
}
Creating a function in R basically is the assignment of a function object to a variable. In the recipe above, you’re creating a new R variable my_fun, that becomes available in the workspace as soon as you execute the definition. From then on, you can use my_fun as a function.
## [1] 20
## [1] 40
From the lapply() help document, the usage section shows the following expression:
lapply(X, FUN, ...)
lapply() takes a vector or list X and applies the function FUN to each of its members. If FUN requires additional arguments, you pass them after you’ve specified X and FUN (...). The output of lapply() is a list, the same length as X, where each element is the result of applying FUN on the corresponding element of X. Here is an exercise that applies our self-defined function my_fun1 to the husband’s housework time vector.
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0 0 20 21 0 600 720
You can also pass an anonymous function directly to lapply(). The above is the same as the following:
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0 0 20 21 0 600 720
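The definition of my_fun1 and the lapply() calls are missing from this text. Every printed result is twice its input, so the sketch below assumes that my_fun1 doubles its argument; the body and the names doubled and doubled_inline are assumptions.

```r
husband_housework_time <- c(0, 0, 10, 10.5, 0, 300, 360)

# Assumed definition: the printed results are twice the inputs.
my_fun1 <- function(x) {
  2 * x
}
my_fun1(10)

# Apply the named function, then the equivalent anonymous function.
doubled <- lapply(husband_housework_time, my_fun1)
doubled_inline <- lapply(husband_housework_time, function(x) 2 * x)

# lapply() returns a list; unlist() flattens it to a vector.
unlist(doubled)
```

The one-row layout of the output above suggests that the list was reshaped before printing. That step is not shown in the source and is not reproduced here.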
lapply() provides a way to handle functions that require more than one argument, for example:
my_fun3 <- function(x, factor){
x > factor
}
lapply(dataframe_housework_time$Husband, my_fun3, factor=0)
## [[1]]
## [1] FALSE
##
## [[2]]
## [1] FALSE
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] TRUE
##
## [[5]]
## [1] FALSE
##
## [[6]]
## [1] TRUE
##
## [[7]]
## [1] TRUE
You can use sapply() similar to how you used lapply(). The first argument of sapply() is the list or vector X over which you want to apply a function, FUN. Potential additional arguments to this function are specified afterwards (...):
sapply(X, FUN, ...)
## $Me
## [1] 300
##
## $Husband
## [1] 360
## Me Husband
## 300 360
Like lapply(), sapply() allows you to use self-defined functions and apply them over a vector or a list:
## Me Husband
## [1,] 241 0
## [2,] 220 0
## [3,] 260 20
## [4,] 220 21
## [5,] 200 0
## [6,] 460 600
## [7,] 600 720
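The calls behind the outputs in this section are not shown. The sketch below contrasts lapply() and sapply() on the two numeric columns, assuming max() was the function applied and that the self-defined function doubles its input; the names housework, my_fun1 and doubled are assumptions.

```r
housework <- data.frame(
  Me      = c(120.5, 110, 130, 110, 100, 230, 300),
  Husband = c(0, 0, 10, 10.5, 0, 300, 360)
)

# lapply() returns a list with one component per column.
lapply(housework, max)

# sapply() simplifies the same result to a named vector.
sapply(housework, max)

# A function that returns 7 values per column is simplified to a 7 x 2 matrix.
my_fun1 <- function(x) 2 * x   # assumed body
doubled <- sapply(housework, my_fun1)
doubled
```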
What if the function you’re applying over a list or a vector returns a vector of length greater than 1? For example, suppose we define an extremes() function. It takes a vector of numerical values and returns a vector containing the minimum and maximum values of the given vector, with the names “min” and “max”, respectively.
extremes <- function(x) {
c(min = min(x), max = max(x))
}
sapply(dataframe_housework_time[,c("Me", "Husband")], extremes)
## Me Husband
## min 100 0
## max 300 360
## $Me
## min max
## 100 300
##
## $Husband
## min max
## 0 360
The function vapply() has the following syntax:
vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
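The function FUN is applied over the elements of X. The FUN.VALUE argument expects a template for the value that FUN returns, for example numeric(4) for a numeric vector of length 4; vapply() raises an error if a result does not match the template. USE.NAMES is TRUE by default.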
basics <- function(x) {
c(min = min(x), mean = mean(x), median = median(x), max = max(x))
}
vapply(dataframe_housework_time[,c("Me", "Husband")], basics, numeric(4))
## Me Husband
## min 100.0000 0.00000
## mean 157.2143 97.21429
## median 120.5000 10.00000
## max 300.0000 360.00000
Another example:
vapply(dataframe_housework_time[,c("Me", "Husband")],
function(x, y) min(x) > y, FUN.VALUE = logical(1), y = 0)
## Me Husband
## TRUE FALSE
Here are some useful math functions that R features:
abs(): Calculate the absolute value.
sum(): Calculate the sum of all the values in a data structure.
mean(): Calculate the arithmetic mean. (Similarly min(), max(), median())
round(): Round the values to 0 decimal places by default. Try out ?round in the console for variations of round() and ways to change the number of digits to round to.
## [1] 2 -3 4 -10 -3 7
## [1] 29
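The input behind the two results above is not shown. One vector that rounds to the printed values is sketched below; the numbers in errors are assumptions, chosen only so that round() gives 2 -3 4 -10 -3 7.

```r
# Assumed input; only the rounded values and their sum appear in the output.
errors <- c(1.9, -2.6, 4.0, -9.5, -3.4, 7.3)

# Round to 0 decimal places, then sum the absolute values.
rounded <- round(errors)
total_abs <- sum(abs(rounded))

rounded
total_abs
```

Note that round() resolves an exact .5 to the even neighbor, which is why -9.5 becomes -10.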
R features a bunch of functions to juggle around with data structures:
seq(): Generate sequences, by specifying the from, to, and by arguments.
rep(): Replicate elements of vectors and lists.
sort(): Sort a vector in ascending order. Works on numerics, but also on character strings and logicals.
rev(): Reverse the elements in data structures for which reversal is defined.
str(): Display the structure of any R object.
append(): Merge vectors or lists.
is.*(): Check for the class of an R object.
as.*(): Convert an R object from one class to another.
unlist(): Flatten (possibly embedded) lists to produce a vector.
In their most basic form, regular expressions can be used to see whether a pattern exists inside a character string or a vector of character strings. For this purpose, you can use:
grepl(), which returns TRUE when a pattern is found in the corresponding character string.
grep(), which returns a vector of indices of the character strings that contain the pattern.
Both functions need a pattern and an x argument, where pattern is the regular expression you want to match for, and the x argument is the character vector from which matches should be sought.
emails <- c("john.doe@ivyleague.edu", "education@world.gov",
"dalai.lama@peace.org", "invalid.edu",
"quant@bigdatacollege.edu", "cookie.monster@sesame.tv")
grepl(pattern = "edu", x = emails)
## [1] TRUE TRUE FALSE TRUE TRUE FALSE
## [1] "john.doe@ivyleague.edu" "education@world.gov"
## [3] "invalid.edu" "quant@bigdatacollege.edu"
You can use the caret, ^, and the dollar sign, $, to match the content located at the start and end of a string, respectively. This could take us one step closer to a correct pattern for matching only the .edu email addresses from our list of emails. But there’s more that can be added to make the pattern more robust:
@, because a valid email must contain an at-sign.
.*, which matches any character (.) zero or more times (*). Both the dot and the asterisk are metacharacters. You can use them to match any character between the at-sign and the .edu portion of an email address.
\\.edu$, to match the .edu part of the email at the end of the string. The \\ part escapes the dot: it tells R that you want to use the . as an actual character.
## [1] TRUE FALSE FALSE FALSE TRUE FALSE
## [1] "john.doe@ivyleague.edu" "quant@bigdatacollege.edu"
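Putting the three pieces together gives the pattern "@.*\\.edu$". The calls are not shown in this text; the sketch below assumes grepl() was used with that pattern and that the matching addresses were selected with the resulting logical vector (the name is_edu is hypothetical).

```r
emails <- c("john.doe@ivyleague.edu", "education@world.gov",
            "dalai.lama@peace.org", "invalid.edu",
            "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# An at-sign, then anything, then a literal ".edu" at the end of the string.
is_edu <- grepl(pattern = "@.*\\.edu$", x = emails)
is_edu

# Keep only the addresses that match.
emails[is_edu]
```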
While grep() and grepl() were used to simply check whether a regular expression could be matched with a character vector, sub() and gsub() take it one step further: you can specify a replacement argument. If inside the character vector x, the regular expression pattern is found, the matching element(s) will be replaced with replacement. sub() only replaces the first match, whereas gsub() replaces all matches.
## [1] "john.doe@kuhs.ac.jp" "education@world.gov"
## [3] "dalai.lama@peace.org" "invalid.edu"
## [5] "quant@kuhs.ac.jp" "cookie.monster@sesame.tv"
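The replacement call is not shown. The output above is consistent with the following sketch, which assumes sub() was applied with the .edu pattern and the replacement "@kuhs.ac.jp" (the name new_emails is hypothetical).

```r
emails <- c("john.doe@ivyleague.edu", "education@world.gov",
            "dalai.lama@peace.org", "invalid.edu",
            "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# Replace everything from the at-sign to the trailing ".edu";
# elements without a match are returned unchanged.
new_emails <- sub(pattern = "@.*\\.edu$", replacement = "@kuhs.ac.jp",
                  x = emails)
new_emails
```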
Regular expressions are a typical concept that you’ll learn by doing and by seeing other examples. Before you rack your brains over the regular expression in this exercise, have a look at the new things that will be used:
.*: A usual suspect! It can be read as “any character that is matched zero or more times”.
\\s: Match a space. The “s” is normally a character, escaping it (\\) makes it a metacharacter.
[0-9]+: Match the numbers 0 to 9, at least once (+).
([0-9]+): The parentheses are used to make parts of the matching string available to define the replacement. The \\1 in the replacement argument of sub() gets set to the string that is captured by the regular expression [0-9]+.
awards <- c("Won 1 Oscar.",
"Won 1 Oscar. Another 9 wins & 24 nominations.",
"1 win and 2 nominations.",
"2 wins & 3 nominations.",
"Nominated for 2 Golden Globes. 1 more win & 2 nominations.",
"4 wins & 1 nomination.")
sub(".*\\s([0-9]+)\\snomination.*$", "\\1", awards)
## [1] "Won 1 Oscar." "24" "2" "3"
## [5] "2" "1"
The ([0-9]+) selects the entire number that comes before the word “nomination” in the string, and the entire match gets replaced by this number because of the \\1 that refers to the content inside the parentheses. Strings without a match, such as "Won 1 Oscar.", are returned unchanged.
In R, dates are represented by Date objects, while times are represented by POSIXct objects. Under the hood, however, these dates and times are simple numerical values.
Get the current date: today
today <- Sys.Date()
today
See what today looks like under the hood
unclass(today)
now <- Sys.time()
now
See what now looks like under the hood
unclass(now)
To create a Date object from a simple character string in R, you can use the as.Date() function. The character string has to obey a format that can be defined using a set of symbols (the examples correspond to 13 January, 1982):
%Y: 4-digit year (1982)
%y: 2-digit year (82)
%m: 2-digit month (01)
%d: 2-digit day of the month (13)
%A: weekday (Wednesday)
%a: abbreviated weekday (Wed)
%B: month (January)
%b: abbreviated month (Jan)
The following R commands will all create the same Date object for the 13th day in January of 1982:
as.Date("1982-01-13")
as.Date("Jan1382", "%b%d%y") # %b is matched against month names in the current locale
as.Date("13 January, 1982", format = "%d %B, %Y")
x <- c("1jan1960", "2jan1960", "31mar1960", "30jul1960")
z <- as.Date(x, "%d%b%Y")
as.Date("20150905", format = "%Y%m%d")
In addition to creating dates, you can also convert dates to character strings that use a different date notation. For this, you use the format() function.
today <- Sys.Date()
format(Sys.Date(), "%d %B, %Y")
format(Sys.Date(), format = "Today is a %A!")
Definition of character strings representing dates
str1 <- "May 23, '96"
str2 <- "2012-03-15"
str3 <- "30/January/2006"
Convert the strings to dates: date1, date2, date3
date1 <- as.Date(str1, format = "%b %d, '%y")
date2 <- as.Date(str2, format = "%Y-%m-%d")
date3 <- as.Date(str3, format = "%d/%B/%Y")
Convert dates to formatted strings
format(date1, "%A")
format(date2, "%d")
format(date3, "%b %Y")
Similar to working with dates, you can use as.POSIXct() to convert from a character string to a POSIXct object, and format() to convert from a POSIXct object to a character string. Again, you have a wide variety of symbols:
%H: hours as a decimal number (00-23)
%I: hours as a decimal number (01-12)
%M: minutes as a decimal number
%S: seconds as a decimal number
%T: shorthand notation for the typical format %H:%M:%S
%p: AM/PM indicator
Definition of character strings representing times
str1 <- "May 23, '96 hours:23 minutes:01 seconds:45"
str2 <- "2012-3-12 14:23:08"
Convert the strings to POSIXct objects: time1, time2
time1 <- as.POSIXct(str1,
format = "%B %d, '%y hours:%H minutes:%M seconds:%S")
time2 <- as.POSIXct(str2, format = "%Y-%m-%d %H:%M:%S")
#format(time1, "%M")
format(time2, "%I:%M %p")
Both Date and POSIXct R objects are represented by simple numerical values under the hood. This makes calculation with time and date objects very straightforward: R performs the calculations using the underlying numerical values, and then converts the result back to human-readable time information again. You can increment and decrement Date objects, or do actual calculations with them:
today <- Sys.Date()
today + 1
today - 1
as.Date("2015-03-12") - as.Date("2015-02-27")
day1 <- as.Date("2018-08-15")
day2 <- as.Date("2018-08-17")
day3 <- as.Date("2018-08-22")
day4 <- as.Date("2018-08-28")
day5 <- as.Date("2018-09-02")
as.Date(day5) - as.Date(day1)
daylist <- c(day1, day2, day3, day4, day5)
day_diff <- diff(daylist)
mean(day_diff)
Calculations using POSIXct objects are completely analogous to those using Date objects.
now <- Sys.time()
now + 3600
now - 3600 * 24
login and logout time
login <- as.POSIXct(c("2018-08-19 10:18:04 UTC", "2018-08-24 09:14:18 UTC",
"2018-08-24 12:21:51 UTC", "2018-08-24 12:37:24 UTC",
"2018-08-26 21:37:55 UTC"))
logout <- as.POSIXct(c("2018-08-19 10:56:29 UTC", "2018-08-24 09:14:52 UTC",
"2018-08-24 12:35:48 UTC", "2018-08-24 13:17:22 UTC",
"2018-08-26 22:08:47 UTC"))
time_online <- logout - login
time_online
mean(time_online)