Mathematical utilities

Have another look at some useful math functions that R features:

abs(): Calculate the absolute value.
sum(): Calculate the sum of all the values in a data structure.
mean(): Calculate the arithmetic mean.
round(): Round the values to 0 decimal places by default. Try out ?round in the console for variations of round() and ways to change the number of digits to round to.

As a data scientist in training, you’ve estimated a regression model on the sales data for the past six months. After evaluating your model, you see that the training error of your model is quite regular, showing both positive and negative values. The error values are already defined in the workspace on the right (errors).

INSTRUCTIONS

Calculate the sum of the absolute rounded values of the training errors. You can work in parts, or with a single one-liner. There’s no need to store the result in a variable, just have R print it.

HINT To know the order of operations, you should read the sentence in the instructions backwards: put round() inside abs(), and put the result of abs() in sum().
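The nesting described in the hint can be sketched as follows. This is a minimal illustration, assuming the errors vector holds the six values shown; the intermediate names rounded, absolute and total_parts are only there for the step-by-step version.

```r
# Assumed training errors, for illustration only
errors <- c(1.9, -2.6, 4.0, -9.5, -3.4, 7.3)

# Work in parts: round first, then take absolute values, then sum
rounded <- round(errors)
absolute <- abs(rounded)
total_parts <- sum(absolute)

# The same computation as a single one-liner
total <- sum(abs(round(errors)))
print(total)
```

Both versions should print the same number, since the one-liner simply nests the three calls from the inside out.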
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFRoZSBlcnJvcnMgdmVjdG9yIGhhcyBhbHJlYWR5IGJlZW4gZGVmaW5lZCBmb3IgeW91XG5lcnJvcnMgPC0gYygxLjksIC0yLjYsIDQuMCwgLTkuNSwgLTMuNCwgNy4zKVxuXG4jIFN1bSBvZiBhYnNvbHV0ZSByb3VuZGVkIHZhbHVlcyBvZiBlcnJvcnMiLCJzb2x1dGlvbiI6IiMgVGhlIGVycm9ycyB2ZWN0b3IgaGFzIGFscmVhZHkgYmVlbiBkZWZpbmVkIGZvciB5b3VcbmVycm9ycyA8LSBjKDEuOSwgLTIuNiwgNC4wLCAtOS41LCAtMy40LCA3LjMpXG5cbiMgU3VtIG9mIGFic29sdXRlIHJvdW5kZWQgdmFsdWVzIG9mIGVycm9yc1xuc3VtKHJvdW5kKGFicyhlcnJvcnMpKSkifQ==

Find the error

We went ahead and included some code on the right, but there’s still an error. Can you trace it and fix it?

In times of despair, help with functions such as sum() and rev() is a single command away; simply use ?sum and ?rev in the console.

INSTRUCTIONS

Fix the error by including code on the last line. Remember: you want to call mean() only once!

HINT mean() takes a vector of numerical values, so make sure that abs(vec1) and abs(vec2) are elements of a vector: c(abs(vec1), ___).

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIERvbid0IGVkaXQgdGhlc2UgdHdvIGxpbmVzXG52ZWMxIDwtIGMoMS41LCAyLjUsIDguNCwgMy43LCA2LjMpXG52ZWMyIDwtIHJldih2ZWMxKVxuXG4jIEZpeCB0aGUgZXJyb3Jcbm1lYW4oYWJzKHZlYzEpLCBhYnModmVjMikpIiwic29sdXRpb24iOiIjIERvbid0IGVkaXQgdGhlc2UgdHdvIGxpbmVzXG52ZWMxIDwtIGMoMS41LCAyLjUsIDguNCwgMy43LCA2LjMpXG52ZWMyIDwtIHJldih2ZWMxKVxuXG4jIEZpeCB0aGUgZXJyb3Jcbm1lYW4oYyhhYnModmVjMSksIGFicyh2ZWMyKSkpIn0=

If you check out the documentation of mean(), you’ll see that only the first argument, x, should be a vector. If you also specify a second argument, R will match the arguments by position and expect a specification of the trim argument. Therefore, merging the two vectors is a must!
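The positional-matching problem can be reproduced in a small sketch, assuming vec1 holds the five values below. The tryCatch() wrapper is an addition for illustration so that the script keeps running after the failing call.

```r
# Assumed input vectors, for illustration only
vec1 <- c(1.5, 2.5, 8.4, 3.7, 6.3)
vec2 <- rev(vec1)

# Broken call: abs(vec2) is matched by position to the trim argument,
# which must be a single number, so R raises an error
broken <- tryCatch(
  mean(abs(vec1), abs(vec2)),
  error = function(e) conditionMessage(e)
)
print(broken)

# Fixed call: merge both vectors first, then call mean() once
fixed <- mean(c(abs(vec1), abs(vec2)))
print(fixed)
```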

Data Utilities

R features a bunch of functions to juggle around with data structures:

seq(): Generate sequences, by specifying the from, to, and by arguments.
rep(): Replicate elements of vectors and lists.
sort(): Sort a vector in ascending order. Works on numerics, but also on character strings and logicals.
rev(): Reverse the elements in data structures for which reversal is defined.
str(): Display the structure of any R object.
append(): Merge vectors or lists.
is.*(): Check for the class of an R object.
as.*(): Convert an R object from one class to another.
unlist(): Flatten (possibly embedded) lists to produce a vector.

Remember the social media profile views data? Your LinkedIn and Facebook view counts for the last seven days are already defined as lists on the right.

INSTRUCTIONS

Convert both linkedin and facebook lists to a vector, and store them as li_vec and fb_vec respectively. Next, append fb_vec to the li_vec (Facebook data comes last). Save the result as social_vec. Finally, sort social_vec from high to low. Print the resulting vector.

HINT You can use unlist() to convert lists to vectors. For the last instruction, make sure to use the decreasing argument!
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFRoZSBsaW5rZWRpbiBhbmQgZmFjZWJvb2sgbGlzdHMgaGF2ZSBhbHJlYWR5IGJlZW4gY3JlYXRlZCBmb3IgeW91XG5saW5rZWRpbiA8LSBsaXN0KDE2LCA5LCAxMywgNSwgMiwgMTcsIDE0KVxuZmFjZWJvb2sgPC0gbGlzdCgxNywgNywgNSwgMTYsIDgsIDEzLCAxNClcblxuIyBDb252ZXJ0IGxpbmtlZGluIGFuZCBmYWNlYm9vayB0byBhIHZlY3RvcjogbGlfdmVjIGFuZCBmYl92ZWNcblxuXG5cbiMgQXBwZW5kIGZiX3ZlYyB0byBsaV92ZWM6IHNvY2lhbF92ZWNcblxuXG4jIFNvcnQgc29jaWFsX3ZlYyIsInNvbHV0aW9uIjoiIyBUaGUgbGlua2VkaW4gYW5kIGZhY2Vib29rIGxpc3RzIGhhdmUgYWxyZWFkeSBiZWVuIGNyZWF0ZWQgZm9yIHlvdVxubGlua2VkaW4gPC0gbGlzdCgxNiwgOSwgMTMsIDUsIDIsIDE3LCAxNClcbmZhY2Vib29rIDwtIGxpc3QoMTcsIDcsIDUsIDE2LCA4LCAxMywgMTQpXG5cbiMgQ29udmVydCBsaW5rZWRpbiBhbmQgZmFjZWJvb2sgdG8gYSB2ZWN0b3I6IGxpX3ZlYyBhbmQgZmJfdmVjXG5saV92ZWMgPC0gdW5saXN0KGxpbmtlZGluKVxuZmJfdmVjIDwtIHVubGlzdChmYWNlYm9vaylcblxuIyBBcHBlbmQgZmJfdmVjIHRvIGxpX3ZlYzogc29jaWFsX3ZlY1xuc29jaWFsX3ZlYyA8LSBhcHBlbmQobGlfdmVjLCBmYl92ZWMpXG5cbiMgU29ydCBzb2NpYWxfdmVjXG5zb3J0KHNvY2lhbF92ZWMsIGRlY3JlYXNpbmcgPSBUUlVFKSJ9

These instructions required you to solve this challenge in a step-by-step approach. If you’re comfortable with the functions, you can combine some of these steps into powerful one-liners.
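As a rough sketch of both approaches, assuming the two lists hold the view counts below: the step-by-step version follows the instructions, and the one-liner swaps append() for c(), which concatenates the two vectors in the same way.

```r
# Assumed view counts, for illustration only
linkedin <- list(16, 9, 13, 5, 2, 17, 14)
facebook <- list(17, 7, 5, 16, 8, 13, 14)

# Step by step: flatten the lists, append, then sort from high to low
li_vec <- unlist(linkedin)
fb_vec <- unlist(facebook)
social_vec <- append(li_vec, fb_vec)
step_result <- sort(social_vec, decreasing = TRUE)

# The same steps combined into a one-liner
one_liner <- sort(c(unlist(linkedin), unlist(facebook)), decreasing = TRUE)
print(one_liner)
```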

Find the error (2)

Just as before, let’s switch roles. It’s up to you to see what unforgivable mistakes we’ve made. Go fix them!

INSTRUCTIONS

Correct the expression. Make sure that your fix still uses the functions rep() and seq().

HINT The use of seq() and rep() are mixed up.
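One way to read the hint is sketched below: seq() should build the sequence, and rep() should repeat it. The names odd_numbers and repeated are hypothetical and used only for illustration.

```r
# seq() builds the sequence 1, 3, 5, 7
odd_numbers <- seq(1, 7, by = 2)

# rep() then repeats that whole sequence 7 times
repeated <- rep(odd_numbers, times = 7)
print(repeated)
```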
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIEZpeCBtZVxuc2VxKHJlcCgxLCA3LCBieSA9IDIpLCB0aW1lcyA9IDcpIiwic29sdXRpb24iOiIjIEZpeCBtZVxucmVwKHNlcSgxLCA3LCBieSA9IDIpLCB0aW1lcyA9IDcpIn0=

Debugging code is also a big part of the daily routine of a data scientist, and you seem to be great at it!

Beat Gauss using R

There is a popular story about young Gauss. As a pupil, he had a lazy teacher who wanted to keep the classroom busy by having them add up the numbers 1 to 100. Gauss came up with an answer almost instantaneously, 5050. On the spot, he had developed a formula for calculating the sum of an arithmetic series. There are more general formulas for calculating the sum of an arithmetic series with different starting values and increments. Instead of deriving such a formula, why not use R to calculate the sum of a sequence?

INSTRUCTIONS

Using the function seq(), create a sequence that ranges from 1 to 500 in increments of 3. Assign the resulting vector to a variable seq1. Again with the function seq(), create a sequence that ranges from 1200 to 900 in increments of -7. Assign it to a variable seq2. Calculate the total sum of the sequences, either by using the sum() function twice and adding the two results, or by first concatenating the sequences and then using the sum() function once. Print the result to the console.

HINT For the first seq() call, set the from, to and by arguments to 1, 500 and 3 respectively. sum(seq1) + sum(seq2) will do the trick once you’ve correctly created seq1 and seq2.
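The two ways of computing the total can be compared in a short sketch. The variables gauss_check, total_a and total_b are hypothetical names added for illustration; the first line simply replays the classroom story.

```r
# The sum from the Gauss story: 1 + 2 + ... + 100
gauss_check <- sum(1:100)
print(gauss_check)

# The two sequences from the instructions
seq1 <- seq(1, 500, by = 3)
seq2 <- seq(1200, 900, by = -7)

# Option 1: sum each sequence and add the results
total_a <- sum(seq1) + sum(seq2)

# Option 2: concatenate first, then sum once
total_b <- sum(c(seq1, seq2))
print(total_b)
```

Both options give the same total, because addition does not care how the terms are grouped.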

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIENyZWF0ZSBmaXJzdCBzZXF1ZW5jZTogc2VxMVxuXG5cbiMgQ3JlYXRlIHNlY29uZCBzZXF1ZW5jZTogc2VxMlxuXG5cbiMgQ2FsY3VsYXRlIHRvdGFsIHN1bSBvZiB0aGUgc2VxdWVuY2VzIiwic29sdXRpb24iOiIjIENyZWF0ZSBmaXJzdCBzZXF1ZW5jZTogc2VxMVxuc2VxMSA8LSBzZXEoMSwgNTAwLCBieSA9IDMpXG5cbiMgQ3JlYXRlIHNlY29uZCBzZXF1ZW5jZTogc2VxMlxuc2VxMiA8LSBzZXEoMTIwMCwgOTAwLCBieSA9IC03KVxuXG4jIENhbGN1bGF0ZSB0b3RhbCBzdW0gb2YgdGhlIHNlcXVlbmNlc1xuc3VtKGMoc2VxMSwgc2VxMikpIn0=

grepl & grep

In their most basic form, regular expressions can be used to see whether a pattern exists inside a character string or a vector of character strings. For this purpose, you can use:

grepl(), which returns TRUE when a pattern is found in the corresponding character string.
grep(), which returns a vector of indices of the character strings that contain the pattern.

Both functions need a pattern and an x argument, where pattern is the regular expression you want to match, and the x argument is the character vector in which matches are sought.

In this and the following exercises, you’ll be querying and manipulating a character vector of email addresses! The vector emails has already been defined in the editor on the right so you can begin with the instructions straight away!

INSTRUCTIONS

Use grepl() to generate a vector of logicals that indicates whether these email addresses contain “edu”. Print the result to the output. Do the same thing with grep(), but this time save the resulting indexes in a variable hits. Use the variable hits to select from the emails vector only the emails that contain “edu”.

HINT The first argument you should pass to both grepl() and grep(), is the pattern: “edu”. The second argument is emails, in both cases. Once you’ve correctly defined hits, you can use it to subset emails: emails[hits].
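The difference between the two functions can be sketched as follows, assuming emails holds the six addresses below. The name contains_edu is hypothetical and only used to keep the logical vector around.

```r
# Assumed email vector, for illustration only
emails <- c("john.doe@ivyleague.edu", "education@world.gov",
            "dalai.lama@peace.org", "invalid.edu",
            "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# grepl() returns one logical per element of emails
contains_edu <- grepl("edu", emails)
print(contains_edu)

# grep() returns the indices of the matching elements
hits <- grep("edu", emails)
print(hits)

# Use the indices to subset the original vector
print(emails[hits])
```

Note that which(contains_edu) yields the same indices as grep(), so the two results carry the same information in different forms.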

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFRoZSBlbWFpbHMgdmVjdG9yIGhhcyBhbHJlYWR5IGJlZW4gZGVmaW5lZCBmb3IgeW91XG5lbWFpbHMgPC0gYyhcImpvaG4uZG9lQGl2eWxlYWd1ZS5lZHVcIiwgXCJlZHVjYXRpb25Ad29ybGQuZ292XCIsIFwiZGFsYWkubGFtYUBwZWFjZS5vcmdcIixcbiAgICAgICAgICAgIFwiaW52YWxpZC5lZHVcIiwgXCJxdWFudEBiaWdkYXRhY29sbGVnZS5lZHVcIiwgXCJjb29raWUubW9uc3RlckBzZXNhbWUudHZcIilcblxuIyBVc2UgZ3JlcGwoKSB0byBtYXRjaCBmb3IgXCJlZHVcIlxuXG5cbiMgVXNlIGdyZXAoKSB0byBtYXRjaCBmb3IgXCJlZHVcIiwgc2F2ZSByZXN1bHQgdG8gaGl0c1xuXG5cbiMgU3Vic2V0IGVtYWlscyB1c2luZyBoaXRzIiwic29sdXRpb24iOiIjIFRoZSBlbWFpbHMgdmVjdG9yIGhhcyBhbHJlYWR5IGJlZW4gZGVmaW5lZCBmb3IgeW91XG5lbWFpbHMgPC0gYyhcImpvaG4uZG9lQGl2eWxlYWd1ZS5lZHVcIiwgXCJlZHVjYXRpb25Ad29ybGQuZ292XCIsIFwiZGFsYWkubGFtYUBwZWFjZS5vcmdcIixcbiAgICAgICAgICAgIFwiaW52YWxpZC5lZHVcIiwgXCJxdWFudEBiaWdkYXRhY29sbGVnZS5lZHVcIiwgXCJjb29raWUubW9uc3RlckBzZXNhbWUudHZcIilcblxuIyBVc2UgZ3JlcGwoKSB0byBtYXRjaCBmb3IgXCJlZHVcIlxuZ3JlcGwoXCJlZHVcIiwgZW1haWxzKVxuXG4jIFVzZSBncmVwKCkgdG8gbWF0Y2ggZm9yIFwiZWR1XCIsIHNhdmUgcmVzdWx0IHRvIGhpdHNcbmhpdHMgPC0gZ3JlcChcImVkdVwiLCBlbWFpbHMpXG5cbiMgU3Vic2V0IGVtYWlscyB1c2luZyBoaXRzXG5lbWFpbHNbaGl0c10ifQ==

You can probably guess what we’re trying to achieve here: select all the emails that end with “.edu”. However, the strings education@world.gov and invalid.edu were also matched. Let’s see in the next exercise what you can do to improve our pattern and remove these false positives.

grepl & grep (2)

You can use the caret, ^, and the dollar sign, $, to match the content located at the start and end of a string, respectively. This could take us one step closer to a correct pattern for matching only the “.edu” email addresses from our list of emails. But there’s more that can be added to make the pattern more robust:

@, because a valid email must contain an at-sign.
.*, which matches any character (.) zero or more times (*). Both the dot and the asterisk are metacharacters. You can use them to match any character between the at-sign and the “.edu” portion of an email address.
\\.edu$, to match the “.edu” part of the email at the end of the string. The \\ part escapes the dot: it tells R that you want to use the . as an actual character.

INSTRUCTIONS

Use grepl() with the more advanced regular expression to return a logical vector. Simply print the result. Do a similar thing with grep() to create a vector of indices. Store the result in the variable hits. Use emails[hits] again to subset the emails vector.

HINT Use the pattern "@.*\\.edu$" inside grepl() and grep().

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFRoZSBlbWFpbHMgdmVjdG9yIGhhcyBhbHJlYWR5IGJlZW4gZGVmaW5lZCBmb3IgeW91XG5lbWFpbHMgPC0gYyhcImpvaG4uZG9lQGl2eWxlYWd1ZS5lZHVcIiwgXCJlZHVjYXRpb25Ad29ybGQuZ292XCIsIFwiZGFsYWkubGFtYUBwZWFjZS5vcmdcIixcbiAgICAgICAgICAgIFwiaW52YWxpZC5lZHVcIiwgXCJxdWFudEBiaWdkYXRhY29sbGVnZS5lZHVcIiwgXCJjb29raWUubW9uc3RlckBzZXNhbWUudHZcIilcblxuIyBVc2UgZ3JlcGwoKSB0byBtYXRjaCBmb3IgLmVkdSBhZGRyZXNzZXMgbW9yZSByb2J1c3RseVxuXG5cbiMgVXNlIGdyZXAoKSB0byBtYXRjaCBmb3IgLmVkdSBhZGRyZXNzZXMgbW9yZSByb2J1c3RseSwgc2F2ZSByZXN1bHQgdG8gaGl0c1xuXG5cbiMgU3Vic2V0IGVtYWlscyB1c2luZyBoaXRzIiwic29sdXRpb24iOiIjIFRoZSBlbWFpbHMgdmVjdG9yIGhhcyBhbHJlYWR5IGJlZW4gZGVmaW5lZCBmb3IgeW91XG5lbWFpbHMgPC0gYyhcImpvaG4uZG9lQGl2eWxlYWd1ZS5lZHVcIiwgXCJlZHVjYXRpb25Ad29ybGQuZ292XCIsIFwiZGFsYWkubGFtYUBwZWFjZS5vcmdcIixcbiAgICAgICAgICAgIFwiaW52YWxpZC5lZHVcIiwgXCJxdWFudEBiaWdkYXRhY29sbGVnZS5lZHVcIiwgXCJjb29raWUubW9uc3RlckBzZXNhbWUudHZcIilcblxuIyBVc2UgZ3JlcGwoKSB0byBtYXRjaCBmb3IgLmVkdSBhZGRyZXNzZXMgbW9yZSByb2J1c3RseVxuZ3JlcGwocGF0dGVybiA9IFwiQC4qXFxcXC5lZHUkXCIsIHggPSBlbWFpbHMpXG5cbiMgVXNlIGdyZXAoKSB0byBtYXRjaCBmb3IgLmVkdSBhZGRyZXNzZXMgbW9yZSByb2J1c3RseSwgc2F2ZSByZXN1bHQgdG8gaGl0c1xuaGl0cyA8LSBncmVwKHBhdHRlcm4gPSBcIkAuKlxcXFwuZWR1JFwiLCB4ID0gZW1haWxzKVxuXG4jIFN1YnNldCBlbWFpbHMgdXNpbmcgaGl0c1xuZW1haWxzW2hpdHNdIn0=

A careful construction of our regular expression leads to more meaningful matches. However, even our robust email selector will often match some incorrect email addresses (for instance kiara@@fakemail.edu). Let’s not worry about this too much and continue with sub() and gsub() to actually edit the email addresses!
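Both the improved pattern and its remaining weakness can be checked in a short sketch, assuming the same six addresses as before. The variable names pattern, is_edu and still_matches are hypothetical.

```r
# Assumed email vector, for illustration only
emails <- c("john.doe@ivyleague.edu", "education@world.gov",
            "dalai.lama@peace.org", "invalid.edu",
            "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# An at-sign, then anything, then a literal ".edu" at the end of the string.
# The dot is escaped with a double backslash inside an R string.
pattern <- "@.*\\.edu$"

is_edu <- grepl(pattern, emails)
hits <- grep(pattern, emails)
print(emails[hits])

# The pattern is still not a full email validator:
# a malformed address with two at-signs also matches
still_matches <- grepl(pattern, "kiara@@fakemail.edu")
print(still_matches)
```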

sub & gsub

While grep() and grepl() were used to simply check whether a regular expression could be matched with a character vector, sub() and gsub() take it one step further: you can specify a replacement argument. If the regular expression pattern is found inside the character vector x, the matching part of each element is replaced with replacement. Within each element, sub() only replaces the first match, whereas gsub() replaces all matches.
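The difference between the two functions is easiest to see on a string with several matches. The word used here is an arbitrary example, not part of the exercise.

```r
# A string in which the pattern "a" occurs three times
word <- "banana"

# sub() replaces only the first match
first_only <- sub(pattern = "a", replacement = "o", x = word)
print(first_only)

# gsub() replaces every match
all_matches <- gsub(pattern = "a", replacement = "o", x = word)
print(all_matches)
```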

Suppose that the emails vector you’ve been working with is an excerpt of DataCamp’s email database. Why not offer the owners of the .edu email addresses a new email address on the datacamp.edu domain? This could be quite a powerful marketing stunt: “Online education is taking over traditional learning institutions! Convert your email and be a part of the new generation!”

INSTRUCTIONS

With the advanced regular expression "@.*\\.edu$", use sub() to replace the match with "@datacamp.edu". Since there will only be one match per character string, gsub() is not necessary here. Inspect the resulting output.

HINT The replacement argument should be equal to “@datacamp.edu” in your call of sub().
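A minimal sketch of the replacement, assuming emails holds the six addresses below (the third address differs from the earlier exercises). The name converted is hypothetical.

```r
# Assumed email vector, for illustration only
emails <- c("john.doe@ivyleague.edu", "education@world.gov",
            "global@peace.org", "invalid.edu",
            "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# Replace everything from the at-sign up to a trailing ".edu"
converted <- sub(pattern = "@.*\\.edu$",
                 replacement = "@datacamp.edu",
                 x = emails)
print(converted)
```

Elements that do not match the pattern are returned unchanged, which is why no filtering step is needed before calling sub().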

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFRoZSBlbWFpbHMgdmVjdG9yIGhhcyBhbHJlYWR5IGJlZW4gZGVmaW5lZCBmb3IgeW91XG5lbWFpbHMgPC0gYyhcImpvaG4uZG9lQGl2eWxlYWd1ZS5lZHVcIiwgXCJlZHVjYXRpb25Ad29ybGQuZ292XCIsIFwiZ2xvYmFsQHBlYWNlLm9yZ1wiLFxuICAgICAgICAgICAgXCJpbnZhbGlkLmVkdVwiLCBcInF1YW50QGJpZ2RhdGFjb2xsZWdlLmVkdVwiLCBcImNvb2tpZS5tb25zdGVyQHNlc2FtZS50dlwiKVxuXG4jIFVzZSBzdWIoKSB0byBjb252ZXJ0IHRoZSBlbWFpbCBkb21haW5zIHRvIGRhdGFjYW1wLmVkdSIsInNvbHV0aW9uIjoiIyBUaGUgZW1haWxzIHZlY3RvciBoYXMgYWxyZWFkeSBiZWVuIGRlZmluZWQgZm9yIHlvdVxuZW1haWxzIDwtIGMoXCJqb2huLmRvZUBpdnlsZWFndWUuZWR1XCIsIFwiZWR1Y2F0aW9uQHdvcmxkLmdvdlwiLCBcImdsb2JhbEBwZWFjZS5vcmdcIixcbiAgICAgICAgICAgIFwiaW52YWxpZC5lZHVcIiwgXCJxdWFudEBiaWdkYXRhY29sbGVnZS5lZHVcIiwgXCJjb29raWUubW9uc3RlckBzZXNhbWUudHZcIilcblxuIyBVc2Ugc3ViKCkgdG8gY29udmVydCB0aGUgZW1haWwgZG9tYWlucyB0byBkYXRhY2FtcC5lZHVcbnN1YihwYXR0ZXJuID0gXCJALipcXFxcLmVkdSRcIiwgcmVwbGFjZW1lbnQgPSBcIkBkYXRhY2FtcC5lZHVcIiwgeCA9IGVtYWlscykifQ==

Notice how only the valid .edu addresses are changed while the other emails remain unchanged. To get a taste of other things you can accomplish with regex, head over to the next exercise.

sub & gsub (2)

Regular expressions are a typical concept that you’ll learn by doing and by seeing other examples. Before you rack your brains over the regular expression in this exercise, have a look at the new things that will be used:

.*: A usual suspect! It can be read as “any character that is matched zero or more times”.
\\s: Match a whitespace character, such as a space. The “s” is normally a character; escaping it (\\) makes it a metacharacter.
[0-9]+: Match the digits 0 to 9, at least once (+).
([0-9]+): The parentheses are used to make parts of the matching string available to define the replacement. The \\1 in the replacement argument of sub() gets set to the string that is captured by the regular expression [0-9]+.

awards <- c("Won 1 Oscar.",
            "Won 1 Oscar. Another 9 wins & 24 nominations.",
            "1 win and 2 nominations.",
            "2 wins & 3 nominations.",
            "Nominated for 2 Golden Globes. 1 more win & 2 nominations.",
            "4 wins & 1 nomination.")

sub(".*\\s([0-9]+)\\snomination.*$", "\\1", awards)

What does this code chunk return? awards is already defined in the workspace so you can start playing in the console straight away.

A vector of character strings containing “Won 1 Oscar.”, “24”, “2”, “3”, “2”, “1”.

HINT Try to see how the pattern responds to every string in the awards vector. If you are out of options, simply copy the sub() expression to the console and inspect the output.
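The answer can be verified by running the expression directly. This sketch uses the awards vector from the exercise text and assumes the pattern is written with double backslashes, as R strings require; the name result is hypothetical.

```r
awards <- c("Won 1 Oscar.",
            "Won 1 Oscar. Another 9 wins & 24 nominations.",
            "1 win and 2 nominations.",
            "2 wins & 3 nominations.",
            "Nominated for 2 Golden Globes. 1 more win & 2 nominations.",
            "4 wins & 1 nomination.")

# The whole string is matched and replaced by the captured number (\\1).
# Strings without "nomination" do not match and are returned unchanged.
result <- sub(".*\\s([0-9]+)\\snomination.*$", "\\1", awards)
print(result)
```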

Can you explain why all of this happened? The ([0-9]+) selects the entire number that comes before the word “nomination” in the string, and the entire match gets replaced by this number because the \\1 refers to the content inside the parentheses. The next video will get you up to speed with times and dates in R!

Right here, right now

In R, dates are represented by Date objects, while times are represented by POSIXct objects. Under the hood, however, these dates and times are simple numerical values. Date objects store the number of days since the 1st of January in 1970. POSIXct objects on the other hand, store the number of seconds since the 1st of January in 1970.

The 1st of January in 1970 is the common origin for representing times and dates in a wide range of programming languages. There is no particular reason for this; it is a simple convention. Of course, it’s also possible to create dates and times before 1970; the corresponding numerical values are simply negative in this case.
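The numeric representation can be inspected with a few fixed dates, so the results do not depend on when the code is run. The variable names are hypothetical, and the time zone is set to UTC explicitly as an assumption to keep the POSIXct value reproducible.

```r
# Dates are stored as days since 1970-01-01
days_epoch  <- as.numeric(as.Date("1970-01-01"))  # the origin itself
days_after  <- as.numeric(as.Date("1970-01-11"))  # ten days later
days_before <- as.numeric(as.Date("1969-12-31"))  # one day earlier, so negative
print(c(days_epoch, days_after, days_before))

# Times are stored as seconds since 1970-01-01 00:00:00 UTC
secs_after <- as.numeric(as.POSIXct("1970-01-01 00:01:00", tz = "UTC"))
print(secs_after)

# The same inspection works on the current date and time
today <- Sys.Date()
now <- Sys.time()
print(unclass(today))
print(unclass(now))
```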

INSTRUCTIONS

Ask R for the current date, and store the result in a variable today. To see what today looks like under the hood, call unclass() on it. Ask R for the current time, and store the result in a variable, now. To see the numerical value that corresponds to now, call unclass() on it.

HINT You can use Sys.Date() and Sys.time() to get the current date and time, respectively. Beware of the difference in capitalization!
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIEdldCB0aGUgY3VycmVudCBkYXRlOiB0b2RheVxuXG5cbiMgU2VlIHdoYXQgdG9kYXkgbG9va3MgbGlrZSB1bmRlciB0aGUgaG9vZFxuXG5cbiMgR2V0IHRoZSBjdXJyZW50IHRpbWU6IG5vd1xuXG5cbiMgU2VlIHdoYXQgbm93IGxvb2tzIGxpa2UgdW5kZXIgdGhlIGhvb2QiLCJzb2x1dGlvbiI6IiMgR2V0IHRoZSBjdXJyZW50IGRhdGU6IHRvZGF5XG50b2RheSA8LSBTeXMuRGF0ZSgpXG5cbiMgU2VlIHdoYXQgdG9kYXkgbG9va3MgbGlrZSB1bmRlciB0aGUgaG9vZFxudW5jbGFzcyh0b2RheSlcblxuIyBHZXQgdGhlIGN1cnJlbnQgdGltZTogbm93XG5ub3cgPC0gU3lzLnRpbWUoKVxuXG4jIFNlZSB3aGF0IG5vdyBsb29rcyBsaWtlIHVuZGVyIHRoZSBob29kXG51bmNsYXNzKG5vdykifQ==

Using R to get the current date and time is nice, but you should also know how to create dates and times from character strings. Find out how in the next exercises!

Create and format dates

To create a Date object from a simple character string in R, you can use the as.Date() function. The character string has to obey a format that can be defined using a set of symbols (the examples correspond to 13 January, 1982):

%Y: 4-digit year (1982)
%y: 2-digit year (82)
%m: 2-digit month (01)
%d: 2-digit day of the month (13)
%A: weekday (Wednesday)
%a: abbreviated weekday (Wed)
%B: month (January)
%b: abbreviated month (Jan)

The following R commands will all create the same Date object for the 13th day in January of 1982:

as.Date("1982-01-13")
as.Date("Jan-13-82", format = "%b-%d-%y")
as.Date("13 January, 1982", format = "%d %B, %Y")

Notice that the first line here did not need a format argument, because by default R matches your character string to the formats "%Y-%m-%d" or "%Y/%m/%d".

In addition to creating dates, you can also convert dates to character strings that use a different date notation. For this, you use the format() function. Try the following lines of code:

today <- Sys.Date()
format(Sys.Date(), format = "%d %B, %Y")
format(Sys.Date(), format = "Today is a %A!")
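The claim that the three as.Date() calls produce the same object can be checked with a fixed date. This sketch sets the time locale to "C" as an assumption, because month and weekday names (%b, %B, %A) depend on the locale; the variable names are hypothetical.

```r
# Assumption: use the C locale so that English month and weekday names apply
invisible(Sys.setlocale("LC_TIME", "C"))

d1 <- as.Date("1982-01-13")
d2 <- as.Date("Jan-13-82", format = "%b-%d-%y")
d3 <- as.Date("13 January, 1982", format = "%d %B, %Y")

# All three strings describe the same day
all_same <- (d1 == d2) && (d1 == d3)
print(all_same)

# format() goes the other way: from a Date to a character string
weekday <- format(d1, format = "%A")
long_form <- format(d1, format = "%d %B, %Y")
print(weekday)
print(long_form)
```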

INSTRUCTIONS

In the editor on the right, three character strings representing dates have been created. Convert them to dates using as.Date(), and assign them to date1, date2, and date3 respectively. The code for date1 is already included. Extract useful information from the dates as character strings using format(). From the first date, select the weekday. From the second date, select the day of the month. From the third date, you should select the abbreviated month and the 4-digit year, separated by a space.

HINT For date2, you don’t have to specify the format argument, as the string has a standard form. To convert the third string to a date, use the “%d/%B/%Y” format. To convert the third date to the correct character string, use “%b %Y”
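The conversions in this exercise might look as follows, assuming the three strings below and a "C" time locale (an assumption, since %b, %B and %A are locale-dependent). The names weekday1, day2 and month_year3 are hypothetical.

```r
# Assumption: use the C locale so that English month and weekday names apply
invisible(Sys.setlocale("LC_TIME", "C"))

# Assumed character strings representing dates, for illustration only
str1 <- "May 23, '96"
str2 <- "2012-03-15"
str3 <- "30/January/2006"

# Convert the strings to dates; str2 is in a standard form and needs no format
date1 <- as.Date(str1, format = "%b %d, '%y")
date2 <- as.Date(str2)
date3 <- as.Date(str3, format = "%d/%B/%Y")

# Extract pieces of information as character strings
weekday1 <- format(date1, "%A")
day2 <- format(date2, "%d")
month_year3 <- format(date3, "%b %Y")
print(c(weekday1, day2, month_year3))
```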
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIERlZmluaXRpb24gb2YgY2hhcmFjdGVyIHN0cmluZ3MgcmVwcmVzZW50aW5nIGRhdGVzXG5zdHIxIDwtIFwiTWF5IDIzLCAnOTZcIlxuc3RyMiA8LSBcIjIwMTItMDMtMTVcIlxuc3RyMyA8LSBcIjMwL0phbnVhcnkvMjAwNlwiXG5cbiMgQ29udmVydCB0aGUgc3RyaW5ncyB0byBkYXRlczogZGF0ZTEsIGRhdGUyLCBkYXRlM1xuZGF0ZTEgPC0gYXMuRGF0ZShzdHIxLCBmb3JtYXQgPSBcIiViICVkLCAnJXlcIilcblxuXG5cbiMgQ29udmVydCBkYXRlcyB0byBmb3JtYXR0ZWQgc3RyaW5nc1xuZm9ybWF0KGRhdGUxLCBcIiVBXCIpIiwic29sdXRpb24iOiIjIERlZmluaXRpb24gb2YgY2hhcmFjdGVyIHN0cmluZ3MgcmVwcmVzZW50aW5nIGRhdGVzXG5zdHIxIDwtIFwiTWF5IDIzLCAnOTZcIlxuc3RyMiA8LSBcIjIwMTItMDMtMTVcIlxuc3RyMyA8LSBcIjMwL0phbnVhcnkvMjAwNlwiXG5cbiMgQ29udmVydCB0aGUgc3RyaW5ncyB0byBkYXRlczogZGF0ZTEsIGRhdGUyLCBkYXRlM1xuZGF0ZTEgPC0gYXMuRGF0ZShzdHIxLCBmb3JtYXQgPSBcIiViICVkLCAnJXlcIilcbmRhdGUyIDwtIGFzLkRhdGUoc3RyMilcbmRhdGUzIDwtIGFzLkRhdGUoc3RyMywgZm9ybWF0ID0gXCIlZC8lQi8lWVwiKSBcblxuXG4jIENvbnZlcnQgZGF0ZXMgdG8gZm9ybWF0dGVkIHN0cmluZ3NcbmZvcm1hdChkYXRlMSwgXCIlQVwiKSJ9

You’re a date magician! You can use POSIXct objects, i.e. Time objects in R, in a similar fashion. Give it a try in the next exercise.

Create and format times

Similar to working with dates, you can use as.POSIXct() to convert from a character string to a POSIXct object, and format() to convert from a POSIXct object to a character string. Again, you have a wide variety of symbols:

%H: hours as a decimal number (00-23)
%I: hours as a decimal number (01-12)
%M: minutes as a decimal number
%S: seconds as a decimal number
%T: shorthand notation for the typical format %H:%M:%S
%p: AM/PM indicator

For a full list of conversion symbols, consult the strptime documentation in the console:

?strptime

Again, as.POSIXct() uses a default format to match character strings. In this case, it’s %Y-%m-%d %H:%M:%S. This exercise ignores differences between time zones.

INSTRUCTIONS

Convert two strings that represent timestamps, str1 and str2, to POSIXct objects called time1 and time2. Using format(), create a string from time1 containing only the minutes. From time2, extract the hours and minutes as “hours:minutes AM/PM”. Refer to the assignment text above to find the correct conversion symbols!

HINT You don’t need to specify the format to build time2, as str2 comes in a standard format. To convert time1 to the correct character string, you can use the format “%M”. To convert time2 to the correct character string, you can use the format “%I:%M %p”.
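A sketch of these conversions, assuming the two timestamp strings below. Two further assumptions are made for reproducibility: the time locale is set to "C" (month names and the AM/PM indicator are locale-dependent) and the time zone is fixed to UTC. The names minutes1 and clock2 are hypothetical.

```r
# Assumption: use the C locale so that English month names and AM/PM apply
invisible(Sys.setlocale("LC_TIME", "C"))

# Assumed character strings representing times, for illustration only
str1 <- "May 23, '96 hours:23 minutes:01 seconds:45"
str2 <- "2012-3-12 14:23:08"

# Literal text in the format string (such as "hours:") must match the input
time1 <- as.POSIXct(str1,
                    format = "%B %d, '%y hours:%H minutes:%M seconds:%S",
                    tz = "UTC")
# str2 follows the default format, so no format argument is needed
time2 <- as.POSIXct(str2, tz = "UTC")

# Convert the times back to formatted strings
minutes1 <- format(time1, "%M")
clock2 <- format(time2, "%I:%M %p")
print(minutes1)
print(clock2)
```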
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIERlZmluaXRpb24gb2YgY2hhcmFjdGVyIHN0cmluZ3MgcmVwcmVzZW50aW5nIHRpbWVzXG5zdHIxIDwtIFwiTWF5IDIzLCAnOTYgaG91cnM6MjMgbWludXRlczowMSBzZWNvbmRzOjQ1XCJcbnN0cjIgPC0gXCIyMDEyLTMtMTIgMTQ6MjM6MDhcIlxuXG4jIENvbnZlcnQgdGhlIHN0cmluZ3MgdG8gUE9TSVhjdCBvYmplY3RzOiB0aW1lMSwgdGltZTJcbnRpbWUxIDwtIGFzLlBPU0lYY3Qoc3RyMSwgZm9ybWF0ID0gXCIlQiAlZCwgJyV5IGhvdXJzOiVIIG1pbnV0ZXM6JU0gc2Vjb25kczolU1wiKVxuXG5cbiMgQ29udmVydCB0aW1lcyB0byBmb3JtYXR0ZWQgc3RyaW5ncyIsInNvbHV0aW9uIjoiIyBEZWZpbml0aW9uIG9mIGNoYXJhY3RlciBzdHJpbmdzIHJlcHJlc2VudGluZyB0aW1lc1xuc3RyMSA8LSBcIk1heSAyMywgJzk2IGhvdXJzOjIzIG1pbnV0ZXM6MDEgc2Vjb25kczo0NVwiXG5zdHIyIDwtIFwiMjAxMi0zLTEyIDE0OjIzOjA4XCJcblxuIyBDb252ZXJ0IHRoZSBzdHJpbmdzIHRvIFBPU0lYY3Qgb2JqZWN0czogdGltZTEsIHRpbWUyXG50aW1lMSA8LSBhcy5QT1NJWGN0KHN0cjEsIGZvcm1hdCA9IFwiJUIgJWQsICcleSBob3VyczolSCBtaW51dGVzOiVNIHNlY29uZHM6JVNcIilcbnRpbWUyIDwtIGFzLlBPU0lYY3Qoc3RyMilcblxuIyBDb252ZXJ0IHRpbWVzIHRvIGZvcm1hdHRlZCBzdHJpbmdzXG5mb3JtYXQodGltZTEsIFwiJU1cIilcbmZvcm1hdCh0aW1lMiwgXCIlSTolTSAlcFwiKSJ9

Calculations with Dates

Both Date and POSIXct R objects are represented by simple numerical values under the hood. This makes calculation with time and date objects very straightforward: R performs the calculations using the underlying numerical values, and then converts the result back to human-readable time information again.

You can increment and decrement Date objects, or do actual calculations with them (try it out in the console!):

today <- Sys.Date()
today + 1
today - 1

as.Date("2015-03-12") - as.Date("2015-02-27")

To control your eating habits, you decided to write down the dates of the last five days that you ate pizza. In the workspace, these dates are defined as five Date objects, day1 to day5. The code on the right also contains a vector pizza with these 5 Date objects.

INSTRUCTIONS

Calculate the number of days that passed between the last and the first day you ate pizza. Print the result. Use the function diff() on pizza to calculate the differences between consecutive pizza days. Store the result in a new variable day_diff. Calculate the average period between two consecutive pizza days. Print the result.

HINT For more information on the diff() function, execute ?diff in the console. Basically, this function shifts its input vector by 1 index and then calculates the difference between this shifted vector and the original vector. In this way, it computes the difference between consecutive vector elements.
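The date arithmetic can be sketched with fixed dates, assuming the five pizza days below. The names gap, span and avg_period are hypothetical.

```r
# Subtracting two dates gives a difference in days
gap <- as.Date("2015-03-12") - as.Date("2015-02-27")
print(gap)

# Assumed pizza days, for illustration only
day1 <- as.Date("2018-09-08")
day2 <- as.Date("2018-09-10")
day3 <- as.Date("2018-09-15")
day4 <- as.Date("2018-09-21")
day5 <- as.Date("2018-09-26")
pizza <- c(day1, day2, day3, day4, day5)

# Days between the last and the first pizza day
span <- day5 - day1
print(span)

# diff() returns the gaps between consecutive elements
day_diff <- diff(pizza)
print(day_diff)

# Average period between two consecutive pizza days
avg_period <- mean(day_diff)
print(avg_period)
```

The results are difftime objects, which is why they print with a unit; as.numeric() strips the unit when a plain number is needed.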

eyJsYW5ndWFnZSI6InIiLCJwcmVfZXhlcmNpc2VfY29kZSI6IiBkYXkxPC1hcy5EYXRlKFwiMjAxOC0wOS0wOFwiKVxuIGRheTI8LWFzLkRhdGUoXCIyMDE4LTA5LTEwXCIpXG4gZGF5MzwtYXMuRGF0ZShcIjIwMTgtMDktMTVcIilcbiBkYXk0PC1hcy5EYXRlKFwiMjAxOC0wOS0yMVwiKVxuIGRheTU8LWFzLkRhdGUoXCIyMDE4LTA5LTI2XCIpIiwic2FtcGxlIjoiIyBkYXkxLCBkYXkyLCBkYXkzLCBkYXk0IGFuZCBkYXk1IGFyZSBhbHJlYWR5IGF2YWlsYWJsZSBpbiB0aGUgd29ya3NwYWNlXG5cbiMgRGlmZmVyZW5jZSBiZXR3ZWVuIGxhc3QgYW5kIGZpcnN0IHBpenphIGRheVxuXG5cbiMgQ3JlYXRlIHZlY3RvciBwaXp6YVxucGl6emEgPC0gYyhkYXkxLCBkYXkyLCBkYXkzLCBkYXk0LCBkYXk1KVxuXG4jIENyZWF0ZSBkaWZmZXJlbmNlcyBiZXR3ZWVuIGNvbnNlY3V0aXZlIHBpenphIGRheXM6IGRheV9kaWZmXG5cblxuIyBBdmVyYWdlIHBlcmlvZCBiZXR3ZWVuIHR3byBjb25zZWN1dGl2ZSBwaXp6YSBkYXlzIiwic29sdXRpb24iOiIjIGRheTEsIGRheTIsIGRheTMsIGRheTQgYW5kIGRheTUgYXJlIGFscmVhZHkgYXZhaWxhYmxlIGluIHRoZSB3b3Jrc3BhY2VcblxuIyBEaWZmZXJlbmNlIGJldHdlZW4gbGFzdCBhbmQgZmlyc3QgcGl6emEgZGF5XG5kYXk1IC0gZGF5MVxuXG4jIENyZWF0ZSB2ZWN0b3IgcGl6emFcbnBpenphIDwtIGMoZGF5MSwgZGF5MiwgZGF5MywgZGF5NCwgZGF5NSlcblxuIyBDcmVhdGUgZGlmZmVyZW5jZXMgYmV0d2VlbiBjb25zZWN1dGl2ZSBwaXp6YSBkYXlzOiBkYXlfZGlmZlxuZGF5X2RpZmYgPC0gZGlmZihwaXp6YSlcblxuIyBBdmVyYWdlIHBlcmlvZCBiZXR3ZWVuIHR3byBjb25zZWN1dGl2ZSBwaXp6YSBkYXlzXG5tZWFuKGRheV9kaWZmKSJ9

Calculations with Times

Calculations using POSIXct objects are completely analogous to those using Date objects. Try to experiment with this code to increase or decrease POSIXct objects:

now <- Sys.time()
now + 3600       # add an hour
now - 3600 * 24  # subtract a day

Adding or subtracting time objects is also straightforward:

birth <- as.POSIXct("1879-03-14 14:37:23")
death <- as.POSIXct("1955-04-18 03:47:12")
einstein <- death - birth
einstein

You’re developing a website that requires users to log in and out. You want to know the total and average amount of time a particular user spends on your website. This user has logged in 5 times and logged out 5 times as well. These times are gathered in the vectors login and logout, which are already defined in the workspace.

INSTRUCTIONS

Calculate the difference between the two vectors logout and login, i.e. the time the user was online in each independent session. Store the result in a variable time_online. Inspect the variable time_online by printing it. Calculate the total time that the user was online. Print the result. Calculate the average time the user was online. Print the result.

HINT For the first instruction, simply type logout - login. For the third and fourth instruction, you can use sum() and mean() out of the box.
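A sketch of the session calculation, assuming the five login and logout timestamps below, all in UTC. The names total_online and avg_online are hypothetical. The sketch uses difftime() with an explicit unit instead of the plain subtraction from the hint, so that the unit of the result does not depend on R's automatic choice.

```r
# Assumed login and logout times, for illustration only
login <- as.POSIXct(c("2018-09-12 10:18:04", "2018-09-17 09:14:18",
                      "2018-09-17 12:21:51", "2018-09-17 12:37:24",
                      "2018-09-19 21:37:55"), tz = "UTC")
logout <- as.POSIXct(c("2018-09-12 10:56:29", "2018-09-17 09:14:52",
                       "2018-09-17 12:35:48", "2018-09-17 13:17:22",
                       "2018-09-19 22:08:47"), tz = "UTC")

# Time online per session, explicitly in seconds
time_online <- difftime(logout, login, units = "secs")
print(time_online)

# Total and average time online
total_online <- sum(time_online)
avg_online <- mean(time_online)
print(total_online)
print(avg_online)
```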

eyJsYW5ndWFnZSI6InIiLCJwcmVfZXhlcmNpc2VfY29kZSI6ImxvZ2luPC0gYXMuUE9TSVhjdChjKFwiMjAxOC0wOS0xMiAxMDoxODowNCBVVENcIiwgXCIyMDE4LTA5LTE3IDA5OjE0OjE4IFVUQ1wiLCBcIjIwMTgtMDktMTcgMTI6MjE6NTEgVVRDXCIsIFwiMjAxOC0wOS0xNyAxMjozNzoyNCBVVENcIiwgXCIyMDE4LTA5LTE5IDIxOjM3OjU1IFVUQ1wiKSlcblxubG9nb3V0PC1hcy5QT1NJWGN0KGMoXCIyMDE4LTA5LTEyIDEwOjU2OjI5IFVUQ1wiLCAgXCIyMDE4LTA5LTE3IDA5OjE0OjUyIFVUQ1wiLCBcIjIwMTgtMDktMTcgMTI6MzU6NDggVVRDXCIsIFwiMjAxOC0wOS0xNyAxMzoxNzoyMiBVVENcIiwgXCIyMDE4LTA5LTE5IDIyOjA4OjQ3IFVUQ1wiKSkiLCJzYW1wbGUiOiIjIGxvZ2luIGFuZCBsb2dvdXQgYXJlIGFscmVhZHkgZGVmaW5lZCBpbiB0aGUgd29ya3NwYWNlXG4jIENhbGN1bGF0ZSB0aGUgZGlmZmVyZW5jZSBiZXR3ZWVuIGxvZ2luIGFuZCBsb2dvdXQ6IHRpbWVfb25saW5lXG5cblxuIyBJbnNwZWN0IHRoZSB2YXJpYWJsZSB0aW1lX29ubGluZVxuXG5cbiMgQ2FsY3VsYXRlIHRoZSB0b3RhbCB0aW1lIG9ubGluZVxuXG5cbiMgQ2FsY3VsYXRlIHRoZSBhdmVyYWdlIHRpbWUgb25saW5lIiwic29sdXRpb24iOiIjIGxvZ2luIGFuZCBsb2dvdXQgYXJlIGFscmVhZHkgZGVmaW5lZCBpbiB0aGUgd29ya3NwYWNlXG4jIENhbGN1bGF0ZSB0aGUgZGlmZmVyZW5jZSBiZXR3ZWVuIGxvZ2luIGFuZCBsb2dvdXQ6IHRpbWVfb25saW5lXG50aW1lX29ubGluZSA8LSBsb2dvdXQgLSBsb2dpblxuXG4jIEluc3BlY3QgdGhlIHZhcmlhYmxlIHRpbWVfb25saW5lXG50aW1lX29ubGluZVxuXG4jIENhbGN1bGF0ZSB0aGUgdG90YWwgdGltZSBvbmxpbmVcbnN1bSh0aW1lX29ubGluZSlcblxuIyBDYWxjdWxhdGUgdGhlIGF2ZXJhZ2UgdGltZSBvbmxpbmVcbm1lYW4odGltZV9vbmxpbmUpIn0=

Time is of the essence

The dates when a season begins and ends can vary depending on who you ask. People in Australia will tell you that spring starts on September 1st. The Irish people in the Northern hemisphere will swear that spring starts on February 1st, with the celebration of St. Brigid’s Day. Then there’s also the difference between astronomical and meteorological seasons: while astronomers are used to equinoxes and solstices, meteorologists divide the year into 4 fixed seasons that are each three months long. (source: www.timeanddate.com)

A vector astro, which contains character strings representing the dates on which the 4 astronomical seasons start, has been defined in your workspace. Similarly, a vector meteo has already been created for you, with the meteorological beginnings of a season.

INSTRUCTIONS

Use as.Date() to convert the astro vector to a vector containing Date objects. You will need the %d, %b and %Y symbols to specify the format. Store the resulting vector as astro_dates. Use as.Date() to convert the meteo vector to a vector with Date objects. This time, you will need the %B, %d and %y symbols for the format argument. Store the resulting vector as meteo_dates. With a combination of max(), abs() and -, calculate the maximum absolute difference between the astronomical and the meteorological beginnings of a season, i.e. astro_dates and meteo_dates. Simply print this maximum difference to the console output.

HINT To convert astro to a vector of Date objects, you’ll need the following format: “%d-%b-%Y”. To convert meteo to a vector of Date objects, you’ll want to use “%B %d, %y” in the format argument. For the final instruction, use max(abs(x - y)), where you set x and y appropriately
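The whole exercise can be sketched as follows, assuming astro and meteo hold the season starts below, written here as plain named character vectors. The time locale is set to "C" as an assumption, since %b and %B are locale-dependent, and max_diff is a hypothetical name.

```r
# Assumption: use the C locale so that English month names apply
invisible(Sys.setlocale("LC_TIME", "C"))

# Assumed season starts, for illustration only
astro <- c(spring = "20-Mar-2015", summer = "25-Jun-2015",
           fall = "23-Sep-2015", winter = "22-Dec-2015")
meteo <- c(spring = "March 1, 15", summer = "June 1, 15",
           fall = "September 1, 15", winter = "December 1, 15")

# Convert both character vectors to Date vectors
astro_dates <- as.Date(astro, format = "%d-%b-%Y")
meteo_dates <- as.Date(meteo, format = "%B %d, %y")

# Largest absolute gap between the two definitions of a season start
max_diff <- max(abs(astro_dates - meteo_dates))
print(max_diff)
```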

eyJsYW5ndWFnZSI6InIiLCJwcmVfZXhlcmNpc2VfY29kZSI6ImFzdHJvPC1jYmluZChcIjIwLU1hci0yMDE1XCIsIFwiMjUtSnVuLTIwMTVcIiwgXCIyMy1TZXAtMjAxNVwiLCBcIjIyLURlYy0yMDE1XCIpXG5jb2xuYW1lcyhhc3Rybyk8LWMoXCJzcHJpbmdcIixcInN1bW1lclwiLCBcImZhbGxcIiwgXCJ3aW50ZXJcIilcbm1ldGVvPC1jYmluZCggXCJNYXJjaCAxLCAxNVwiICwgIFwiSnVuZSAxLCAxNVwiLCAgXCJTZXB0ZW1iZXIgMSwgMTVcIiwgIFwiRGVjZW1iZXIgMSwgMTVcIilcbmNvbG5hbWVzKG1ldGVvKTwtYyhcInNwcmluZ1wiLFwic3VtbWVyXCIsIFwiZmFsbFwiLCBcIndpbnRlclwiKSIsInNhbXBsZSI6IiMgQ29udmVydCBhc3RybyB0byB2ZWN0b3Igb2YgRGF0ZSBvYmplY3RzOiBhc3Ryb19kYXRlc1xuXG5cbiMgQ29udmVydCBtZXRlbyB0byB2ZWN0b3Igb2YgRGF0ZSBvYmplY3RzOiBtZXRlb19kYXRlc1xuXG5cbiMgQ2FsY3VsYXRlIHRoZSBtYXhpbXVtIGFic29sdXRlIGRpZmZlcmVuY2UgYmV0d2VlbiBhc3Ryb19kYXRlcyBhbmQgbWV0ZW9fZGF0ZXMiLCJzb2x1dGlvbiI6IiMgQ29udmVydCBhc3RybyB0byB2ZWN0b3Igb2YgRGF0ZSBvYmplY3RzOiBhc3Ryb19kYXRlc1xuYXN0cm9fZGF0ZXMgPC0gYXMuRGF0ZShhc3RybywgZm9ybWF0ID0gXCIlZC0lYi0lWVwiKVxuXG4jIENvbnZlcnQgbWV0ZW8gdG8gdmVjdG9yIG9mIERhdGUgb2JqZWN0czogbWV0ZW9fZGF0ZXNcbm1ldGVvX2RhdGVzIDwtIGFzLkRhdGUobWV0ZW8sIGZvcm1hdCA9IFwiJUIgJWQsICV5XCIpXG5cbiMgQ2FsY3VsYXRlIHRoZSBtYXhpbXVtIGFic29sdXRlIGRpZmZlcmVuY2UgYmV0d2VlbiBhc3Ryb19kYXRlcyBhbmQgbWV0ZW9fZGF0ZXNcbm1heChhYnMoYXN0cm9fZGF0ZXMgLSBtZXRlb19kYXRlcykpIn0=