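As the old saying goes, “A nine-story terrace rises from a heap of earth; a journey of a thousand miles begins with a single step.” This is where you write your first line of R code: you will learn how to use the R console as a calculator and how to assign values to variables, and you will get to know R’s data types. Let’s get started!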

Basics

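Getting to know the environment: how do you use our online interactive R practice platform?

  • You type code into the editor box on the lower left to do the exercises.
  • First enter your code, then click “run” to submit your answer.
  • The output box on the lower right (the console) shows the result of the code you entered.
  • In R, the # sign adds a comment; R does not run anything that follows # on a line as code.
  • You can also type code directly into the console on the right and press Enter to run it, which is convenient for practicing short snippets.
Example: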
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIENhbGN1bGF0ZSAzICsgNVxuMyArIDVcblxuIyBDYWxjdWxhdGUgOCArIDkiLCJzb2x1dGlvbiI6IiMgQ2FsY3VsYXRlIDMgKyA1XG4zICsgNVxuXG4jIENhbGN1bGF0ZSA4ICsgOVxuOCArIDkifQ==

Arithmetic in R:

R can be used as a basic calculator.

  • Addition: +
  • Subtraction: -
  • Multiplication: *
  • Division: /
  • Exponentiation: ^
  • Modulo (remainder): %%
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIEFuIGFkZGl0aW9uXG41ICsgNSBcblxuIyBBIHN1YnRyYWN0aW9uXG41IC0gNSBcblxuIyBBIG11bHRpcGxpY2F0aW9uXG4zICogNVxuXG4gIyBBIGRpdmlzaW9uXG4oNSArIDUpIC8gMiBcblxuIyBFeHBvbmVudGlhdGlvblxuMl41XG5cblxuIyBNb2R1bG9cbjI4JSU2In0=
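
For readers who cannot run the interactive widget, the operators listed above can be tried directly at the R console. This is a minimal sketch; the operands are arbitrary illustrative values.

```r
# Basic arithmetic at the R console
5 + 5        # addition: 10
5 - 5        # subtraction: 0
3 * 5        # multiplication: 15
(5 + 5) / 2  # division: 5
2^5          # exponentiation: 32
28 %% 6      # modulo (remainder of 28 divided by 6): 4
```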

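Assigning values to variables

Variables are a basic concept in R.

In R, you can use a variable to store a value (e.g. 4) or an object (e.g. a function). You can then use the variable’s name to access that value or object. Put simply, you give something a name so that it is easy to refer to later.

Try the line of code below to store (assign) the number 4 in the variable my_var; then just type my_var and the R console prints 4: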
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJteV92YXIgPC0gNFxubXlfdmFyIn0=
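
A plain-text version of the assignment described above might look like the following sketch. The second variable, x, and the multiplication are only there to illustrate that a name can stand in for its value.

```r
my_var <- 4   # store the value 4 under the name my_var
my_var        # typing the name prints the stored value

x <- 42
x * 2         # a variable can be used anywhere its value could be: 84
```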

INSTRUCTIONS

Your turn: in the editor, create a variable x and assign the number 42 to it, then click “run”. After that, you can type x directly in the R console, and the console will print 42.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIEFzc2lnbiB0aGUgdmFsdWUgNDIgdG8geFxueCA8LSA0MlxuXG4jIFByaW50IG91dCB0aGUgdmFsdWUgb2YgdGhlIHZhcmlhYmxlIHhcbnggIn0=

Assigning values to variables (2)

Suppose you have a fruit basket with five apples. As a data analyst in training, you want to store the number of apples in a variable with the name my_apples.

INSTRUCTIONS

  • Type the following code in the editor: my_apples <- 5. This will assign the value 5 to my_apples.
  • Type: my_apples below the second comment. This will print out the value of my_apples.
  • Click ‘Submit Answer’, and look at the console: you see that the number 5 is printed. So R now links the variable my_apples to the value 5.
Example:
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIEFzc2lnbiB0aGUgdmFsdWUgNSB0byB0aGUgdmFyaWFibGUgbXlfYXBwbGVzXG5teV9hcHBsZXMgPC0gNVxuXG4jIFByaW50IG91dCB0aGUgdmFsdWUgb2YgdGhlIHZhcmlhYmxlIG15X2FwcGxlc1xubXlfYXBwbGVzIn0=

Assigning values to variables (3)

Every tasty fruit basket needs oranges, so you decide to add six oranges. As a data analyst, your reflex is to immediately create the variable my_oranges and assign the value 6 to it. Next, you want to calculate how many pieces of fruit you have in total. Since you have given meaningful names to these values, you can now code this in a clear way: my_apples + my_oranges

INSTRUCTIONS

  • Assign to my_oranges the value 6.
  • Add the variables my_apples and my_oranges and have R simply print the result.
  • Assign the result of adding my_apples and my_oranges to a new variable my_fruit.

Example:

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIEFzc2lnbiBhIHZhbHVlIHRvIHRoZSB2YXJpYWJsZXMgbXlfYXBwbGVzIGFuZCBteV9vcmFuZ2VzXG5teV9hcHBsZXMgIDwtIDVcbm15X29yYW5nZXMgPC0gNlxuXG4jIEFkZCB0aGVzZSB0d28gdmFyaWFibGVzIHRvZ2V0aGVyXG5teV9hcHBsZXMgKyBteV9vcmFuZ2VzXG5cbiMgQ3JlYXRlIHRoZSB2YXJpYWJsZSBteV9mcnVpdFxubXlfZnJ1aXQgPC0gbXlfYXBwbGVzICsgbXlfb3JhbmdlcyJ9

The great advantage of doing calculations with variables is reusability. If you just change my_apples to equal 12 instead of 5 and rerun the script, my_fruit will automatically update as well. Continue to the next exercise.
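
The reusability point can be sketched as follows. Note that R does not recalculate my_fruit on its own; the total only changes once the line that computes it runs again, which is what rerunning the script does.

```r
my_apples <- 5
my_oranges <- 6
my_fruit <- my_apples + my_oranges
my_fruit   # 11

# Change one input and rerun the same line to refresh the total.
# my_fruit keeps its old value until the assignment runs again.
my_apples <- 12
my_fruit <- my_apples + my_oranges
my_fruit   # 18
```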

Apples and oranges

Common knowledge tells you not to add apples and oranges. But hey, that is what you just did, no :-)? The my_apples and my_oranges variables both contained a number in the previous exercise. The + operator works with numeric variables in R. If you really tried to add “apples” and “oranges”, and assigned a text value to the variable my_oranges (see the editor), you would be trying to assign the addition of a numeric and a character variable to the variable my_fruit. This is not possible.

INSTRUCTIONS

  • Click ‘Submit Answer’ and read the error message. Make sure to understand why this did not work.
  • Adjust the code so that R knows you have 6 oranges and thus a fruit basket with 11 pieces of fruit.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIEFzc2lnbiBhIHZhbHVlIHRvIHRoZSB2YXJpYWJsZSBteV9hcHBsZXNcbm15X2FwcGxlcyA8LSA1IFxuXG4jIEZpeCB0aGUgYXNzaWdubWVudCBvZiBteV9vcmFuZ2VzXG5teV9vcmFuZ2VzIDwtIFwic2l4XCIgXG5cbiMgQ3JlYXRlIHRoZSB2YXJpYWJsZSBteV9mcnVpdCBhbmQgcHJpbnQgaXQgb3V0XG5teV9mcnVpdCA8LSBteV9hcHBsZXMgKyBteV9vcmFuZ2VzIFxubXlfZnJ1aXQiLCJzb2x1dGlvbiI6IiMgQXNzaWduIGEgdmFsdWUgdG8gdGhlIHZhcmlhYmxlIG15X2FwcGxlc1xubXlfYXBwbGVzIDwtIDUgIFxuXG4jIEZpeCB0aGUgYXNzaWdubWVudCBvZiBteV9vcmFuZ2VzXG5teV9vcmFuZ2VzIDwtIDZcblxuIyBDcmVhdGUgdGhlIHZhcmlhYmxlIG15X2ZydWl0IGFuZCBwcmludCBpdCBvdXRcbm15X2ZydWl0IDwtIG15X2FwcGxlcyArIG15X29yYW5nZXMgXG5teV9mcnVpdCJ9
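
To see the failure without stopping a script, the addition can be wrapped in tryCatch(). This is only an illustrative sketch: the variable result is a hypothetical name, and the exact wording of the error message may vary with your R locale.

```r
my_apples <- 5
my_oranges <- "six"   # a character value, not a number

# tryCatch() captures the error so the script can continue
result <- tryCatch(
  my_apples + my_oranges,
  error = function(e) conditionMessage(e)
)
result   # the error message R produced

# With a numeric value the addition works
my_oranges <- 6
my_fruit <- my_apples + my_oranges
my_fruit   # 11
```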

Basic data types in R

R works with numerous data types. Some of the most basic types to get started are:

  • Decimal values like 4.5 are called numerics.
  • Whole numbers like 4 are called integers. Integers are also numerics. (When you type a bare 4, R stores it as a numeric; write 4L to get an integer.)
  • Boolean values (TRUE or FALSE) are called logical.
  • Text (or string) values are called characters.
  • Note how the quotation marks on the right indicate that “some text” is a character.
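
The types in the list above can be inspected with class(). The sketch below uses literal values only; the L suffix is standard R syntax for an integer literal.

```r
class(4.5)          # "numeric"
class(4)            # "numeric": a bare 4 is stored as a double
class(4L)           # "integer": the L suffix requests an integer
is.numeric(4L)      # TRUE: integers also count as numeric
class(TRUE)         # "logical"
class("some text")  # "character"
```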

INSTRUCTIONS

Change the value of the:

  • my_numeric variable to 42.
  • my_character variable to “universe”. (Note that the quotation marks indicate that “universe” is a character.)
  • my_logical variable to FALSE. (Note that R is case sensitive!)
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIENoYW5nZSBteV9udW1lcmljIHRvIGJlIDQyXG5teV9udW1lcmljIDwtIDQyXG5cbiMgQ2hhbmdlIG15X2NoYXJhY3RlciB0byBiZSBcInVuaXZlcnNlXCJcbm15X2NoYXJhY3RlciA8LSBcInVuaXZlcnNlXCJcblxuIyBDaGFuZ2UgbXlfbG9naWNhbCB0byBiZSBGQUxTRVxubXlfbG9naWNhbCA8LSBGQUxTRSJ9

How do you check a data type?

Do you remember that when you added 5 + “six”, you got an error due to a mismatch in data types? You can avoid such embarrassing situations by checking the data type of a variable beforehand. You can do this with the class() function, as the code on the right shows.

INSTRUCTIONS

Complete the code in the editor and also print out the classes of my_character and my_logical.

Example:
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIERlY2xhcmUgdmFyaWFibGVzIG9mIGRpZmZlcmVudCB0eXBlczpcbm15X251bWVyaWMgPC0gNDJcbm15X2NoYXJhY3RlciA8LSBcInVuaXZlcnNlXCJcbm15X2xvZ2ljYWwgPC0gRkFMU0VcblxuIyBDaGVjayBjbGFzcyBvZiBteV9udW1lcmljXG5jbGFzcyhteV9udW1lcmljKVxuXG4jIENoZWNrIGNsYXNzIG9mIG15X2NoYXJhY3RlclxuY2xhc3MobXlfY2hhcmFjdGVyKVxuXG4jIENoZWNrIGNsYXNzIG9mIG15X2xvZ2ljYWxcbmNsYXNzKG15X2xvZ2ljYWwpIn0=
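
As a sketch of how class() and its relative is.numeric() can prevent the earlier type mismatch, the code below checks both operands before adding them. The variable total and the choice to fall back to NA are illustrative assumptions, not part of the exercise.

```r
my_numeric <- 42
my_character <- "universe"
my_logical <- FALSE

class(my_numeric)    # "numeric"
class(my_character)  # "character"
class(my_logical)    # "logical"

# Guard an addition by checking the types first
if (is.numeric(my_numeric) && is.numeric(my_character)) {
  total <- my_numeric + my_character
} else {
  total <- NA   # skip the addition instead of raising an error
}
total
```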

Vectors

In this free R course, we’ll take you on a trip to Vegas, where you will learn how to analyze your gambling results using vectors in R! After completing this chapter, you will be able to create vectors in R, name them, select elements from them, and compare different vectors.

Create a vector

Feeling lucky? You’d better, because this chapter takes you on a trip to the City of Sins, also known as Statisticians’ Paradise!

Thanks to R and your new data-analytical skills, you will learn how to uplift your performance at the tables and fire off your career as a professional gambler. This chapter will show how you can easily keep track of your betting progress and how you can do some simple analyses on past actions. Next stop, Vegas Baby… VEGAS!!

INSTRUCTIONS

Do you still remember what you have learned in the first chapter? Assign the value “Go!” to the variable vegas. Remember: R is case sensitive!

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIERlZmluZSB0aGUgdmFyaWFibGUgdmVnYXNcbnZlZ2FzIDwtIFwiR28hXCIifQ==

Create a vector (2)

Let us focus first!

On your way from rags to riches, you will make extensive use of vectors. Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data. In other words, a vector is a simple tool to store data. For example, you can store your daily gains and losses in the casinos.

In R, you create a vector with the combine function c(). You place the vector elements separated by a comma between the parentheses. For example:

    numeric_vector <- c(1, 2, 3)
    character_vector <- c("a", "b", "c")

Once you have created these vectors in R, you can use them to do calculations.

INSTRUCTIONS

Complete the code such that boolean_vector contains the three elements: TRUE, FALSE and TRUE (in that order).

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJudW1lcmljX3ZlY3RvciA8LSBjKDEsIDEwLCA0OSlcbmNoYXJhY3Rlcl92ZWN0b3IgPC0gYyhcImFcIiwgXCJiXCIsIFwiY1wiKVxuXG4jIENvbXBsZXRlIHRoZSBjb2RlIGZvciBib29sZWFuX3ZlY3RvclxuYm9vbGVhbl92ZWN0b3IgPC0gYyhUUlVFLCBGQUxTRSwgVFJVRSkifQ==
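
A short sketch of creating vectors with c(), one per basic data type. The multiplication at the end is an added illustration of doing a calculation with a vector.

```r
numeric_vector <- c(1, 10, 49)
character_vector <- c("a", "b", "c")
boolean_vector <- c(TRUE, FALSE, TRUE)

length(numeric_vector)   # 3
class(boolean_vector)    # "logical"
numeric_vector * 2       # calculations apply to every element: 2 20 98
```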

Notice that adding a space behind the commas in the c() function improves the readability of your code. Let’s practice some more with vector creation in the next exercise.

Create a vector (3)

After one week in Las Vegas and still zero Ferraris in your garage, you decide that it is time to start using your data analytical superpowers.

Before doing a first analysis, you decide to first collect all the winnings and losses for the last week:

For poker_vector:

  • On Monday you won $140
  • Tuesday you lost $50
  • Wednesday you won $20
  • Thursday you lost $120
  • Friday you won $240

For roulette_vector:

  • On Monday you lost $24
  • Tuesday you lost $50
  • Wednesday you won $100
  • Thursday you lost $350
  • Friday you won $10

You only played poker and roulette, since there was a delegation of mediums that occupied the craps tables. To be able to use this data in R, you decide to create the variables poker_vector and roulette_vector.

INSTRUCTIONS

Assign the winnings/losses for roulette to the variable roulette_vector.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFBva2VyIHdpbm5pbmdzIGZyb20gTW9uZGF5IHRvIEZyaWRheVxucG9rZXJfdmVjdG9yIDwtIGMoMTQwLCAtNTAsIDIwLCAtMTIwLCAyNDApXG5cbiMgUm91bGV0dGUgd2lubmluZ3MgZnJvbSBNb25kYXkgdG8gRnJpZGF5XG5yb3VsZXR0ZV92ZWN0b3IgPC0gYygtMjQsIC01MCwgMTAwLCAtMzUwLCAxMCkifQ==

Naming a vector

As a data analyst, it is important to have a clear view on the data that you are using. Understanding what each element refers to is therefore essential.

In the previous exercise, we created a vector with your winnings over the week. Each vector element refers to a day of the week but it is hard to tell which element belongs to which day. It would be nice if you could show that in the vector itself.

You can give a name to the elements of a vector with the names() function. Have a look at this example:

    some_vector <- c("John Doe", "poker player")
    names(some_vector) <- c("Name", "Profession")

This code first creates a vector some_vector and then gives the two elements a name. The first element is assigned the name Name, while the second element is labeled Profession.

The code on the right names the elements in poker_vector with the days of the week. Add code to do the same thing for roulette_vector.

You can use names(roulette_vector) to set the names of the variable roulette_vector. Make sure to use the same vector with the days of the week as names. Remember that R is case sensitive!

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFBva2VyIHdpbm5pbmdzIGZyb20gTW9uZGF5IHRvIEZyaWRheVxucG9rZXJfdmVjdG9yIDwtIGMoMTQwLCAtNTAsIDIwLCAtMTIwLCAyNDApXG5cbiMgUm91bGV0dGUgd2lubmluZ3MgZnJvbSBNb25kYXkgdG8gRnJpZGF5XG5yb3VsZXR0ZV92ZWN0b3IgPC0gYygtMjQsIC01MCwgMTAwLCAtMzUwLCAxMClcblxuIyBBc3NpZ24gZGF5cyBhcyBuYW1lcyBvZiBwb2tlcl92ZWN0b3Jcbm5hbWVzKHBva2VyX3ZlY3RvcikgPC0gYyhcIk1vbmRheVwiLCBcIlR1ZXNkYXlcIiwgXCJXZWRuZXNkYXlcIiwgXCJUaHVyc2RheVwiLCBcIkZyaWRheVwiKVxuXG4jIEFzc2lnbiBkYXlzIGFzIG5hbWVzIG9mIHJvdWxldHRlX3ZlY3RvclxubmFtZXMocm91bGV0dGVfdmVjdG9yKSA8LSBjKFwiTW9uZGF5XCIsIFwiVHVlc2RheVwiLCBcIldlZG5lc2RheVwiLCBcIlRodXJzZGF5XCIsIFwiRnJpZGF5XCIpIn0=
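
Applied to the poker results, naming the elements could look like this minimal sketch, which uses the winnings and losses listed earlier.

```r
# Poker winnings (positive) and losses (negative), Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

poker_vector          # values are printed under their day labels
names(poker_vector)   # the labels on their own
```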

Naming a vector (2)

If you want to become a good statistician, you have to become lazy. (If you are already lazy, chances are high you are one of those exceptional, natural-born statistical talents.)

In the previous exercises you probably experienced that it is boring and frustrating to type and retype information such as the days of the week. However, when you look at it from a higher perspective, there is a more efficient way to do this, namely, to assign the days of the week vector to a variable!

Just like you did with your poker and roulette returns, you can also create a variable that contains the days of the week. This way you can use and re-use it.

INSTRUCTIONS

A variable days_vector that contains the days of the week has already been created for you. Use days_vector to set the names of poker_vector and roulette_vector.

HINT

You can use names(poker_vector) <- days_vector to set the names of the elements of poker_vector. Do a similar thing for roulette_vector.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFBva2VyIHdpbm5pbmdzIGZyb20gTW9uZGF5IHRvIEZyaWRheVxucG9rZXJfdmVjdG9yIDwtIGMoMTQwLCAtNTAsIDIwLCAtMTIwLCAyNDApXG5cbiMgUm91bGV0dGUgd2lubmluZ3MgZnJvbSBNb25kYXkgdG8gRnJpZGF5XG5yb3VsZXR0ZV92ZWN0b3IgPC0gYygtMjQsIC01MCwgMTAwLCAtMzUwLCAxMClcblxuIyBUaGUgdmFyaWFibGUgZGF5c192ZWN0b3JcbmRheXNfdmVjdG9yIDwtIGMoXCJNb25kYXlcIiwgXCJUdWVzZGF5XCIsIFwiV2VkbmVzZGF5XCIsIFwiVGh1cnNkYXlcIiwgXCJGcmlkYXlcIilcblxuIyBBc3NpZ24gdGhlIG5hbWVzIG9mIHRoZSBkYXkgdG8gcm91bGV0dGVfdmVjdG9yIGFuZCBwb2tlcl92ZWN0b3Jcbm5hbWVzKHBva2VyX3ZlY3RvcikgPC0gZGF5c192ZWN0b3Jcbm5hbWVzKHJvdWxldHRlX3ZlY3RvcikgPC0gZGF5c192ZWN0b3IifQ==
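
Storing the day names once and reusing them might be sketched as follows. The identical() check at the end is an added illustration that both vectors now carry the same labels.

```r
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)

# Define the labels once, then reuse them for both vectors
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector

identical(names(poker_vector), names(roulette_vector))   # TRUE
```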

A word of advice: try to avoid code duplication at all times. Continue to the next exercise and learn how to do arithmetic with vectors!

Calculating total winnings

Now that you have the poker and roulette winnings nicely as named vectors, you can start doing some data analytical magic.

You want to find out the following type of information:

  • How much has been your overall profit or loss per day of the week?
  • Have you lost money over the week in total?
  • Are you winning/losing money on poker or on roulette?

To get the answers, you have to do arithmetic calculations on vectors.

It is important to know that if you sum two vectors in R, it takes the element-wise sum. For example, the following three statements are completely equivalent:

    c(1, 2, 3) + c(4, 5, 6)
    c(1 + 4, 2 + 5, 3 + 6)
    c(5, 7, 9)

You can also do the calculations with variables that represent vectors:

    a <- c(1, 2, 3)
    b <- c(4, 5, 6)
    c <- a + b

INSTRUCTIONS

  • Take the sum of the variables A_vector and B_vector and assign it to total_vector.
  • Inspect the result by printing out total_vector.

HINT

Use the + operator to sum A_vector and B_vector. Use <- to assign the result to total_vector.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiJBX3ZlY3RvciA8LSBjKDEsIDIsIDMpXG5CX3ZlY3RvciA8LSBjKDQsIDUsIDYpXG5cbiMgVGFrZSB0aGUgc3VtIG9mIEFfdmVjdG9yIGFuZCBCX3ZlY3RvclxudG90YWxfdmVjdG9yIDwtIEFfdmVjdG9yICsgQl92ZWN0b3JcblxuIyBQcmludCBvdXQgdG90YWxfdmVjdG9yXG50b3RhbF92ZWN0b3IifQ==
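
The element-wise behaviour can be confirmed with a short sketch; identical() is used here only to check the result against the hand-computed vector.

```r
A_vector <- c(1, 2, 3)
B_vector <- c(4, 5, 6)

# Each element of A_vector is added to the element of B_vector
# in the same position
total_vector <- A_vector + B_vector
total_vector                         # 5 7 9
identical(total_vector, c(5, 7, 9))  # TRUE
```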

Calculating total winnings (2)

Now you understand how R does arithmetic with vectors, it is time to get those Ferraris in your garage! First, you need to understand what the overall profit or loss per day of the week was. The total daily profit is the sum of the profit/loss you realized on poker per day, and the profit/loss you realized on roulette per day.

In R, this is just the sum of roulette_vector and poker_vector.

INSTRUCTIONS

Assign to the variable total_daily how much you won or lost on each day in total (poker and roulette combined).

HINT

Similar to the previous exercise, assign the sum of two vectors to a new variable, total_daily.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFBva2VyIGFuZCByb3VsZXR0ZSB3aW5uaW5ncyBmcm9tIE1vbmRheSB0byBGcmlkYXk6XG5wb2tlcl92ZWN0b3IgPC0gYygxNDAsIC01MCwgMjAsIC0xMjAsIDI0MClcbnJvdWxldHRlX3ZlY3RvciA8LSBjKC0yNCwgLTUwLCAxMDAsIC0zNTAsIDEwKVxuZGF5c192ZWN0b3IgPC0gYyhcIk1vbmRheVwiLCBcIlR1ZXNkYXlcIiwgXCJXZWRuZXNkYXlcIiwgXCJUaHVyc2RheVwiLCBcIkZyaWRheVwiKVxubmFtZXMocG9rZXJfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxubmFtZXMocm91bGV0dGVfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxuXG4jIEFzc2lnbiB0byB0b3RhbF9kYWlseSBob3cgbXVjaCB5b3Ugd29uL2xvc3Qgb24gZWFjaCBkYXlcbnRvdGFsX2RhaWx5IDwtIHBva2VyX3ZlY3RvciArIHJvdWxldHRlX3ZlY3RvciJ9
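
With the named vectors from the earlier exercises, the daily totals could be computed as in this sketch. The day names carry over to the result.

```r
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector

# Element-wise sum: one combined result per day
total_daily <- poker_vector + roulette_vector
total_daily
#    Monday   Tuesday Wednesday  Thursday    Friday
#       116      -100       120      -470       250
```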

Calculating total winnings (3)

Based on the previous analysis, it looks like you had a mix of good and bad days. This is not what your ego expected, and you wonder if there may be a very tiny chance you have lost money over the week in total?

A function that helps you to answer this question is sum(). It calculates the sum of all elements of a vector. For example, to calculate the total amount of money you have lost/won with poker you do:

    total_poker <- sum(poker_vector)

INSTRUCTIONS

  • Calculate the total amount of money that you have won/lost with roulette and assign it to the variable total_roulette.
  • Now that you have the totals for roulette and poker, you can easily calculate total_week (which is the sum of all gains and losses of the week).
  • Print out total_week.

HINT

Use the sum() function to get the total of the roulette_vector. total_week is then the sum of total_roulette and total_poker.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFBva2VyIGFuZCByb3VsZXR0ZSB3aW5uaW5ncyBmcm9tIE1vbmRheSB0byBGcmlkYXk6XG5wb2tlcl92ZWN0b3IgPC0gYygxNDAsIC01MCwgMjAsIC0xMjAsIDI0MClcbnJvdWxldHRlX3ZlY3RvciA8LSBjKC0yNCwgLTUwLCAxMDAsIC0zNTAsIDEwKVxuZGF5c192ZWN0b3IgPC0gYyhcIk1vbmRheVwiLCBcIlR1ZXNkYXlcIiwgXCJXZWRuZXNkYXlcIiwgXCJUaHVyc2RheVwiLCBcIkZyaWRheVwiKVxubmFtZXMocG9rZXJfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxubmFtZXMocm91bGV0dGVfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxuXG4jIFRvdGFsIHdpbm5pbmdzIHdpdGggcG9rZXJcbnRvdGFsX3Bva2VyIDwtIHN1bShwb2tlcl92ZWN0b3IpXG5cbiMgVG90YWwgd2lubmluZ3Mgd2l0aCByb3VsZXR0ZVxudG90YWxfcm91bGV0dGUgPC0gIHN1bShyb3VsZXR0ZV92ZWN0b3IpXG5cbiMgVG90YWwgd2lubmluZ3Mgb3ZlcmFsbFxudG90YWxfd2VlayA8LSB0b3RhbF9yb3VsZXR0ZSArIHRvdGFsX3Bva2VyXG5cbiMgUHJpbnQgb3V0IHRvdGFsX3dlZWtcbnRvdGFsX3dlZWsifQ==
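
A sketch of the weekly totals, using the figures listed earlier in the chapter. The final comparison is included because it is the question the next exercise asks.

```r
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)

total_poker <- sum(poker_vector)        # 230
total_roulette <- sum(roulette_vector)  # -314
total_week <- total_poker + total_roulette
total_week                              # -84

# Comparing two single values returns one logical value
total_poker > total_roulette            # TRUE
```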

Comparing total winnings

Oops, it seems like you are losing money. Time to rethink and adapt your strategy! This will require some deeper analysis…

After a short brainstorm in your hotel’s jacuzzi, you realize that a possible explanation might be that your skills in roulette are not as well developed as your skills in poker. So maybe your total gains in poker are higher (or > ) than in roulette.

INSTRUCTIONS

  • Calculate total_poker and total_roulette as in the previous exercise. Use the sum() function twice.
  • Check if your total gains in poker are higher than for roulette by using a comparison. Simply print out the result of this comparison. What do you conclude, should you focus on roulette or on poker?

HINT

You partly calculated the answer to this question in the previous exercise already! To check if 6 is larger than 5, you type 6 > 5. This returns a logical value (TRUE or FALSE).

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFBva2VyIGFuZCByb3VsZXR0ZSB3aW5uaW5ncyBmcm9tIE1vbmRheSB0byBGcmlkYXk6XG5wb2tlcl92ZWN0b3IgPC0gYygxNDAsIC01MCwgMjAsIC0xMjAsIDI0MClcbnJvdWxldHRlX3ZlY3RvciA8LSBjKC0yNCwgLTUwLCAxMDAsIC0zNTAsIDEwKVxuZGF5c192ZWN0b3IgPC0gYyhcIk1vbmRheVwiLCBcIlR1ZXNkYXlcIiwgXCJXZWRuZXNkYXlcIiwgXCJUaHVyc2RheVwiLCBcIkZyaWRheVwiKVxubmFtZXMocG9rZXJfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxubmFtZXMocm91bGV0dGVfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxuXG4jIENhbGN1bGF0ZSB0b3RhbCBnYWlucyBmb3IgcG9rZXIgYW5kIHJvdWxldHRlXG50b3RhbF9wb2tlciA8LSBzdW0ocG9rZXJfdmVjdG9yKVxudG90YWxfcm91bGV0dGUgPC0gc3VtKHJvdWxldHRlX3ZlY3RvcilcblxuIyBDaGVjayBpZiB5b3UgcmVhbGl6ZWQgaGlnaGVyIHRvdGFsIGdhaW5zIGluIHBva2VyIHRoYW4gaW4gcm91bGV0dGVcbnRvdGFsX3Bva2VyID4gdG90YWxfcm91bGV0dGUifQ==

Vector selection: the good times

Your hunch seemed to be right. It appears that the poker game is more your cup of tea than roulette.

Another possible route for investigation is your performance at the beginning of the working week compared to the end of it. You did have a couple of Margarita cocktails at the end of the week…

To answer that question, you only want to focus on a selection of the total_vector. In other words, our goal is to select specific elements of the vector. To select elements of a vector (and later matrices, data frames, …), you can use square brackets. Between the square brackets, you indicate what elements to select. For example, to select the first element of the vector, you type poker_vector[1]. To select the second element of the vector, you type poker_vector[2], etc. Notice that the first element in a vector has index 1, not 0 as in many other programming languages.

INSTRUCTIONS

Assign the poker results of Wednesday to the variable poker_wednesday.

HINT

Wednesday is the third element of poker_vector, and can thus be selected with poker_vector[3].

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFBva2VyIGFuZCByb3VsZXR0ZSB3aW5uaW5ncyBmcm9tIE1vbmRheSB0byBGcmlkYXk6XG5wb2tlcl92ZWN0b3IgPC0gYygxNDAsIC01MCwgMjAsIC0xMjAsIDI0MClcbnJvdWxldHRlX3ZlY3RvciA8LSBjKC0yNCwgLTUwLCAxMDAsIC0zNTAsIDEwKVxuZGF5c192ZWN0b3IgPC0gYyhcIk1vbmRheVwiLCBcIlR1ZXNkYXlcIiwgXCJXZWRuZXNkYXlcIiwgXCJUaHVyc2RheVwiLCBcIkZyaWRheVwiKVxubmFtZXMocG9rZXJfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxubmFtZXMocm91bGV0dGVfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxuXG4jIERlZmluZSBhIG5ldyB2YXJpYWJsZSBiYXNlZCBvbiBhIHNlbGVjdGlvblxucG9rZXJfd2VkbmVzZGF5IDwtIHBva2VyX3ZlY3RvclszXSJ9
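
Selecting a single element by position might look like the sketch below. Because the vector is named, the selected element keeps its day label.

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

poker_vector[1]                     # first element: indexing starts at 1
poker_wednesday <- poker_vector[3]  # third element
poker_wednesday
# Wednesday
#        20
```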

Vector selection: the good times (2)

How about analyzing your midweek results?

To select multiple elements from a vector, you can add square brackets at the end of it and indicate between the brackets which elements should be selected. Suppose you want to select the first and the fifth day of the week: use the vector c(1, 5) between the square brackets. For example, the code below selects the first and fifth element of poker_vector:

    poker_vector[c(1, 5)]

INSTRUCTIONS

Assign the poker results of Tuesday, Wednesday and Thursday to the variable poker_midweek.

HINT

Use the vector c(2, 3, 4) between square brackets to select the correct elements of poker_vector.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFBva2VyIGFuZCByb3VsZXR0ZSB3aW5uaW5ncyBmcm9tIE1vbmRheSB0byBGcmlkYXk6XG5wb2tlcl92ZWN0b3IgPC0gYygxNDAsIC01MCwgMjAsIC0xMjAsIDI0MClcbnJvdWxldHRlX3ZlY3RvciA8LSBjKC0yNCwgLTUwLCAxMDAsIC0zNTAsIDEwKVxuZGF5c192ZWN0b3IgPC0gYyhcIk1vbmRheVwiLCBcIlR1ZXNkYXlcIiwgXCJXZWRuZXNkYXlcIiwgXCJUaHVyc2RheVwiLCBcIkZyaWRheVwiKVxubmFtZXMocG9rZXJfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxubmFtZXMocm91bGV0dGVfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxuXG4jIERlZmluZSBhIG5ldyB2YXJpYWJsZSBiYXNlZCBvbiBhIHNlbGVjdGlvblxucG9rZXJfbWlkd2VlayA8LSBwb2tlcl92ZWN0b3JbYygyLCAzLCA0KV0ifQ==

Vector selection: the good times (3)

Selecting multiple elements of poker_vector with c(2, 3, 4) is not very convenient. Many statisticians are lazy people by nature, so they created an easier way to do this: c(2, 3, 4) can be abbreviated to 2:4, which generates a vector with all natural numbers from 2 up to 4.

So, another way to find the mid-week results is poker_vector[2:4]. Notice how the vector 2:4 is placed between the square brackets to select element 2 up to 4.

INSTRUCTIONS

Assign to roulette_selection_vector the roulette results from Tuesday up to Friday; make use of : if it makes things easier for you.

HINT

Assign a selection of roulette_vector to roulette_selection_vector by placing 2:5 between square brackets.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFBva2VyIGFuZCByb3VsZXR0ZSB3aW5uaW5ncyBmcm9tIE1vbmRheSB0byBGcmlkYXk6XG5wb2tlcl92ZWN0b3IgPC0gYygxNDAsIC01MCwgMjAsIC0xMjAsIDI0MClcbnJvdWxldHRlX3ZlY3RvciA8LSBjKC0yNCwgLTUwLCAxMDAsIC0zNTAsIDEwKVxuZGF5c192ZWN0b3IgPC0gYyhcIk1vbmRheVwiLCBcIlR1ZXNkYXlcIiwgXCJXZWRuZXNkYXlcIiwgXCJUaHVyc2RheVwiLCBcIkZyaWRheVwiKVxubmFtZXMocG9rZXJfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxubmFtZXMocm91bGV0dGVfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxuXG4jIERlZmluZSBhIG5ldyB2YXJpYWJsZSBiYXNlZCBvbiBhIHNlbGVjdGlvblxucm91bGV0dGVfc2VsZWN0aW9uX3ZlY3RvciA8LSByb3VsZXR0ZV92ZWN0b3JbMjo1XSJ9
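
The two ways of selecting several positions can be compared in a short sketch. The vectors are left unnamed here to keep the output compact.

```r
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)

# Explicit positions and the colon shortcut select the same elements
poker_midweek <- poker_vector[c(2, 3, 4)]
identical(poker_midweek, poker_vector[2:4])   # TRUE

# Tuesday up to Friday
roulette_selection_vector <- roulette_vector[2:5]
roulette_selection_vector                     # -50 100 -350 10
```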

Vector selection: the good times (4)

Another way to tackle the previous exercise is by using the names of the vector elements (Monday, Tuesday, …) instead of their numeric positions. For example,

    poker_vector["Monday"]

will select the first element of poker_vector since "Monday" is the name of that first element.

Just like you did in the previous exercise with numerics, you can also use the element names to select multiple elements, for example:

    poker_vector[c("Monday", "Tuesday")]

INSTRUCTIONS

  • Select the first three elements in poker_vector by using their names: "Monday", "Tuesday" and "Wednesday". Assign the result of the selection to poker_start.
  • Calculate the average of the values in poker_start with the mean() function. Simply print out the result so you can inspect it.

HINT

  • You can use c("Monday", "Tuesday", "Wednesday") inside square brackets to subset poker_vector appropriately.
  • You can use mean(poker_start) to get the mean of the elements in poker_start. You do not need the mean of all poker elements, but only of the first three days.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFBva2VyIGFuZCByb3VsZXR0ZSB3aW5uaW5ncyBmcm9tIE1vbmRheSB0byBGcmlkYXk6XG5wb2tlcl92ZWN0b3IgPC0gYygxNDAsIC01MCwgMjAsIC0xMjAsIDI0MClcbnJvdWxldHRlX3ZlY3RvciA8LSBjKC0yNCwgLTUwLCAxMDAsIC0zNTAsIDEwKVxuZGF5c192ZWN0b3IgPC0gYyhcIk1vbmRheVwiLCBcIlR1ZXNkYXlcIiwgXCJXZWRuZXNkYXlcIiwgXCJUaHVyc2RheVwiLCBcIkZyaWRheVwiKVxubmFtZXMocG9rZXJfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxubmFtZXMocm91bGV0dGVfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxuXG4jIFNlbGVjdCBwb2tlciByZXN1bHRzIGZvciBNb25kYXksIFR1ZXNkYXkgYW5kIFdlZG5lc2RheVxucG9rZXJfc3RhcnQgPC0gcG9rZXJfdmVjdG9yW2MoXCJNb25kYXlcIiwgXCJUdWVzZGF5XCIsIFwiV2VkbmVzZGF5XCIpXVxuICBcbiMgQ2FsY3VsYXRlIHRoZSBhdmVyYWdlIG9mIHRoZSBlbGVtZW50cyBpbiBwb2tlcl9zdGFydFxubWVhbihwb2tlcl9zdGFydCkifQ==
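
Selection by name only works once the vector has names, so this sketch sets them first and then averages the first three days.

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Select by element name instead of by position
poker_start <- poker_vector[c("Monday", "Tuesday", "Wednesday")]
poker_start

# Average of 140, -50 and 20
mean(poker_start)   # 36.66667
```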

Selection by comparison - Step 1

By making use of comparison operators, we can approach the previous question in a more proactive way.

The (logical) comparison operators known to R are:

  • < for less than
  • > for greater than
  • <= for less than or equal to
  • >= for greater than or equal to
  • == for equal to each other
  • != for not equal to each other

As seen in the previous chapter, stating 6 > 5 returns TRUE. The nice thing about R is that you can use these comparison operators also on vectors. For example:

    c(4, 5, 6) > 5
    [1] FALSE FALSE TRUE

This command tests for every element of the vector if the condition stated by the comparison operator is TRUE or FALSE.

INSTRUCTIONS

  • Check which elements in poker_vector are positive (i.e. > 0) and assign this to selection_vector.
  • Print out selection_vector so you can inspect it. The printout tells you whether you won (TRUE) or lost (FALSE) any money for each day.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFBva2VyIGFuZCByb3VsZXR0ZSB3aW5uaW5ncyBmcm9tIE1vbmRheSB0byBGcmlkYXk6XG5wb2tlcl92ZWN0b3IgPC0gYygxNDAsIC01MCwgMjAsIC0xMjAsIDI0MClcbnJvdWxldHRlX3ZlY3RvciA8LSBjKC0yNCwgLTUwLCAxMDAsIC0zNTAsIDEwKVxuZGF5c192ZWN0b3IgPC0gYyhcIk1vbmRheVwiLCBcIlR1ZXNkYXlcIiwgXCJXZWRuZXNkYXlcIiwgXCJUaHVyc2RheVwiLCBcIkZyaWRheVwiKVxubmFtZXMocG9rZXJfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxubmFtZXMocm91bGV0dGVfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxuXG4jIFdoaWNoIGRheXMgZGlkIHlvdSBtYWtlIG1vbmV5IG9uIHBva2VyP1xuc2VsZWN0aW9uX3ZlY3RvciA8LSBwb2tlcl92ZWN0b3IgPiAwXG4gIFxuIyBQcmludCBvdXQgc2VsZWN0aW9uX3ZlY3Rvclxuc2VsZWN0aW9uX3ZlY3RvciJ9
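
Applying a comparison to the whole poker vector could be sketched like this. Counting the TRUE values with sum() is an added illustration: R treats TRUE as 1 and FALSE as 0 in arithmetic.

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# One logical value per day: TRUE where the result was positive
selection_vector <- poker_vector > 0
selection_vector   # TRUE FALSE TRUE FALSE TRUE, labeled by day

sum(selection_vector)   # number of winning days: 3
```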

Selection by comparison - Step 2

Working with comparisons will make your data analytical life easier. Instead of selecting a subset of days to investigate yourself (like before), you can simply ask R to return only those days where you realized a positive return for poker.

In the previous exercises you used selection_vector <- poker_vector > 0 to find the days on which you had a positive poker return. Now, you would like to know not only the days on which you won, but also how much you won on those days.

You can select the desired elements, by putting selection_vector between the square brackets that follow poker_vector:

    poker_vector[selection_vector]

R knows what to do when you pass a logical vector in square brackets: it will only select the elements that correspond to TRUE in selection_vector.

INSTRUCTIONS

Use selection_vector in square brackets to assign the amounts that you won on the profitable days to the variable poker_winning_days.

HINT

Use poker_vector[selection_vector] to select the desired elements from poker_vector, and assign the result to poker_winning_days.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFBva2VyIGFuZCByb3VsZXR0ZSB3aW5uaW5ncyBmcm9tIE1vbmRheSB0byBGcmlkYXk6XG5wb2tlcl92ZWN0b3IgPC0gYygxNDAsIC01MCwgMjAsIC0xMjAsIDI0MClcbnJvdWxldHRlX3ZlY3RvciA8LSBjKC0yNCwgLTUwLCAxMDAsIC0zNTAsIDEwKVxuZGF5c192ZWN0b3IgPC0gYyhcIk1vbmRheVwiLCBcIlR1ZXNkYXlcIiwgXCJXZWRuZXNkYXlcIiwgXCJUaHVyc2RheVwiLCBcIkZyaWRheVwiKVxubmFtZXMocG9rZXJfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxubmFtZXMocm91bGV0dGVfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxuXG4jIFdoaWNoIGRheXMgZGlkIHlvdSBtYWtlIG1vbmV5IG9uIHBva2VyP1xuc2VsZWN0aW9uX3ZlY3RvciA8LSBwb2tlcl92ZWN0b3IgPiAwXG5cbiMgU2VsZWN0IGZyb20gcG9rZXJfdmVjdG9yIHRoZXNlIGRheXNcbnBva2VyX3dpbm5pbmdfZGF5cyA8LSBwb2tlcl92ZWN0b3Jbc2VsZWN0aW9uX3ZlY3Rvcl0ifQ==
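
Selection with a logical vector can be sketched for both games at once. The roulette half anticipates the next exercise and follows the same pattern.

```r
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector

# Keep only the elements where the logical vector is TRUE
selection_vector <- poker_vector > 0
poker_winning_days <- poker_vector[selection_vector]
poker_winning_days      # Monday 140, Wednesday 20, Friday 240

# The comparison can also be written directly inside the brackets
roulette_winning_days <- roulette_vector[roulette_vector > 0]
roulette_winning_days   # Wednesday 100, Friday 10
```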

Advanced selection

Just like you did for poker, you also want to know those days where you realized a positive return for roulette.

INSTRUCTIONS

  • Create the variable selection_vector, this time to see if you made profit with roulette for different days.
  • Assign the amounts that you made on the days that you ended positively for roulette to the variable roulette_winning_days. This vector thus contains the positive winnings of roulette_vector.

HINT

Once you’ve correctly calculated selection_vector, you can again use roulette_vector[selection_vector] to select the positive results from roulette_vector.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFBva2VyIGFuZCByb3VsZXR0ZSB3aW5uaW5ncyBmcm9tIE1vbmRheSB0byBGcmlkYXk6XG5wb2tlcl92ZWN0b3IgPC0gYygxNDAsIC01MCwgMjAsIC0xMjAsIDI0MClcbnJvdWxldHRlX3ZlY3RvciA8LSBjKC0yNCwgLTUwLCAxMDAsIC0zNTAsIDEwKVxuZGF5c192ZWN0b3IgPC0gYyhcIk1vbmRheVwiLCBcIlR1ZXNkYXlcIiwgXCJXZWRuZXNkYXlcIiwgXCJUaHVyc2RheVwiLCBcIkZyaWRheVwiKVxubmFtZXMocG9rZXJfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxubmFtZXMocm91bGV0dGVfdmVjdG9yKSA8LSBkYXlzX3ZlY3RvclxuXG4jIFdoaWNoIGRheXMgZGlkIHlvdSBtYWtlIG1vbmV5IG9uIHJvdWxldHRlP1xuc2VsZWN0aW9uX3ZlY3RvciA8LSByb3VsZXR0ZV92ZWN0b3IgPiAwXG5cbiMgU2VsZWN0IGZyb20gcm91bGV0dGVfdmVjdG9yIHRoZXNlIGRheXNcbnJvdWxldHRlX3dpbm5pbmdfZGF5cyA8LSByb3VsZXR0ZV92ZWN0b3Jbc2VsZWN0aW9uX3ZlY3Rvcl0ifQ==

Matrices

In this chapter you will learn how to work with matrices in R. By the end of the chapter, you will be able to create matrices and to understand how you can do basic computations with them. You will analyze the box office numbers of Star Wars to illustrate the use of matrices in R. May the force be with you!

What’s a matrix?

In R, a matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns. Since you are only working with rows and columns, a matrix is called two-dimensional.

You can construct a matrix in R with the matrix() function. Consider the following example:

    matrix(1:9, byrow = TRUE, nrow = 3)

In the matrix() function:

  • The first argument is the collection of elements that R will arrange into the rows and columns of the matrix. Here, we use 1:9 which is a shortcut for c(1, 2, 3, 4, 5, 6, 7, 8, 9).
  • The argument byrow indicates that the matrix is filled by the rows. If we want the matrix to be filled by the columns, we just place byrow = FALSE.
  • The third argument nrow indicates that the matrix should have three rows.

INSTRUCTIONS

Construct a matrix with 3 rows containing the numbers 1 up to 9, filled row-wise.

HINT

Read the assignment carefully, the answer is already given!
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIENvbnN0cnVjdCBhIG1hdHJpeCB3aXRoIDMgcm93cyB0aGF0IGNvbnRhaW4gdGhlIG51bWJlcnMgMSB1cCB0byA5XG5tYXRyaXgoMTo5LCBieXJvdyA9IFRSVUUsIG5yb3cgPSAzKSJ9
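
The effect of byrow is easiest to see by building the same matrix both ways. In this sketch, m_by_row and m_by_col are illustrative names, not part of the exercise.

```r
# Filled row by row
m_by_row <- matrix(1:9, byrow = TRUE, nrow = 3)
m_by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9

# Filled column by column: the first row becomes 1 4 7
m_by_col <- matrix(1:9, byrow = FALSE, nrow = 3)
m_by_col[1, ]   # 1 4 7

dim(m_by_row)   # 3 3 (rows, columns)
```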

Analyze matrices, you shall

It is now time to get your hands dirty. In the following exercises you will analyze the box office numbers of the Star Wars franchise. May the force be with you!

In the editor, three vectors are defined. Each one represents the box office numbers from the first three Star Wars movies. The first element of each vector indicates the US box office revenue, the second element refers to the Non-US box office (source: Wikipedia).

In this exercise, you’ll combine all these figures into a single vector. Next, you’ll build a matrix from this vector.

INSTRUCTIONS 100 XP Use c(new_hope, empire_strikes, return_jedi) to combine the three vectors into one vector. Call this vector box_office. Construct a matrix with 3 rows, where each row represents a movie. Use the matrix() function to do this. The first argument is the vector box_office, containing all box office figures. Next, you’ll have to specify nrow = 3 and byrow = TRUE. Name the resulting matrix star_wars_matrix.

HINT box_office <- c(new_hope, empire_strikes, return_jedi) will combine all numbers in the different vectors into a single vector with 6 elements. matrix(box_office, nrow = …, by_row …) is a template for the solution to the second instruction.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIEJveCBvZmZpY2UgU3RhciBXYXJzIChpbiBtaWxsaW9ucyEpXG5uZXdfaG9wZSA8LSBjKDQ2MC45OTgsIDMxNC40KVxuZW1waXJlX3N0cmlrZXMgPC0gYygyOTAuNDc1LCAyNDcuOTAwKVxucmV0dXJuX2plZGkgPC0gYygzMDkuMzA2LCAxNjUuOClcblxuIyBDcmVhdGUgYm94X29mZmljZVxuYm94X29mZmljZSA8LSBjKG5ld19ob3BlLCBlbXBpcmVfc3RyaWtlcywgcmV0dXJuX2plZGkpXG5cbiMgQ29uc3RydWN0IHN0YXJfd2Fyc19tYXRyaXhcbnN0YXJfd2Fyc19tYXRyaXggPC0gbWF0cml4KGJveF9vZmZpY2UsIG5yb3cgPSAzLCBieXJvdyA9IFRSVUUpICJ9

Naming a matrix

To help you remember what is stored in star_wars_matrix, you would like to add the names of the movies for the rows. Not only does this help you to read the data, but it is also useful to select certain elements from the matrix.

Similar to vectors, you can add names for the rows and the columns of a matrix:

rownames(my_matrix) <- row_names_vector
colnames(my_matrix) <- col_names_vector

We went ahead and prepared two vectors for you: region, and titles. You will need these vectors to name the columns and rows of star_wars_matrix, respectively.

INSTRUCTIONS

  • Use colnames() to name the columns of star_wars_matrix with the region vector.
  • Use rownames() to name the rows of star_wars_matrix with the titles vector.
  • Print out star_wars_matrix to see the result of your work.

HINT

You can use colnames(star_wars_matrix) <- region to name the columns of star_wars_matrix. Do a similar thing to name the rows.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIEJveCBvZmZpY2UgU3RhciBXYXJzIChpbiBtaWxsaW9ucyEpXG5uZXdfaG9wZSA8LSBjKDQ2MC45OTgsIDMxNC40KVxuZW1waXJlX3N0cmlrZXMgPC0gYygyOTAuNDc1LCAyNDcuOTAwKVxucmV0dXJuX2plZGkgPC0gYygzMDkuMzA2LCAxNjUuOClcblxuIyBDb25zdHJ1Y3QgbWF0cml4XG5zdGFyX3dhcnNfbWF0cml4IDwtIG1hdHJpeChjKG5ld19ob3BlLCBlbXBpcmVfc3RyaWtlcywgcmV0dXJuX2plZGkpLCBucm93ID0gMywgYnlyb3cgPSBUUlVFKVxuXG4jIFZlY3RvcnMgcmVnaW9uIGFuZCB0aXRsZXMsIHVzZWQgZm9yIG5hbWluZ1xucmVnaW9uIDwtIGMoXCJVU1wiLCBcIm5vbi1VU1wiKVxudGl0bGVzIDwtIGMoXCJBIE5ldyBIb3BlXCIsIFwiVGhlIEVtcGlyZSBTdHJpa2VzIEJhY2tcIiwgXCJSZXR1cm4gb2YgdGhlIEplZGlcIilcblxuIyBOYW1lIHRoZSBjb2x1bW5zIHdpdGggcmVnaW9uXG5jb2xuYW1lcyhzdGFyX3dhcnNfbWF0cml4KSA8LSByZWdpb25cblxuIyBOYW1lIHRoZSByb3dzIHdpdGggdGl0bGVzXG5yb3duYW1lcyhzdGFyX3dhcnNfbWF0cml4KSA8LSB0aXRsZXNcblxuIyBQcmludCBvdXQgc3Rhcl93YXJzX21hdHJpeFxuc3Rhcl93YXJzX21hdHJpeCJ9
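As a small sketch of why names are useful for selection, the example below names a tiny matrix and then picks an element by name. The sales matrix and its store and quarter labels are invented for this illustration.

```r
# Hypothetical 2 x 2 matrix of sales for two stores over two quarters.
sales <- matrix(c(10, 20, 30, 40), nrow = 2, byrow = TRUE)
rownames(sales) <- c("store_a", "store_b")
colnames(sales) <- c("q1", "q2")

sales
#         q1 q2
# store_a 10 20
# store_b 30 40

# Names can be used for selection instead of positions.
sales["store_b", "q1"]  # 30
```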

Calculating the worldwide box office

The single most important thing for a movie in order to become an instant legend in Tinseltown is its worldwide box office figures.

To calculate the total box office revenue for each of the three Star Wars movies, you have to take the sum of the US revenue column and the non-US revenue column.

In R, the function rowSums() conveniently calculates the totals for each row of a matrix. This function creates a new vector:

rowSums(my_matrix)

INSTRUCTIONS

Calculate the worldwide box office figures for the three movies and put these in the vector named worldwide_vector.

HINT

rowSums(star_wars_matrix) will calculate the sum of every row, so the total box office for each movie.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIENvbnN0cnVjdCBzdGFyX3dhcnNfbWF0cml4XG5ib3hfb2ZmaWNlIDwtIGMoNDYwLjk5OCwgMzE0LjQsIDI5MC40NzUsIDI0Ny45MDAsIDMwOS4zMDYsIDE2NS44KVxuc3Rhcl93YXJzX21hdHJpeCA8LSBtYXRyaXgoYm94X29mZmljZSwgbnJvdyA9IDMsIGJ5cm93ID0gVFJVRSxcbiAgICAgICAgICAgICAgICAgICAgICAgICAgIGRpbW5hbWVzID0gbGlzdChjKFwiQSBOZXcgSG9wZVwiLCBcIlRoZSBFbXBpcmUgU3RyaWtlcyBCYWNrXCIsIFwiUmV0dXJuIG9mIHRoZSBKZWRpXCIpLCBcbiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBjKFwiVVNcIiwgXCJub24tVVNcIikpKVxuXG4jIENhbGN1bGF0ZSB3b3JsZHdpZGUgYm94IG9mZmljZSBmaWd1cmVzXG53b3JsZHdpZGVfdmVjdG9yIDwtIHJvd1N1bXMoc3Rhcl93YXJzX21hdHJpeCkifQ==
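A minimal sketch of rowSums() on a small named matrix follows. The sales data is hypothetical. Note that the result keeps the row names.

```r
# Hypothetical sales matrix with named rows and columns.
sales <- matrix(c(10, 20, 30, 40), nrow = 2, byrow = TRUE,
                dimnames = list(c("store_a", "store_b"), c("q1", "q2")))

# One total per row; the row names carry over to the result.
row_totals <- rowSums(sales)
row_totals
# store_a store_b
#      30      70
```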

Adding a column for the Worldwide box office

In the previous exercise you calculated the vector that contained the worldwide box office receipts for each of the three Star Wars movies. However, this vector is not yet part of star_wars_matrix.

You can add a column or multiple columns to a matrix with the cbind() function, which merges matrices and/or vectors together by column. For example:

big_matrix <- cbind(matrix1, matrix2, vector1 …)

INSTRUCTIONS

Add worldwide_vector as a new column to the star_wars_matrix and assign the result to all_wars_matrix. Use the cbind() function.

HINT

In this exercise, you should pass two variables to cbind(): star_wars_matrix and worldwide_vector, in this order. Assign the result to all_wars_matrix.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIENvbnN0cnVjdCBzdGFyX3dhcnNfbWF0cml4XG5ib3hfb2ZmaWNlIDwtIGMoNDYwLjk5OCwgMzE0LjQsIDI5MC40NzUsIDI0Ny45MDAsIDMwOS4zMDYsIDE2NS44KVxuc3Rhcl93YXJzX21hdHJpeCA8LSBtYXRyaXgoYm94X29mZmljZSwgbnJvdyA9IDMsIGJ5cm93ID0gVFJVRSxcbiAgICAgICAgICAgICAgICAgICAgICAgICAgIGRpbW5hbWVzID0gbGlzdChjKFwiQSBOZXcgSG9wZVwiLCBcIlRoZSBFbXBpcmUgU3RyaWtlcyBCYWNrXCIsIFwiUmV0dXJuIG9mIHRoZSBKZWRpXCIpLCBcbiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBjKFwiVVNcIiwgXCJub24tVVNcIikpKVxuXG4jIFRoZSB3b3JsZHdpZGUgYm94IG9mZmljZSBmaWd1cmVzXG53b3JsZHdpZGVfdmVjdG9yIDwtIHJvd1N1bXMoc3Rhcl93YXJzX21hdHJpeClcblxuIyBCaW5kIHRoZSBuZXcgdmFyaWFibGUgd29ybGR3aWRlX3ZlY3RvciBhcyBhIGNvbHVtbiB0byBzdGFyX3dhcnNfbWF0cml4XG5hbGxfd2Fyc19tYXRyaXggPC0gY2JpbmQoc3Rhcl93YXJzX21hdHJpeCwgd29ybGR3aWRlX3ZlY3RvcikifQ==
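The sketch below shows cbind() attaching a vector of row totals as a new column. The sales matrix and the name total are assumptions made for this example. When a vector is passed as a plain variable, cbind() uses the variable name as the column name.

```r
# Hypothetical sales matrix with named rows and columns.
sales <- matrix(c(10, 20, 30, 40), nrow = 2, byrow = TRUE,
                dimnames = list(c("store_a", "store_b"), c("q1", "q2")))

total <- rowSums(sales)

# Bind the totals as a third column; the column is labelled "total".
sales_with_total <- cbind(sales, total)
sales_with_total
#         q1 q2 total
# store_a 10 20    30
# store_b 30 40    70
```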

Adding a row

Just like every action has a reaction, every cbind() has an rbind(). (We admit, we are pretty bad with metaphors.)

Your R workspace, where all variables you defined ‘live’, has already been initialized and contains two matrices:

  • star_wars_matrix that we have used all along, with data on the original trilogy,
  • star_wars_matrix2, with similar data for the prequels trilogy.

Type the name of these matrices in the console and hit Enter if you want to have a closer look. If you want to check out the contents of the workspace, you can type ls() in the console.

INSTRUCTIONS

Use rbind() to paste together star_wars_matrix and star_wars_matrix2, in this order. Assign the resulting matrix to all_wars_matrix.

HINT

Bind the two matrices together like this:

rbind(matrix1, matrix2)

Assign the result to all_wars_matrix.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIHN0YXJfd2Fyc19tYXRyaXggYW5kIHN0YXJfd2Fyc19tYXRyaXgyIGFyZSBhdmFpbGFibGUgaW4geW91ciB3b3Jrc3BhY2VcbnN0YXJfd2Fyc19tYXRyaXggIFxuc3Rhcl93YXJzX21hdHJpeDIgXG5cbiMgQ29tYmluZSBib3RoIFN0YXIgV2FycyB0cmlsb2dpZXMgaW4gb25lIG1hdHJpeFxuYWxsX3dhcnNfbWF0cml4IDwtIHJiaW5kKHN0YXJfd2Fyc19tYXRyaXgsIHN0YXJfd2Fyc19tYXRyaXgyKSJ9
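Here is a small sketch of rbind() stacking two matrices that share the same columns. The matrices first_half and second_half are invented for this illustration.

```r
# Two hypothetical matrices with identical column names.
first_half  <- matrix(1:4, nrow = 2, byrow = TRUE,
                      dimnames = list(c("a", "b"), c("x", "y")))
second_half <- matrix(5:8, nrow = 2, byrow = TRUE,
                      dimnames = list(c("c", "d"), c("x", "y")))

# Stack the rows of the second matrix below the first.
combined <- rbind(first_half, second_half)
combined
#   x y
# a 1 2
# b 3 4
# c 5 6
# d 7 8
```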

The total box office revenue for the entire saga

Just like cbind() has rbind(), colSums() has rowSums(). Your R workspace already contains the all_wars_matrix that you constructed in the previous exercise; type all_wars_matrix to have another look. Let’s now calculate the total box office revenue for the entire saga.

INSTRUCTIONS

  • Calculate the total revenue for the US and the non-US region and assign the result to total_revenue_vector. You can use the colSums() function.
  • Print out total_revenue_vector to have a look at the results.

HINT

You should use the colSums() function with all_wars_matrix as the argument to find the total box office per region.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIGFsbF93YXJzX21hdHJpeCBpcyBhdmFpbGFibGUgaW4geW91ciB3b3Jrc3BhY2VcbmFsbF93YXJzX21hdHJpeFxuXG4jIFRvdGFsIHJldmVudWUgZm9yIFVTIGFuZCBub24tVVNcbnRvdGFsX3JldmVudWVfdmVjdG9yIDwtIGNvbFN1bXMoYWxsX3dhcnNfbWF0cml4KVxuXG4jIFByaW50IG91dCB0b3RhbF9yZXZlbnVlX3ZlY3RvclxudG90YWxfcmV2ZW51ZV92ZWN0b3IifQ==
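For comparison with rowSums(), a minimal sketch of colSums() on a hypothetical four-row matrix:

```r
# Hypothetical matrix with four rows and two named columns.
combined <- matrix(1:8, nrow = 4, byrow = TRUE,
                   dimnames = list(c("a", "b", "c", "d"), c("x", "y")))

# One total per column; the column names carry over to the result.
col_totals <- colSums(combined)
col_totals
#  x  y
# 16 20
```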

Selection of matrix elements

Similar to vectors, you can use the square brackets to select one or multiple elements from a matrix. Whereas vectors have one dimension, matrices have two dimensions. You should therefore use a comma to separate the rows you want to select from the columns. For example:

  • my_matrix[1,2] selects the element at the first row and second column.
  • my_matrix[1:3,2:4] results in a matrix with the data on the rows 1, 2, 3 and columns 2, 3, 4.

If you want to select all elements of a row or a column, no number is needed before or after the comma, respectively:

  • my_matrix[,1] selects all elements of the first column.
  • my_matrix[1,] selects all elements of the first row.

Back to Star Wars with this newly acquired knowledge! As in the previous exercise, all_wars_matrix is already available in your workspace.

INSTRUCTIONS

  • Select the non-US revenue for all movies (the entire second column of all_wars_matrix), store the result as non_us_all.
  • Use mean() on non_us_all to calculate the average non-US revenue for all movies. Simply print out the result.
  • This time, select the non-US revenue for the first two movies in all_wars_matrix. Store the result as non_us_some.
  • Use mean() again to print out the average of the values in non_us_some.

HINT

You can select the entire second column of a matrix my_matrix with my_matrix[,2].
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIGFsbF93YXJzX21hdHJpeCBpcyBhdmFpbGFibGUgaW4geW91ciB3b3Jrc3BhY2VcbmFsbF93YXJzX21hdHJpeFxuXG4jIFNlbGVjdCB0aGUgbm9uLVVTIHJldmVudWUgZm9yIGFsbCBtb3ZpZXNcbm5vbl91c19hbGwgPC0gYWxsX3dhcnNfbWF0cml4WywyXVxuICBcbiMgQXZlcmFnZSBub24tVVMgcmV2ZW51ZVxubWVhbihub25fdXNfYWxsKVxuICBcbiMgU2VsZWN0IHRoZSBub24tVVMgcmV2ZW51ZSBmb3IgZmlyc3QgdHdvIG1vdmllc1xubm9uX3VzX3NvbWUgPC0gYWxsX3dhcnNfbWF0cml4WzE6MiwyXVxuICBcbiMgQXZlcmFnZSBub24tVVMgcmV2ZW51ZSBmb3IgZmlyc3QgdHdvIG1vdmllc1xubWVhbihub25fdXNfc29tZSkifQ==
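The selection rules above can be tried out on a small matrix. This is only a sketch; the matrix m is made up for the example.

```r
# Hypothetical 3 x 4 matrix filled row-wise with 1..12.
m <- matrix(1:12, nrow = 3, byrow = TRUE)
m
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    5    6    7    8
# [3,]    9   10   11   12

m[1, 2]       # single element: 2
m[1:2, 2:3]   # 2 x 2 sub-matrix with rows (2, 3) and (6, 7)
m[, 1]        # entire first column: 1 5 9
m[3, ]        # entire third row: 9 10 11 12

# A selected column is a plain vector, so mean() works on it directly.
mean(m[, 2])  # 6
```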

A little arithmetic with matrices

Similar to what you have learned with vectors, the standard operators like +, -, /, *, etc. work in an element-wise way on matrices in R.

For example, 2 * my_matrix multiplies each element of my_matrix by two.

As a newly-hired data analyst for Lucasfilm, it is your job to find out how many visitors went to each movie for each geographical area. You already have the total revenue figures in all_wars_matrix. Assume that the price of a ticket was 5 dollars. Simply dividing the box office numbers by this ticket price gives you the number of visitors.

INSTRUCTIONS

  • Divide all_wars_matrix by 5, giving you the number of visitors in millions. Assign the resulting matrix to visitors.
  • Print out visitors so you can have a look.

HINT

The number of visitors is equal to all_wars_matrix divided by 5.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIGFsbF93YXJzX21hdHJpeCBpcyBhdmFpbGFibGUgaW4geW91ciB3b3Jrc3BhY2VcbmFsbF93YXJzX21hdHJpeFxuXG4jIEVzdGltYXRlIHRoZSB2aXNpdG9yc1xudmlzaXRvcnMgPC0gYWxsX3dhcnNfbWF0cml4IC8gNVxuXG4jIFByaW50IHRoZSBlc3RpbWF0ZSB0byB0aGUgY29uc29sZVxudmlzaXRvcnMifQ==

A little arithmetic with matrices (2)

Just like 2 * my_matrix multiplied every element of my_matrix by two, my_matrix1 * my_matrix2 creates a matrix where each element is the product of the corresponding elements in my_matrix1 and my_matrix2.

After looking at the result of the previous exercise, big boss Lucas points out that the ticket prices went up over time. He asks to redo the analysis based on the prices you can find in ticket_prices_matrix (source: imagination).

Those who are familiar with matrices should note that this is not the standard matrix multiplication for which you should use %*% in R.

INSTRUCTIONS

  • Divide all_wars_matrix by ticket_prices_matrix to get the estimated number of US and non-US visitors for the six movies. Assign the result to visitors.
  • From the visitors matrix, select the entire first column, representing the number of visitors in the US. Store this selection as us_visitors.
  • Calculate the average number of US visitors; print out the result.

HINT

  • You can use the function mean() to calculate the average of the inputs to the function.
  • To get the number of visitors in the US, select the first column from visitors using visitors[ ,1].
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIGFsbF93YXJzX21hdHJpeCBhbmQgdGlja2V0X3ByaWNlc19tYXRyaXggYXJlIGF2YWlsYWJsZSBpbiB5b3VyIHdvcmtzcGFjZVxuYWxsX3dhcnNfbWF0cml4XG50aWNrZXRfcHJpY2VzX21hdHJpeFxuXG4jIEVzdGltYXRlZCBudW1iZXIgb2YgdmlzaXRvcnNcbnZpc2l0b3JzIDwtIGFsbF93YXJzX21hdHJpeCAvIHRpY2tldF9wcmljZXNfbWF0cml4XG5cbiMgVVMgdmlzaXRvcnNcbnVzX3Zpc2l0b3JzIDwtIHZpc2l0b3JzWyAsMV1cblxuIyBBdmVyYWdlIG51bWJlciBvZiBVUyB2aXNpdG9yc1xubWVhbih1c192aXNpdG9ycykifQ==
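To illustrate the difference between element-wise multiplication and the matrix product mentioned above, here is a minimal sketch with two invented 2 x 2 matrices, a and b.

```r
# matrix() fills column-wise by default.
a <- matrix(1:4, nrow = 2)  # columns (1, 2) and (3, 4)
b <- matrix(5:8, nrow = 2)  # columns (5, 6) and (7, 8)

# Element-wise product: each entry is a[i, j] * b[i, j].
a * b
#      [,1] [,2]
# [1,]    5   21
# [2,]   12   32

# Standard matrix product: rows of a combined with columns of b.
a %*% b
#      [,1] [,2]
# [1,]   23   31
# [2,]   34   46

# Dividing by a single number is also element-wise.
a / 2
#      [,1] [,2]
# [1,]  0.5  1.5
# [2,]  1.0  2.0
```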

What’s a factor and why would you use it?

In this chapter you dive into the wonderful world of factors.

The term factor refers to a statistical data type used to store categorical variables. The difference between a categorical variable and a continuous variable is that a categorical variable can belong to a limited number of categories. A continuous variable, on the other hand, can correspond to an infinite number of values.

It is important that R knows whether it is dealing with a continuous or a categorical variable, as the statistical models you will develop in the future treat both types differently. (You will see later why this is the case.)

A good example of a categorical variable is sex. In many circumstances you can limit the sex categories to “Male” or “Female”. (Sometimes you may need different categories. For example, you may need to consider chromosomal variation, hermaphroditic animals, or different cultural norms, but you will always have a finite number of categories.)

INSTRUCTIONS

Assign to variable theory the value "factors for categorical variables".

HINT

Simply assign a variable (<-); make sure to capitalize correctly.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIEFzc2lnbiB0byB0aGUgdmFyaWFibGUgdGhlb3J5IHdoYXQgdGhpcyBjaGFwdGVyIGlzIGFib3V0IVxudGhlb3J5IDwtIFwiZmFjdG9ycyBmb3IgY2F0ZWdvcmljYWwgdmFyaWFibGVzXCIifQ==

What’s a factor and why would you use it? (2)

To create factors in R, you make use of the function factor(). The first thing that you have to do is create a vector that contains all the observations that belong to a limited number of categories. For example, sex_vector contains the sex of 5 different individuals:

sex_vector <- c("Male", "Female", "Female", "Male", "Male")

It is clear that there are two categories, or in R-terms ‘factor levels’, at work here: “Male” and “Female”.

The function factor() will encode the vector as a factor:

factor_sex_vector <- factor(sex_vector)

INSTRUCTIONS

  • Convert the character vector sex_vector to a factor with factor() and assign the result to factor_sex_vector.
  • Print out factor_sex_vector and check that R prints out the factor levels below the actual values.

HINT

Simply use the function factor() on sex_vector. Have a look at the assignment, the answer is already there somewhere…
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFNleCB2ZWN0b3JcbnNleF92ZWN0b3IgPC0gYyhcIk1hbGVcIiwgXCJGZW1hbGVcIiwgXCJGZW1hbGVcIiwgXCJNYWxlXCIsIFwiTWFsZVwiKVxuXG4jIENvbnZlcnQgc2V4X3ZlY3RvciB0byBhIGZhY3RvclxuZmFjdG9yX3NleF92ZWN0b3IgPC0gZmFjdG9yKHNleF92ZWN0b3IpXG5cbiMgUHJpbnQgb3V0IGZhY3Rvcl9zZXhfdmVjdG9yXG5mYWN0b3Jfc2V4X3ZlY3RvciJ9
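A short sketch of what factor() produces, using a made-up blood_type vector. Internally, a factor stores integer codes that point into its levels, which as.integer() makes visible.

```r
# Hypothetical character vector with three distinct categories.
blood_type <- c("A", "O", "B", "O", "A")
blood_factor <- factor(blood_type)

blood_factor
# [1] A O B O A
# Levels: A B O

# The distinct categories, sorted alphabetically by default.
levels(blood_factor)      # "A" "B" "O"

# The integer codes behind each observation.
as.integer(blood_factor)  # 1 3 2 3 1
```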

Factors

Categorical variables: Factors

Very often, data falls into a limited number of categories. For example, human hair color can be categorized as black/brown/blonde/red/grey/white (and perhaps a few more options for people who dye it). In R, categorical data is stored in factors. Given the importance of these factors in data analysis, you should start learning how to create, subset and compare them now!

What’s a factor and why would you use it?

What’s a factor and why would you use it? (2)

What’s a factor and why would you use it? (3)

Factor levels

Summarizing a factor

Battle of the sexes

Ordered factors

Ordered factors (2)

Comparing ordered factors

What’s a factor and why would you use it? (3)

There are two types of categorical variables: a nominal categorical variable and an ordinal categorical variable.

A nominal variable is a categorical variable without an implied order. This means that it is impossible to say that ‘one is worth more than the other’. For example, think of the categorical variable animals_vector with the categories “Elephant”, “Giraffe”, “Donkey” and “Horse”. Here, it is impossible to say that one stands above or below the other. (Note that some of you might disagree ;-) ).

In contrast, ordinal variables do have a natural ordering. Consider for example the categorical variable temperature_vector with the categories: “Low”, “Medium” and “High”. Here it is obvious that “Medium” stands above “Low”, and “High” stands above “Medium”.

INSTRUCTIONS

Click ‘Submit Answer’ to check how R constructs and prints nominal and ordinal variables. Do not worry if you do not understand all the code just yet, we will get to that.

HINT

Just click the ‘Submit Answer’ button and look at the console. Notice how R indicates the ordering of the factor levels for ordinal categorical variables.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIEFuaW1hbHNcbmFuaW1hbHNfdmVjdG9yIDwtIGMoXCJFbGVwaGFudFwiLCBcIkdpcmFmZmVcIiwgXCJEb25rZXlcIiwgXCJIb3JzZVwiKVxuZmFjdG9yX2FuaW1hbHNfdmVjdG9yIDwtIGZhY3RvcihhbmltYWxzX3ZlY3RvcilcbmZhY3Rvcl9hbmltYWxzX3ZlY3RvclxuXG4jIFRlbXBlcmF0dXJlXG50ZW1wZXJhdHVyZV92ZWN0b3IgPC0gYyhcIkhpZ2hcIiwgXCJMb3dcIiwgXCJIaWdoXCIsXCJMb3dcIiwgXCJNZWRpdW1cIilcbmZhY3Rvcl90ZW1wZXJhdHVyZV92ZWN0b3IgPC0gZmFjdG9yKHRlbXBlcmF0dXJlX3ZlY3Rvciwgb3JkZXIgPSBUUlVFLCBsZXZlbHMgPSBjKFwiTG93XCIsIFwiTWVkaXVtXCIsIFwiSGlnaFwiKSlcbmZhY3Rvcl90ZW1wZXJhdHVyZV92ZWN0b3IifQ==

Factor levels

When you first get a data set, you will often notice that it contains factors with specific factor levels. However, sometimes you will want to change the names of these levels for clarity or other reasons. R allows you to do this with the function levels():

levels(factor_vector) <- c("name1", "name2", …)

A good illustration is the raw data that is provided to you by a survey. A common question for every questionnaire is the sex of the respondent. Here, for simplicity, just two categories were recorded, “M” and “F”. (You usually need more categories for survey data; either way, you use a factor to store the categorical data.)

survey_vector <- c("M", "F", "F", "M", "M")

Recording the sex with the abbreviations “M” and “F” can be convenient if you are collecting data with pen and paper, but it can introduce confusion when analyzing the data. At that point, you will often want to change the factor levels to “Male” and “Female” instead of “M” and “F” for clarity.

Watch out: the order in which you assign the levels is important. If you type levels(factor_survey_vector), you’ll see that it outputs [1] "F" "M". If you don’t specify the levels of the factor when creating the vector, R will automatically assign them alphabetically. To correctly map “F” to “Female” and “M” to “Male”, the levels should be set to c("Female", "Male"), in this order.

INSTRUCTIONS

  • Check out the code that builds a factor vector from survey_vector. You should use factor_survey_vector in the next instruction.
  • Change the factor levels of factor_survey_vector to c("Female", "Male"). Mind the order of the vector elements here.

HINT

Mind the order in which you have to type in the factor levels. Hint: look at the order in which the levels are printed when typing levels(factor_survey_vector).

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIENvZGUgdG8gYnVpbGQgZmFjdG9yX3N1cnZleV92ZWN0b3JcbnN1cnZleV92ZWN0b3IgPC0gYyhcIk1cIiwgXCJGXCIsIFwiRlwiLCBcIk1cIiwgXCJNXCIpXG5mYWN0b3Jfc3VydmV5X3ZlY3RvciA8LSBmYWN0b3Ioc3VydmV5X3ZlY3RvcilcblxuIyBTcGVjaWZ5IHRoZSBsZXZlbHMgb2YgZmFjdG9yX3N1cnZleV92ZWN0b3JcbmxldmVscyhmYWN0b3Jfc3VydmV5X3ZlY3RvcikgPC0gYyhcIkZlbWFsZVwiLCBcIk1hbGVcIilcblxuZmFjdG9yX3N1cnZleV92ZWN0b3IifQ==
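The importance of the level order can be seen in a minimal sketch with a hypothetical yes/no vector. The new names are matched to the existing levels by position, so they must follow the alphabetical order that R assigned.

```r
# Hypothetical survey answers recorded as "y" and "n".
answers <- factor(c("y", "n", "n", "y"))

# R sorted the levels alphabetically: "n" comes before "y".
levels(answers)  # "n" "y"

# Rename by position: first level "n" becomes "No", second "y" becomes "Yes".
levels(answers) <- c("No", "Yes")
answers
# [1] Yes No  No  Yes
# Levels: No Yes
```

Writing c("Yes", "No") instead would silently swap the meaning of every observation.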

Summarizing a factor

After finishing this course, one of your favorite functions in R will be summary(). This will give you a quick overview of the contents of a variable:

summary(my_var)

Going back to our survey, you would like to know how many “Male” responses you have in your study, and how many “Female” responses. The summary() function gives you the answer to this question.

INSTRUCTIONS

Call summary() on both survey_vector and factor_survey_vector. Interpret the results of both vectors. Are they both equally useful in this case?

HINT

Call the summary() function on both survey_vector and factor_survey_vector, it’s as simple as that!
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIEJ1aWxkIGZhY3Rvcl9zdXJ2ZXlfdmVjdG9yIHdpdGggY2xlYW4gbGV2ZWxzXG5zdXJ2ZXlfdmVjdG9yIDwtIGMoXCJNXCIsIFwiRlwiLCBcIkZcIiwgXCJNXCIsIFwiTVwiKVxuZmFjdG9yX3N1cnZleV92ZWN0b3IgPC0gZmFjdG9yKHN1cnZleV92ZWN0b3IpXG5sZXZlbHMoZmFjdG9yX3N1cnZleV92ZWN0b3IpIDwtIGMoXCJGZW1hbGVcIiwgXCJNYWxlXCIpXG5mYWN0b3Jfc3VydmV5X3ZlY3RvclxuXG4jIEdlbmVyYXRlIHN1bW1hcnkgZm9yIHN1cnZleV92ZWN0b3JcbnN1bW1hcnkoc3VydmV5X3ZlY3RvcilcblxuIyBHZW5lcmF0ZSBzdW1tYXJ5IGZvciBmYWN0b3Jfc3VydmV5X3ZlY3Rvclxuc3VtbWFyeShmYWN0b3Jfc3VydmV5X3ZlY3RvcikifQ==

Have a look at the output. The fact that you identified “Male” and “Female” as factor levels in factor_survey_vector enables R to show the number of elements for each category.
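The contrast can be sketched with a small hypothetical vector: summary() only reports length and type for a character vector, while for a factor it counts the observations per level.

```r
# Hypothetical answers, once as characters and once as a factor.
answers_chr <- c("Yes", "No", "No", "Yes", "Yes")
answers_fct <- factor(answers_chr)

summary(answers_chr)
#    Length     Class      Mode
#         5 character character

summary(answers_fct)
#  No Yes
#   2   3
```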

Battle of the sexes

You might wonder what happens when you try to compare elements of a factor. In factor_survey_vector you have a factor with two levels: “Male” and “Female”. But how does R value these relative to each other?

INSTRUCTIONS

Read the code in the editor and click ‘Submit Answer’ to test if male is greater than (>) female.

HINT

Just click ‘Submit Answer’ and have a look at the output that gets printed to the console.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIEJ1aWxkIGZhY3Rvcl9zdXJ2ZXlfdmVjdG9yIHdpdGggY2xlYW4gbGV2ZWxzXG5zdXJ2ZXlfdmVjdG9yIDwtIGMoXCJNXCIsIFwiRlwiLCBcIkZcIiwgXCJNXCIsIFwiTVwiKVxuZmFjdG9yX3N1cnZleV92ZWN0b3IgPC0gZmFjdG9yKHN1cnZleV92ZWN0b3IpXG5sZXZlbHMoZmFjdG9yX3N1cnZleV92ZWN0b3IpIDwtIGMoXCJGZW1hbGVcIiwgXCJNYWxlXCIpXG5cbiMgTWFsZVxubWFsZSA8LSBmYWN0b3Jfc3VydmV5X3ZlY3RvclsxXVxuXG4jIEZlbWFsZVxuZmVtYWxlIDwtIGZhY3Rvcl9zdXJ2ZXlfdmVjdG9yWzJdXG5cbiMgQmF0dGxlIG9mIHRoZSBzZXhlczogTWFsZSAnbGFyZ2VyJyB0aGFuIGZlbWFsZT9cbm1hbGUgPiBmZW1hbGUifQ==

Ordered factors

Since “Male” and “Female” are unordered (or nominal) factor levels, R returns a warning message, telling you that the greater than operator is not meaningful. As seen before, R attaches an equal value to the levels for such factors.

But this is not always the case! Sometimes you will also deal with factors that do have a natural ordering between their categories. If this is the case, we have to make sure that we pass this information to R…

Let us say that you are leading a research team of five data analysts and that you want to evaluate their performance. To do this, you track their speed, evaluate each analyst as “slow”, “medium” or “fast”, and save the results in speed_vector.

INSTRUCTIONS

As a first step, assign speed_vector a vector with 5 entries, one for each analyst. Each entry should be either "slow", "medium", or "fast". Use the list below:

  • Analyst 1 is medium,
  • Analyst 2 is slow,
  • Analyst 3 is slow,
  • Analyst 4 is medium and
  • Analyst 5 is fast.

No need to specify these are factors yet.

HINT

Assign to speed_vector a vector containing the character strings "slow", "medium", or "fast".
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIENyZWF0ZSBzcGVlZF92ZWN0b3JcbnNwZWVkX3ZlY3RvciA8LSBjKFwibWVkaXVtXCIsIFwic2xvd1wiLCBcInNsb3dcIiwgXCJtZWRpdW1cIiwgXCJmYXN0XCIpIn0=

Ordered factors (2)

speed_vector should be converted to an ordinal factor since its categories have a natural ordering. By default, the function factor() transforms speed_vector into an unordered factor. To create an ordered factor, you have to add two additional arguments: ordered and levels.

factor(some_vector, ordered = TRUE, levels = c("lev1", "lev2" …))

By setting the argument ordered to TRUE in the function factor(), you indicate that the factor is ordered. With the argument levels you give the values of the factor in the correct order.

INSTRUCTIONS

From speed_vector, create an ordered factor vector: factor_speed_vector. Set ordered to TRUE, and set levels to c("slow", "medium", "fast").

HINT

Use the function factor() to create factor_speed_vector based on speed_vector. The argument ordered should be set to TRUE since there is a natural ordering. Also, set levels = c("slow", "medium", "fast").
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIENyZWF0ZSBzcGVlZF92ZWN0b3JcbnNwZWVkX3ZlY3RvciA8LSBjKFwibWVkaXVtXCIsIFwic2xvd1wiLCBcInNsb3dcIiwgXCJtZWRpdW1cIiwgXCJmYXN0XCIpXG5cbiMgQWRkIHlvdXIgY29kZSBiZWxvd1xuZmFjdG9yX3NwZWVkX3ZlY3RvciA8LSBmYWN0b3Ioc3BlZWRfdmVjdG9yLCBvcmRlcmVkID0gVFJVRSwgbGV2ZWxzID0gYyhcInNsb3dcIiwgXCJtZWRpdW1cIiwgXCJmYXN0XCIpKVxuXG4jIFByaW50IGZhY3Rvcl9zcGVlZF92ZWN0b3JcbmZhY3Rvcl9zcGVlZF92ZWN0b3JcbnN1bW1hcnkoZmFjdG9yX3NwZWVkX3ZlY3RvcikifQ==
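As a sketch of the two arguments at work, the example below builds an ordered factor from a hypothetical ratings vector. R prints the levels of an ordered factor joined by <, which is a quick way to check the order you specified.

```r
# Hypothetical ratings with a natural order: poor < fair < good.
ratings <- c("good", "poor", "fair", "good")
ratings_factor <- factor(ratings, ordered = TRUE,
                         levels = c("poor", "fair", "good"))

ratings_factor
# [1] good poor fair good
# Levels: poor < fair < good

is.ordered(ratings_factor)  # TRUE
```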

Comparing ordered factors

Having a bad day at work, ‘data analyst number two’ enters your office and starts complaining that ‘data analyst number five’ is slowing down the entire project. Since you know that ‘data analyst number two’ has the reputation of being a smarty-pants, you first decide to check if his statement is true.

The fact that factor_speed_vector is now ordered enables us to compare different elements (the data analysts in this case). You can simply do this by using the well-known operators.

INSTRUCTIONS

  • Use [2] to select from factor_speed_vector the factor value for the second data analyst. Store it as da2.
  • Use [5] to select the factor_speed_vector factor value for the fifth data analyst. Store it as da5.
  • Check if da2 is greater than da5; simply print out the result. Remember that you can use the > operator to check whether one element is larger than the other.

HINT

  • To select the factor value for the third data analyst, you’d need factor_speed_vector[3].
  • To compare two values, you can use >. For example: da3 > da4.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIENyZWF0ZSBmYWN0b3Jfc3BlZWRfdmVjdG9yXG5zcGVlZF92ZWN0b3IgPC0gYyhcIm1lZGl1bVwiLCBcInNsb3dcIiwgXCJzbG93XCIsIFwibWVkaXVtXCIsIFwiZmFzdFwiKVxuZmFjdG9yX3NwZWVkX3ZlY3RvciA8LSBmYWN0b3Ioc3BlZWRfdmVjdG9yLCBvcmRlcmVkID0gVFJVRSwgbGV2ZWxzID0gYyhcInNsb3dcIiwgXCJtZWRpdW1cIiwgXCJmYXN0XCIpKVxuXG4jIEZhY3RvciB2YWx1ZSBmb3Igc2Vjb25kIGRhdGEgYW5hbHlzdFxuZGEyIDwtIGZhY3Rvcl9zcGVlZF92ZWN0b3JbMl1cblxuIyBGYWN0b3IgdmFsdWUgZm9yIGZpZnRoIGRhdGEgYW5hbHlzdFxuZGE1IDwtIGZhY3Rvcl9zcGVlZF92ZWN0b3JbNV1cblxuIyBJcyBkYXRhIGFuYWx5c3QgMiBmYXN0ZXIgZGF0YSBhbmFseXN0IDU/XG5kYTIgPiBkYTUifQ==
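The difference between comparing unordered and ordered factors can be sketched side by side. The ratings data is hypothetical, and suppressWarnings() is used here only to keep the example output clean; without it, R prints a warning that > is not meaningful for factors.

```r
ratings <- c("good", "poor", "fair")

unordered_ratings <- factor(ratings)
ordered_ratings <- factor(ratings, ordered = TRUE,
                          levels = c("poor", "fair", "good"))

# Unordered factor: the comparison is not meaningful, so the result is NA.
suppressWarnings(unordered_ratings[1] > unordered_ratings[2])  # NA

# Ordered factor: the comparison follows the level order.
ordered_ratings[1] > ordered_ratings[2]  # TRUE, "good" is above "poor"
ordered_ratings[3] > ordered_ratings[1]  # FALSE, "fair" is below "good"
```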

What does the result tell you? Data analyst two is complaining about data analyst five while in fact they are the one slowing everything down! This concludes the chapter on factors. With a solid basis in vectors, matrices and factors, you’re ready to dive into the wonderful world of data frames, a very important data structure in R!

Data Frame

Data frames

Most data sets you will be working with will be stored as data frames. By the end of this chapter focused on R basics, you will be able to create a data frame, select interesting parts of a data frame and order a data frame according to certain variables.

What’s a data frame?

Quick, have a look at your data set

Have a look at the structure

Creating a data frame

Creating a data frame (2)

Selection of data frame elements

Selection of data frame elements (2)

Only planets with rings

Only planets with rings (2)

Only planets with rings but shorter

Sorting

Sorting your data frame

What’s a data frame?

You may remember from the chapter about matrices that all the elements that you put in a matrix should be of the same type. Back then, your data set on Star Wars only contained numeric elements.

When doing a market research survey, however, you often have questions such as:

  • ‘Are you married?’ or ‘yes/no’ questions (logical)
  • ‘How old are you?’ (numeric)
  • ‘What is your opinion on this product?’ or other ‘open-ended’ questions (character)
  • …

The output, namely the respondents’ answers to the questions formulated above, is a data set of different data types. You will often find yourself working with data sets that contain different data types instead of only one.

A data frame has the variables of a data set as columns and the observations as rows. This will be a familiar concept for those coming from different statistical software packages such as SAS or SPSS.

INSTRUCTIONS

Click ‘Submit Answer’. The data from the built-in example data frame mtcars will be printed to the console.

HINT

Just click ‘Submit Answer’ and witness the magic!
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFByaW50IG91dCBidWlsdC1pbiBSIGRhdGEgZnJhbWVcbm10Y2FycyAifQ==

Quick, have a look at your data set

Wow, that is a lot of cars!

Working with large data sets is not uncommon in data analysis. When you work with (extremely) large data sets and data frames, your first task as a data analyst is to develop a clear understanding of their structure and main elements. Therefore, it is often useful to show only a small part of the entire data set.

So how to do this in R? Well, the function head() enables you to show the first observations of a data frame. Similarly, the function tail() prints out the last observations in your data set.

Both head() and tail() print a top line called the ‘header’, which contains the names of the different variables in your data set.

INSTRUCTIONS

Call head() on the mtcars data set to have a look at the header and the first observations.

HINT

head(mtcars) will show the first observations of the mtcars data frame.

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIENhbGwgaGVhZCgpIG9uIG10Y2Fyc1xuaGVhZChtdGNhcnMpIn0=

So, what do we have in this data set? For example, hp represents the car’s horsepower; the Datsun has the lowest horsepower of the 6 cars that are displayed. For a full overview of the variables’ meaning, type ?mtcars in the console and read the help page.
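Both functions show six rows by default and accept the number of rows as a second argument. The sketch below uses the built-in mtcars data; the variable names first_three and last_two are made up for the example.

```r
# Default: the first six observations.
nrow(head(mtcars))  # 6

# Ask for a specific number of rows with the second argument.
first_three <- head(mtcars, 3)
last_two <- tail(mtcars, 2)

rownames(first_three)  # "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
nrow(last_two)         # 2

# The header is the same in both cases: the variable names.
names(first_three)[1:3]  # "mpg" "cyl" "disp"
```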

Have a look at the structure

Another method that is often used to get a rapid overview of your data is the function str(). The function str() shows you the structure of your data set. For a data frame it tells you:

  • The total number of observations (e.g. 32 car types)
  • The total number of variables (e.g. 11 car features)
  • A full list of the variable names (e.g. mpg, cyl … )
  • The data type of each variable (e.g. num)
  • The first observations

Applying the str() function will often be the first thing that you do when receiving a new data set or data frame. It is a great way to get more insight into your data set before diving into the real analysis.

INSTRUCTIONS

Investigate the structure of mtcars. Make sure that you see the same numbers, variables and data types as mentioned above.

HINT

Use the str() function on mtcars.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIEludmVzdGlnYXRlIHRoZSBzdHJ1Y3R1cmUgb2YgbXRjYXJzXG5zdHIobXRjYXJzKSJ9
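The individual pieces that str() reports can also be retrieved one at a time, which is handy when you need them in code rather than on screen. A brief sketch using the built-in mtcars data:

```r
# Compact overview: dimensions, variable names, types and first values.
str(mtcars)

# The same facts, retrieved individually.
dim(mtcars)          # 32 11 (observations, variables)
names(mtcars)[1:3]   # "mpg" "cyl" "disp"
class(mtcars$mpg)    # "numeric", which str() abbreviates as num
```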

Creating a data frame Since using built-in data sets is not even half the fun of creating your own data sets, the rest of this chapter is based on your personally developed data set. Put your jet pack on because it is time for some space exploration!

As a first goal, you want to construct a data frame that describes the main characteristics of eight planets in our solar system. According to your good friend Buzz, the main features of a planet are:

  • The type of planet (Terrestrial or Gas Giant).
  • The planet’s diameter relative to the diameter of the Earth.
  • The planet’s rotation period relative to that of the Earth.
  • Whether the planet has rings or not (TRUE or FALSE).

After doing some high-quality research on Wikipedia, you feel confident enough to create the necessary vectors: name, type, diameter, rotation and rings; these vectors have already been coded up in the editor. The first element in each of these vectors corresponds to the first observation.

You construct a data frame with the data.frame() function. As arguments, you pass the vectors from before: they will become the different columns of your data frame. Because every column has the same length, the vectors you pass should also have the same length. But don’t forget that it is possible (and likely) that they contain different types of data.

INSTRUCTIONS

  • Use the function data.frame() to construct a data frame. Pass the vectors name, type, diameter, rotation and rings as arguments to data.frame(), in this order. Call the resulting data frame planets_df.

HINT: Your data.frame() call starts as follows:

data.frame(name, type, diameter)

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIERlZmluaXRpb24gb2YgdmVjdG9yc1xubmFtZSA8LSBjKFwiTWVyY3VyeVwiLCBcIlZlbnVzXCIsIFwiRWFydGhcIiwgXCJNYXJzXCIsIFwiSnVwaXRlclwiLCBcIlNhdHVyblwiLCBcIlVyYW51c1wiLCBcIk5lcHR1bmVcIilcbnR5cGUgPC0gYyhcIlRlcnJlc3RyaWFsIHBsYW5ldFwiLCBcIlRlcnJlc3RyaWFsIHBsYW5ldFwiLCBcIlRlcnJlc3RyaWFsIHBsYW5ldFwiLCBcbiAgICAgICAgICBcIlRlcnJlc3RyaWFsIHBsYW5ldFwiLCBcIkdhcyBnaWFudFwiLCBcIkdhcyBnaWFudFwiLCBcIkdhcyBnaWFudFwiLCBcIkdhcyBnaWFudFwiKVxuZGlhbWV0ZXIgPC0gYygwLjM4MiwgMC45NDksIDEsIDAuNTMyLCAxMS4yMDksIDkuNDQ5LCA0LjAwNywgMy44ODMpXG5yb3RhdGlvbiA8LSBjKDU4LjY0LCAtMjQzLjAyLCAxLCAxLjAzLCAwLjQxLCAwLjQzLCAtMC43MiwgMC42NylcbnJpbmdzIDwtIGMoRkFMU0UsIEZBTFNFLCBGQUxTRSwgRkFMU0UsIFRSVUUsIFRSVUUsIFRSVUUsIFRSVUUpXG5cbiMgQ3JlYXRlIGEgZGF0YSBmcmFtZSBmcm9tIHRoZSB2ZWN0b3JzXG5wbGFuZXRzX2RmIDwtIGRhdGEuZnJhbWUobmFtZSwgdHlwZSwgZGlhbWV0ZXIsIHJvdGF0aW9uLCByaW5ncykifQ==
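A minimal sketch of the construction described above. The vectors are meant to mirror the ones pre-coded in the exercise, so treat the exact values as illustrative. The stringsAsFactors = FALSE argument is an addition here, assumed so that the character columns stay character in R versions both before and after 4.0.

```r
# One vector per planet feature; all vectors have length 8
name     <- c("Mercury", "Venus", "Earth", "Mars",
              "Jupiter", "Saturn", "Uranus", "Neptune")
type     <- c(rep("Terrestrial planet", 4), rep("Gas giant", 4))
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
rings    <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

# Each vector becomes one column; column names are taken from the vector names
planets_df <- data.frame(name, type, diameter, rotation, rings,
                         stringsAsFactors = FALSE)

str(planets_df)
```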

Creating a data frame (2)

The planets_df data frame should have 8 observations and 5 variables. It has been made available in the workspace, so you can directly use it.

INSTRUCTIONS

  • Use str() to investigate the structure of the new planets_df variable.

HINT: planets_df is already available in your workspace, so str(planets_df) will do the trick.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIENoZWNrIHRoZSBzdHJ1Y3R1cmUgb2YgcGxhbmV0c19kZlxuc3RyKHBsYW5ldHNfZGYpIn0=

Now that you have a clear understanding of the planets_df data set, it’s time to see how you can select elements from it. Learn all about it in the next exercises!

Selection of data frame elements

Similar to vectors and matrices, you select elements from a data frame with the help of square brackets [ ]. By using a comma, you can indicate what to select from the rows and the columns respectively. For example:

  • my_df[1,2] selects the value at the first row and second column in my_df.
  • my_df[1:3,2:4] selects rows 1, 2, 3 and columns 2, 3, 4 in my_df.

Sometimes you want to select all elements of a row or column. For example, my_df[1, ] selects all elements of the first row. Let us now apply this technique on planets_df!

INSTRUCTIONS

  • From planets_df, select the diameter of Mercury: this is the value at the first row and the third column. Simply print out the result.
  • From planets_df, select all data on Mars (the fourth row). Simply print out the result.

HINT: To select the diameter for Venus (the second row), you would need: planets_df[2,3]. What do you need for Mercury then?

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFRoZSBwbGFuZXRzX2RmIGRhdGEgZnJhbWUgZnJvbSB0aGUgcHJldmlvdXMgZXhlcmNpc2UgaXMgcHJlLWxvYWRlZFxuXG4jIFByaW50IG91dCBkaWFtZXRlciBvZiBNZXJjdXJ5IChyb3cgMSwgY29sdW1uIDMpXG5wbGFuZXRzX2RmWzEsM11cblxuIyBQcmludCBvdXQgZGF0YSBmb3IgTWFycyAoZW50aXJlIGZvdXJ0aCByb3cpXG5wbGFuZXRzX2RmWzQsIF0ifQ==
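The row and column indexing described above can be sketched as follows. planets_df is rebuilt here so the snippet runs on its own; its values are meant to mirror the exercise data and should be treated as illustrative, and stringsAsFactors = FALSE is an added assumption.

```r
planets_df <- data.frame(
  name     = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  type     = rep(c("Terrestrial planet", "Gas giant"), each = 4),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883),
  rotation = c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67),
  rings    = rep(c(FALSE, TRUE), each = 4),
  stringsAsFactors = FALSE
)

mercury_diameter <- planets_df[1, 3]      # single value: row 1, column 3
mars_row         <- planets_df[4, ]       # empty column part: the whole fourth row
block            <- planets_df[1:3, 2:4]  # rows 1-3, columns 2-4

print(mercury_diameter)
print(mars_row)
print(block)
```

Selecting a whole row returns a one-row data frame, whereas selecting a single cell returns a plain value.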

Apart from selecting elements from your data frame by index, you can also use the column names. To learn how, head over to the next exercise.

Selection of data frame elements (2)

Instead of using numerics to select elements of a data frame, you can also use the variable names to select columns of a data frame.

Suppose you want to select the first three elements of the type column. One way to do this is

planets_df[1:3,2]

A possible disadvantage of this approach is that you have to know (or look up) the column number of type, which gets hard if you have a lot of variables. It is often easier to just make use of the variable name:

planets_df[1:3,"type"]

INSTRUCTIONS

  • Select and print out the first 5 values in the "diameter" column of planets_df.

HINT: You can select the first five values with planets_df[1:5, …]. Can you fill in the … bit to only select the "diameter" column?
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFRoZSBwbGFuZXRzX2RmIGRhdGEgZnJhbWUgZnJvbSB0aGUgcHJldmlvdXMgZXhlcmNpc2UgaXMgcHJlLWxvYWRlZFxuXG4jIFNlbGVjdCBmaXJzdCA1IHZhbHVlcyBvZiBkaWFtZXRlciBjb2x1bW5cbnBsYW5ldHNfZGZbMTo1LCBcImRpYW1ldGVyXCJdIn0=
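A short sketch showing that a column number and a column name select the same values. The data frame is rebuilt so the snippet is self-contained; the values are meant to mirror the exercise and are illustrative, and stringsAsFactors = FALSE is an added assumption.

```r
planets_df <- data.frame(
  name     = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  type     = rep(c("Terrestrial planet", "Gas giant"), each = 4),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883),
  rotation = c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67),
  rings    = rep(c(FALSE, TRUE), each = 4),
  stringsAsFactors = FALSE
)

by_number <- planets_df[1:3, 2]        # requires knowing that type is column 2
by_name   <- planets_df[1:3, "type"]   # keeps working if columns are reordered

first_five_diameters <- planets_df[1:5, "diameter"]
print(first_five_diameters)
```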

Only planets with rings

You will often want to select an entire column, namely one specific variable from a data frame. If you want to select all elements of the variable diameter, for example, both of these will do the trick:

planets_df[,3]
planets_df[,"diameter"]

However, there is a short-cut. If your columns have names, you can use the $ sign:

planets_df$diameter

INSTRUCTIONS

  • Use the $ sign to select the rings variable from planets_df. Store the vector that results as rings_vector.
  • Print out rings_vector to see if you got it right.

HINT: planets_df$diameter selects the diameter column from planets_df; what do you need to select the rings column then?

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIHBsYW5ldHNfZGYgaXMgcHJlLWxvYWRlZCBpbiB5b3VyIHdvcmtzcGFjZVxuXG4jIFNlbGVjdCB0aGUgcmluZ3MgdmFyaWFibGUgZnJvbSBwbGFuZXRzX2RmXG5yaW5nc192ZWN0b3IgPC0gcGxhbmV0c19kZiRyaW5nc1xuXG4jIFByaW50IG91dCByaW5nc192ZWN0b3JcbnJpbmdzX3ZlY3RvciJ9
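The three ways of selecting a whole column can be compared directly. This is a sketch on a rebuilt planets_df whose values are meant to mirror the exercise data and should be treated as illustrative.

```r
planets_df <- data.frame(
  name     = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  type     = rep(c("Terrestrial planet", "Gas giant"), each = 4),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883),
  rotation = c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67),
  rings    = rep(c(FALSE, TRUE), each = 4),
  stringsAsFactors = FALSE
)

by_index  <- planets_df[, 3]
by_name   <- planets_df[, "diameter"]
by_dollar <- planets_df$diameter

# All three return the same numeric vector
same_result <- identical(by_index, by_name) && identical(by_name, by_dollar)
print(same_result)

# The $ short-cut applied to the rings column
rings_vector <- planets_df$rings
print(rings_vector)
```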

Only planets with rings (2)

You probably remember from high school that some planets in our solar system have rings and others do not. Unfortunately you cannot recall their names. Could R help you out?

If you type rings_vector in the console, you get:

[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE

This means that the first four observations (or planets) do not have a ring (FALSE), but the other four do (TRUE). However, you do not get a nice overview of the names of these planets, their diameter, etc. Let’s try to use rings_vector to select the data for the four planets with rings.

INSTRUCTIONS

  • The code in the editor selects the name column of all planets that have rings. Adapt the code so that instead of only the name column, all columns for planets that have rings are selected.

HINT: Remember that to select all columns, you simply have to leave the columns part inside the [ ] empty! This means you’ll need [rings_vector, ].

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIHBsYW5ldHNfZGYgYW5kIHJpbmdzX3ZlY3RvciBhcmUgcHJlLWxvYWRlZCBpbiB5b3VyIHdvcmtzcGFjZVxuXG4jIEFkYXB0IHRoZSBjb2RlIHRvIHNlbGVjdCBhbGwgY29sdW1ucyBmb3IgcGxhbmV0cyB3aXRoIHJpbmdzXG5wbGFuZXRzX2RmW3JpbmdzX3ZlY3RvciwgXSJ9
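A sketch of selecting rows with a logical vector: rows where the vector is TRUE are kept, rows where it is FALSE are dropped. The rebuilt planets_df is meant to mirror the exercise data; treat its values as illustrative.

```r
planets_df <- data.frame(
  name     = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  type     = rep(c("Terrestrial planet", "Gas giant"), each = 4),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883),
  rotation = c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67),
  rings    = rep(c(FALSE, TRUE), each = 4),
  stringsAsFactors = FALSE
)
rings_vector <- planets_df$rings

# Only the name column for planets with rings
ringed_names <- planets_df[rings_vector, "name"]

# Empty column part: every column for planets with rings
ringed_planets <- planets_df[rings_vector, ]

print(ringed_names)
print(ringed_planets)
```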

Only planets with rings but shorter

So what exactly did you learn in the previous exercises? You selected a subset from a data frame (planets_df) based on whether or not a certain condition was true (rings or no rings), and you managed to pull out all relevant data. Pretty awesome! By now, NASA is probably already flirting with your CV ;-).

Now, let us move up one level and use the function subset(). You should see the subset() function as a short-cut to do exactly the same as what you did in the previous exercises.

subset(my_df, subset = some_condition)

The first argument of subset() specifies the data set for which you want a subset. By adding the second argument, you give R the necessary information and conditions to select the correct subset.

The code below will give the exact same result as you got in the previous exercise, but this time, you didn’t need the rings_vector!

subset(planets_df, subset = rings)

INSTRUCTIONS

  • Use subset() on planets_df to select planets that have a diameter smaller than Earth. Because the diameter variable is a relative measure of the planet’s diameter w.r.t. that of planet Earth, your condition is diameter < 1.

HINT: subset(planets_df, subset = …) almost solves it; can you fill in the …?

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIHBsYW5ldHNfZGYgaXMgcHJlLWxvYWRlZCBpbiB5b3VyIHdvcmtzcGFjZVxuXG4jIFNlbGVjdCBwbGFuZXRzIHdpdGggZGlhbWV0ZXIgPCAxXG5zdWJzZXQocGxhbmV0c19kZiwgc3Vic2V0ID0gZGlhbWV0ZXIgPCAxKSJ9
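A sketch comparing subset() with the bracket-based selection from the earlier exercises. Inside subset(), column names such as rings and diameter can be used directly. The rebuilt planets_df is meant to mirror the exercise data; treat its values as illustrative.

```r
planets_df <- data.frame(
  name     = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  type     = rep(c("Terrestrial planet", "Gas giant"), each = 4),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883),
  rotation = c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67),
  rings    = rep(c(FALSE, TRUE), each = 4),
  stringsAsFactors = FALSE
)

# Same planets as planets_df[planets_df$rings, ]
with_rings <- subset(planets_df, subset = rings)

# A condition on a numeric column: planets smaller than Earth
smaller_than_earth <- subset(planets_df, subset = diameter < 1)

print(with_rings$name)
print(smaller_than_earth$name)
```

Earth itself has a relative diameter of exactly 1, so the strict condition diameter < 1 leaves it out.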

Sorting

Making and creating rankings is one of mankind’s favorite affairs. These rankings can be useful (best universities in the world), entertaining (most influential movie stars) or pointless (best 007 look-a-like).

In data analysis you can sort your data according to a certain variable in the data set. In R, this is done with the help of the function order().

order() is a function that, when applied to a variable such as a vector, gives you the positions of its elements in sorted (ascending) order:

a <- c(100, 10, 1000)
order(a)
[1] 2 1 3

10, which is the second element in a, is the smallest element, so 2 comes first in the output of order(a). 100, which is the first element in a, is the second smallest element, so 1 comes second in the output of order(a).

This means we can use the output of order(a) to reshuffle a:

a[order(a)]
[1] 10 100 1000

INSTRUCTIONS

  • Experiment with the order() function in the console. Click ‘Submit Answer’ when you are ready to continue.

HINT: Just play with the order() function in the console!

eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFBsYXkgYXJvdW5kIHdpdGggdGhlIG9yZGVyIGZ1bmN0aW9uIGluIHRoZSBjb25zb2xlIn0=
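A few things to try with order(), as a sketch. The vector b is a hypothetical second example, chosen because it shows that order() (positions that sort the vector) is not the same as rank() (the rank of each element); for a the two happen to coincide.

```r
a <- c(100, 10, 1000)

ascending  <- order(a)                     # positions that sort a from small to large
descending <- order(a, decreasing = TRUE)  # same idea, largest first

print(a[ascending])
print(a[descending])

# order() versus rank() on a second, illustrative vector
b <- c(30, 10, 20)
print(order(b))  # where the smallest, second smallest, ... elements sit
print(rank(b))   # the rank of each element, in its original position
```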

Sorting your data frame

Alright, now that you understand the order() function, let us do something useful with it. You would like to rearrange your data frame such that it starts with the smallest planet and ends with the largest one. In other words, you want to sort on the diameter column.

INSTRUCTIONS

  • Call order() on planets_df$diameter (the diameter column of planets_df). Store the result as positions.
  • Now reshuffle planets_df with the positions vector as row indexes inside square brackets. Keep all columns. Simply print out the result.

HINT: Use order(planets_df$diameter) to create positions. Now, you can use positions inside square brackets: planets_df[…]; can you fill in the …?
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIHBsYW5ldHNfZGYgaXMgcHJlLWxvYWRlZCBpbiB5b3VyIHdvcmtzcGFjZVxuXG4jIFVzZSBvcmRlcigpIHRvIGNyZWF0ZSBwb3NpdGlvbnNcbnBvc2l0aW9ucyA8LSBvcmRlcihwbGFuZXRzX2RmJGRpYW1ldGVyKVxuXG4jIFVzZSBwb3NpdGlvbnMgdG8gc29ydCBwbGFuZXRzX2RmXG5wbGFuZXRzX2RmW3Bvc2l0aW9ucywgXSJ9
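The two steps above, sketched on a rebuilt planets_df (values meant to mirror the exercise data; treat them as illustrative). The name sorted_df is hypothetical.

```r
planets_df <- data.frame(
  name     = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  type     = rep(c("Terrestrial planet", "Gas giant"), each = 4),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883),
  rotation = c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67),
  rings    = rep(c(FALSE, TRUE), each = 4),
  stringsAsFactors = FALSE
)

# Step 1: row positions that put diameter in ascending order
positions <- order(planets_df$diameter)

# Step 2: use them as row indexes and keep all columns
sorted_df <- planets_df[positions, ]
print(sorted_df)
```

The row names of sorted_df keep the original row numbers, which makes it easy to see how the rows were rearranged.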

Lists

Lists, as opposed to vectors, can hold components of different types, just like your to-do list at home or at work. This intro to R chapter will teach you how to create, name and subset these lists.

Lists, why would you need them?

Lists, why would you need them? (2)

Creating a list

Creating a named list

Creating a named list (2)

Selecting elements from a list

Adding more movie information to the list

Lists, why would you need them?

Congratulations! At this point in the course you are already familiar with:

  • Vectors (one dimensional array): can hold numeric, character or logical values. The elements in a vector all have the same data type.
  • Matrices (two dimensional array): can hold numeric, character or logical values. The elements in a matrix all have the same data type.
  • Data frames (two-dimensional objects): can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can be of different data type.

Pretty sweet for an R newbie, right? ;-)

INSTRUCTIONS

  • Click ‘Submit Answer’ to start learning everything about lists!

HINT: Just click the ‘Submit Answer’ button.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIEp1c3QgY2xpY2sgdGhlICdTdWJtaXQgQW5zd2VyJyBidXR0b24uIn0=

Lists, why would you need them? (2)

A list in R is similar to your to-do list at work or school: the different items on that list most likely differ in length, characteristic, and type of activity that has to be done.

A list in R allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists, etc. It is not even required that these objects are related to each other in any way.

You could say that a list is some kind of super data type: you can store practically any piece of information in it!

INSTRUCTIONS

  • Click ‘Submit Answer’ to start the first exercise on lists.

HINT: Click ‘Submit Answer’ to start the first exercise on lists.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIENsaWNrICdTdWJtaXQgQW5zd2VyJyB0byBzdGFydCB0aGUgZmlyc3QgZXhlcmNpc2Ugb24gbGlzdHMuIn0=

Creating a list

Let us create our first list! To construct a list you use the function list():

my_list <- list(comp1, comp2 …)

The arguments to the list function are the list components. Remember, these components can be matrices, vectors, other lists, …

INSTRUCTIONS

  • Construct a list, named my_list, that contains the variables my_vector, my_matrix and my_df as list components.

HINT: Use the list() function with my_vector, my_matrix and my_df as arguments separated by a comma.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFZlY3RvciB3aXRoIG51bWVyaWNzIGZyb20gMSB1cCB0byAxMFxubXlfdmVjdG9yIDwtIDE6MTAgXG5cbiMgTWF0cml4IHdpdGggbnVtZXJpY3MgZnJvbSAxIHVwIHRvIDlcbm15X21hdHJpeCA8LSBtYXRyaXgoMTo5LCBuY29sID0gMylcblxuIyBGaXJzdCAxMCBlbGVtZW50cyBvZiB0aGUgYnVpbHQtaW4gZGF0YSBmcmFtZSBtdGNhcnNcbm15X2RmIDwtIG10Y2Fyc1sxOjEwLF1cblxuIyBDb25zdHJ1Y3QgbGlzdCB3aXRoIHRoZXNlIGRpZmZlcmVudCBlbGVtZW50czpcbm15X2xpc3QgPC0gbGlzdChteV92ZWN0b3IsIG15X21hdHJpeCwgbXlfZGYpIn0=
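A sketch of building a list from three objects of different kinds. The definitions of my_vector, my_matrix and my_df are assumed to match the exercise's pre-coded variables.

```r
my_vector <- 1:10                    # integer vector 1 to 10
my_matrix <- matrix(1:9, ncol = 3)   # 3 x 3 matrix
my_df     <- mtcars[1:10, ]          # first 10 rows of the built-in mtcars

# Each argument becomes one component, in the order given
my_list <- list(my_vector, my_matrix, my_df)

# One line per component: type and size
str(my_list, max.level = 1)
```

The components keep their own type inside the list; without explicit names they can only be reached by position.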

Creating a named list

Well done, you’re on a roll!

Just like on your to-do list, you want to avoid not knowing or remembering what the components of your list stand for. That is why you should give names to them:

my_list <- list(name1 = your_comp1, name2 = your_comp2)

This creates a list with components that are named name1, name2, and so on. If you want to name your list components after you’ve created the list, you can use the names() function as you did with vectors. The following commands are fully equivalent to the assignment above:

my_list <- list(your_comp1, your_comp2)
names(my_list) <- c("name1", "name2")

INSTRUCTIONS

  • Change the code of the previous exercise (see editor) by adding names to the components. Use for my_vector the name vec, for my_matrix the name mat and for my_df the name df.
  • Print out my_list so you can inspect the output.

HINT: The first method of assigning names to your list components is the easiest. It starts like this:

my_list <- list(vec = my_vector)

Add the other two components in a similar fashion.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFZlY3RvciB3aXRoIG51bWVyaWNzIGZyb20gMSB1cCB0byAxMFxubXlfdmVjdG9yIDwtIDE6MTAgXG5cbiMgTWF0cml4IHdpdGggbnVtZXJpY3MgZnJvbSAxIHVwIHRvIDlcbm15X21hdHJpeCA8LSBtYXRyaXgoMTo5LCBuY29sID0gMylcblxuIyBGaXJzdCAxMCBlbGVtZW50cyBvZiB0aGUgYnVpbHQtaW4gZGF0YSBmcmFtZSBtdGNhcnNcbm15X2RmIDwtIG10Y2Fyc1sxOjEwLF1cblxuIyBBZGFwdCBsaXN0KCkgY2FsbCB0byBnaXZlIHRoZSBjb21wb25lbnRzIG5hbWVzXG5teV9saXN0IDwtIGxpc3QodmVjID0gbXlfdmVjdG9yLCBtYXQgPSBteV9tYXRyaXgsIGRmID0gbXlfZGYpXG5cbiMgUHJpbnQgb3V0IG15X2xpc3Rcbm15X2xpc3QifQ==
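The two naming approaches can be checked against each other. In this sketch, named_directly and named_afterwards are hypothetical names, and the three components are assumed to match the exercise's variables.

```r
my_vector <- 1:10
my_matrix <- matrix(1:9, ncol = 3)
my_df     <- mtcars[1:10, ]

# Approach 1: name the components while creating the list
named_directly <- list(vec = my_vector, mat = my_matrix, df = my_df)

# Approach 2: create the list first, then assign names
named_afterwards <- list(my_vector, my_matrix, my_df)
names(named_afterwards) <- c("vec", "mat", "df")

# Both approaches give the same list
print(identical(named_directly, named_afterwards))
print(names(named_directly))
```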

Creating a named list (2)

Being a huge movie fan (remember your job at Lucasfilm), you decide to start storing information on good movies with the help of lists.

Start by creating a list for the movie “The Shining”. We have already created the variables mov, act and rev in your R workspace. Feel free to check them out in the console.

INSTRUCTIONS

Complete the code in the editor to create shining_list; it contains three elements:

  • moviename: a character string with the movie title (stored in mov)
  • actors: a vector with the main actors’ names (stored in act)
  • reviews: a data frame that contains some reviews (stored in rev)

Do not forget to name the list components accordingly (names are moviename, actors and reviews).

HINT: shining_list <- list(moviename = mov) is only part of the solution; it’s your job to also add act and rev to the list, with the appropriate names.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIFRoZSB2YXJpYWJsZXMgbW92LCBhY3QgYW5kIHJldiBhcmUgYXZhaWxhYmxlXG5cbiMgRmluaXNoIHRoZSBjb2RlIHRvIGJ1aWxkIHNoaW5pbmdfbGlzdFxuc2hpbmluZ19saXN0IDwtIGxpc3QobW92aWVuYW1lID0gbW92LCBhY3RvcnMgPSBhY3QsIHJldmlld3MgPSByZXYpIn0=

Selecting elements from a list

Your list will often be built out of numerous elements and components. Therefore, getting a single element, multiple elements, or a component out of it is not always straightforward.

One way to select a component is using the numbered position of that component. For example, to “grab” the first component of shining_list you type

shining_list[[1]]

A quick way to check this out is typing it in the console. Important to remember: to select elements from vectors, you use single square brackets: [ ]. Don’t mix them up!

You can also refer to the names of the components, with [[ ]] or with the $ sign. Both will select the data frame representing the reviews:

shining_list[["reviews"]]
shining_list$reviews

Besides selecting components, you often need to select specific elements out of these components. For example, with shining_list[[2]][1] you select from the second component, actors (shining_list[[2]]), the first element ([1]). When you type this in the console, you will see the answer is Jack Nicholson.

INSTRUCTIONS

  • Select from shining_list the vector representing the actors. Simply print out this vector.
  • Select from shining_list the second element in the vector representing the actors. Do a printout like before.

HINT: To select the vector representing the actors, you can use $actors. To select the third element in the vector representing the actors, you use shining_list$actors[3]. What needs to change to select the second element?
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIHNoaW5pbmdfbGlzdCBpcyBhbHJlYWR5IHByZS1sb2FkZWQgaW4gdGhlIHdvcmtzcGFjZVxuXG4jIFByaW50IG91dCB0aGUgdmVjdG9yIHJlcHJlc2VudGluZyB0aGUgYWN0b3JzXG5zaGluaW5nX2xpc3QkYWN0b3JzXG5cbiMgUHJpbnQgdGhlIHNlY29uZCBlbGVtZW50IG9mIHRoZSB2ZWN0b3IgcmVwcmVzZW50aW5nIHRoZSBhY3RvcnNcbnNoaW5pbmdfbGlzdCRhY3RvcnNbMl0ifQ==
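A sketch of the selection rules above. The contents of mov, act and rev are not shown in the exercise text, so the values below are stand-ins: the actor list is shortened and the reviews data frame is entirely hypothetical.

```r
# Stand-in values (assumptions, not the exercise's actual data)
mov <- "The Shining"
act <- c("Jack Nicholson", "Shelley Duvall", "Danny Lloyd")
rev <- data.frame(score = c(4.5, 4.0), source = c("Source A", "Source B"),
                  stringsAsFactors = FALSE)
shining_list <- list(moviename = mov, actors = act, reviews = rev)

first_component <- shining_list[[1]]  # double brackets: the component itself
sub_list        <- shining_list[1]    # single brackets: a list of length 1

# By name, with [[ ]] or with $
same_reviews <- identical(shining_list[["reviews"]], shining_list$reviews)

# An element inside a component
first_actor  <- shining_list[[2]][1]
second_actor <- shining_list$actors[2]

print(first_component)
print(class(sub_list))
print(c(first_actor, second_actor))
```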

Adding more movie information to the list

Being proud of your first list, you shared it with the members of your movie hobby club. However, one of the senior members, a guy named M. McDowell, noted that you forgot to add the release year. Given your ambitions to become next year’s president of the club, you decide to add this information to the list.

To conveniently add elements to lists you can use the c() function, which you also used to build vectors:

ext_list <- c(my_list, my_val)

This will simply extend the original list, my_list, with the component my_val. This component gets appended to the end of the list. If you want to give the new list item a name, you just add the name as you did before:

ext_list <- c(my_list, my_name = my_val)

INSTRUCTIONS

  • Complete the code below such that an item named year is added to the shining_list with the value 1980. Assign the result to shining_list_full.
  • Finally, have a look at the structure of shining_list_full with the str() function.

HINT: Have a look at the example code in the exercise assignment. Maybe this can help you start:

shining_list_full <- c(shining_list, …)

You still have to add some code where the three dots are.
eyJsYW5ndWFnZSI6InIiLCJzYW1wbGUiOiIjIHNoaW5pbmdfbGlzdCwgdGhlIGxpc3QgY29udGFpbmluZyBtb3ZpZSBuYW1lLCBhY3RvcnMgYW5kIHJldmlld3MsIGlzIHByZS1sb2FkZWQgaW4gdGhlIHdvcmtzcGFjZVxuXG4jIFVzZSBjKCkgdG8gYWRkIGEgeWVhciB0byBzaGluaW5nX2xpc3RcbnNoaW5pbmdfbGlzdF9mdWxsIDwtIGMoc2hpbmluZ19saXN0LCB5ZWFyID0gMTk4MClcblxuIyBIYXZlIGEgbG9vayBhdCBzaGluaW5nX2xpc3RfZnVsbFxuc3RyKHNoaW5pbmdfbGlzdF9mdWxsKSJ9
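A sketch of extending a list with c(). The shortened shining_list is a stand-in for the exercise's list, and the names spliced and wrapped are hypothetical. The second half illustrates a detail worth knowing: when the value is a vector with more than one element, c() appends each element as its own component, so such a value is wrapped in list() to keep it together.

```r
# Stand-in for the exercise's list (assumption: reviews left out for brevity)
shining_list <- list(moviename = "The Shining",
                     actors = c("Jack Nicholson", "Shelley Duvall"))

# Append a single named value at the end of the list
shining_list_full <- c(shining_list, year = 1980)
str(shining_list_full)

# A longer vector is split into separate components ...
spliced <- c(shining_list, years = c(1980, 1981))
print(names(spliced))

# ... unless it is wrapped in list() first
wrapped <- c(shining_list, list(years = c(1980, 1981)))
print(names(wrapped))
```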

Load data

eyJsYW5ndWFnZSI6InIiLCJwcmVfZXhlcmNpc2VfY29kZSI6ImxpYnJhcnkoZ2dwbG90MikiLCJzYW1wbGUiOiIjIEZpZ3VyZSAxXG5nZ3Bsb3QobXRjYXJzLCBhZXMoeD1tcGcsIHk9d3QsIGNvbD1mYWN0b3IoY3lsKSkpK1xuICBnZW9tX3BvaW50KCkifQ==
eyJsYW5ndWFnZSI6InIiLCJwcmVfZXhlcmNpc2VfY29kZSI6ImxpYnJhcnkoZ2dwbG90MikiLCJzYW1wbGUiOiIjIEZpZ3VyZSAyXG5nZ3Bsb3QobXRjYXJzLCBhZXMoeD1tcGcsIHk9ZHJhdCwgY29sPWZhY3RvcihjeWwpKSkrXG4gIGdlb21fcG9pbnQoKSJ9

Here is an example:

https://www.dropbox.com/s/exh4iobbm2p5p1v/fin_research_note.csv

The file name is at the very end (fin_research_note.csv) and the key is the string of letters and numbers in the middle (exh4iobbm2p5p1v). Now we have all of the information we need for source_DropboxData:

FinDataFull <- repmis::source_DropboxData("fin_research_note.csv", "exh4iobbm2p5p1v", sep = ",", header = TRUE)

eyJsYW5ndWFnZSI6InIiLCJwcmVfZXhlcmNpc2VfY29kZSI6ImFkdmVydGlzaW5nIDwtIHJlYWQuY3N2KCdodHRwczovL3d3dy5kcm9wYm94LmNvbS9zL202amg1a3NwaWFubTIxNS9hZHZlcnRpc2luZy5jc3Y/ZGw9MScpXG5saWJyYXJ5KGRhdGEudGFibGUpXG5saWJyYXJ5KGdncGxvdDIpXG4jYWR2ZXJ0aXNpbmciLCJzYW1wbGUiOiJnZ3Bsb3QoYWR2ZXJ0aXNpbmcsIGFlcyh4PXJhZGlvLCB5PXNhbGVzKSkrXG4gIGdlb21fcG9pbnQoY29sb3VyID0gXCJibHVlXCIsIHNpemUgPSAxLjUpK1xuICBzY2FsZV95X2NvbnRpbnVvdXMobGltaXRzPWMoMCw1MCkpK1xuICBzY2FsZV94X2NvbnRpbnVvdXMobGltaXRzPWMoMCwzMDApKStcbiAgdGhlbWUocGxvdC50aXRsZSA9IGVsZW1lbnRfdGV4dChoanVzdCA9IDAuNSkpICtcbiAgZ2d0aXRsZShcInJhZGlvIGFkIGJ1ZGdldCBhbmQgc2FsZXMgcmVsYXRpb25zaGlwXCIpIn0=