Try to work through this Problem Set 0. You will have time to discuss questions you have with the TAs.

Basic R code, R as a calculator

The simplest thing we can do is to use R as a calculator. For instance, we can type 2 + 3 * (6 - 4)^5 after the > prompt in the Console and then hit Enter. The code chunk below shows the result of doing this. Throughout the course, you should try to replicate the code and output in these code chunks.

We see that R returns the value of the expression \(2 + 3 \times (6 - 4)^5\) immediately underneath, and the answer is preceded by [1]. For now, don’t worry about this [1]; we’ll return to it later. (In these code chunks, output lines also begin with ##; at the console you will see only [1] 98.)

2 + 3 * (6 - 4)^5
## [1] 98
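
R applies the usual precedence rules: parentheses first, then exponentiation (^), then multiplication and division, then addition and subtraction. The short sketch below makes that grouping explicit with redundant parentheses and shows how moving them changes the answer. The last two lines illustrate a common surprise: ^ is applied before a leading minus sign.

```r
# Redundant parentheses that spell out R's default grouping: same answer
2 + (3 * ((6 - 4)^5))
## [1] 98
# Grouping 2 + 3 first changes the result: 5 * 32
(2 + 3) * (6 - 4)^5
## [1] 160
# ^ is applied before the unary minus, so this is -(2^2)
-2^2
## [1] -4
(-2)^2
## [1] 4
```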

Like any scientific calculator, R comes with a number of built-in functions:

sqrt(4)
## [1] 2
log(10)
## [1] 2.302585
cos(pi)
## [1] -1
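
Note that log() computes the natural logarithm (base e), which is why log(10) is about 2.30 rather than 1. A few related base R functions are sketched below; log10(), exp(), abs() and the base argument of log() all come with R.

```r
# log() defaults to the natural log; the base argument selects another base
log(100, base = 10)
## [1] 2
log10(100)
## [1] 2
# exp() undoes log(): e raised to the power log(10) gives back 10
exp(log(10))
## [1] 10
# Absolute value
abs(-7)
## [1] 7
```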

Calculating baseball winning percentages

In Statistics Lecture 1, Prof Wyner will discuss the season record of Major League Baseball’s Oakland A’s. Each season (well, each complete season), an MLB team plays 162 regular-season games. In the 2022 MLB World Series, the American League’s top-seeded team, the Houston Astros (106 wins and 56 losses), played the Philadelphia Phillies, the National League team that slipped into the playoffs as the 6th seed (87 wins and 75 losses).

Let’s calculate the season winning percentages of these two teams:

106/162
## [1] 0.654321
87/162
## [1] 0.537037

Close, but to express these as percentages (numbers between 0 and 100), we need to multiply by 100:

106/162 * 100
## [1] 65.4321
87/162 * 100
## [1] 53.7037
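
Written as a formula, the calculation above is the share of games won, scaled by 100. Here "games played" is 162 for a complete season; the formula assumes every game ends in either a win or a loss.

```latex
\[
\text{winning percentage} = \frac{\text{wins}}{\text{games played}} \times 100,
\qquad
\frac{106}{162} \times 100 \approx 65.43,
\qquad
\frac{87}{162} \times 100 \approx 53.70.
\]
```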

Now let’s calculate the difference between the two teams’ winning percentages. We can do this in a few ways.

106/162 * 100 - 87/162 * 100
## [1] 11.7284
(106/162 * 100) - (87/162 * 100)
## [1] 11.7284

Note that we don’t need the parentheses around the division and multiplication operations: R carries out division and multiplication before subtraction, so both versions give the same result.
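
Parentheses do matter when they change the grouping. As a sketch, factoring out the shared denominator 162 gives a third way to compute the same difference, while dropping the parentheses from that version changes the answer, because R then divides only 87 by 162.

```r
# Subtract the win totals first, then convert to a percentage
(106 - 87) / 162 * 100
## [1] 11.7284
# Without parentheses, division and multiplication happen first:
# this is 106 - (87 / 162 * 100), not the difference in percentages
106 - 87 / 162 * 100
## [1] 52.2963
```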


Assignment

When you enter expressions like those above, R evaluates them, prints the result, and then immediately discards it. Often, however, you’ll want to store a value as a named variable and use it in subsequent calculations. For instance, say that we want to store the value of \(2 + 3\times (6 - 4)^5\) as \(x\) and then compute \(1/x\), \(x + 1\), and \(\sqrt{x}\). To assign the value of the expression 2 + 3*(6 - 4)^5 to the variable x, we use the assignment operator <-. The assignment operator evaluates the expression to its right and stores the resulting value in an object named by whatever appears to its left.

x <- 2 + 3*(6-4)^5

When we execute this line, R does not print anything, unlike in the earlier examples. However, if you look closely at the Environment pane (by default, in the top right of the RStudio window), you’ll see that it now lists x and its value, 98.

This pane will show every variable that we have defined. As we start creating more and more variables, this list will be really helpful to keep track of what we’ve defined. Now that we have created the variable x, we can use the symbol x in more expressions. For example, we can compute \(1/x, x + 1,\) and \(\sqrt{x}\) as follows:

x
## [1] 98
1/x
## [1] 0.01020408
x + 1
## [1] 99
sqrt(x)
## [1] 9.899495
round(3.14159, digits = 2)
## [1] 3.14
round(3.14159, digits = 4)
## [1] 3.1416
round(sqrt(x), digits = 4)
## [1] 9.8995
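
Two more details about assignment are worth trying at the console. The sketch below uses a hypothetical variable w, so that it does not overwrite x: wrapping an assignment in parentheses makes R print the value it stores, and assigning to a name that already exists replaces its old value.

```r
# Parentheses around an assignment make R print the value it stores
(w <- 10 / 4)
## [1] 2.5
# Assigning to an existing name replaces its old value
w <- w * 2
w
## [1] 5
```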

Assignment in baseball winning percentages

Let’s return to our two 2022 World Series teams. Since every team plays the same number of regular-season games, it is convenient to define a variable, total_games, and assign to it the total number of games, 162.

total_games <- 162

106/total_games
## [1] 0.654321
87/total_games
## [1] 0.537037

Let’s also define variables for games won by the Astros and Phillies.

wins_astros <- 106

wins_phillies <- 87

Take a look at the Environment pane and find the variables we have just created, along with the values assigned to them.
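
The Environment pane is one view of your workspace; the base R function ls() returns the same list of names at the console. In your session the result will also include x, total_games and anything else you have created, so no output is shown for this sketch.

```r
wins_astros <- 106
wins_phillies <- 87
# ls() returns the names of the objects defined in the current environment
ls()
```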

Let’s calculate our two winning percentages one more time, first for the Astros and then for the Phillies.

wins_astros/total_games * 100
## [1] 65.4321
wins_phillies/total_games * 100
## [1] 53.7037
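
Variables can also hold intermediate results. As a minimal sketch, we can store each team’s percentage under a new name (pct_astros and pct_phillies are names chosen here for illustration) and reuse them to get the difference computed earlier. The first three lines repeat the definitions above so the chunk runs on its own.

```r
total_games <- 162
wins_astros <- 106
wins_phillies <- 87
# Store each winning percentage for later use
pct_astros <- wins_astros / total_games * 100
pct_phillies <- wins_phillies / total_games * 100
# Difference in winning percentages, in percentage points
pct_astros - pct_phillies
## [1] 11.7284
```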

Next, let’s round the two winning percentages to one decimal place.

round(wins_astros/total_games * 100, 1)
## [1] 65.4
round(wins_phillies/total_games * 100, 1)
## [1] 53.7

In the examples above, we used the function round(), which takes two arguments (entered inside the parentheses). The first argument is the number that we want to round, and the second argument (following the comma) is the number of decimal places to round it to. We can name that second argument, as in round(3.14159, digits = 2), or simply give it by position, as in round(wins_astros/total_games * 100, 1). This is our first example of a multi-argument function, and we’ll see many more later on. Notice that in round(sqrt(x), digits = 4) we didn’t give round() an explicit number. Instead, R first evaluated sqrt(x) and then rounded the result. R evaluates nested function calls from the inside out: the innermost expression is computed first, and its value is passed to the function around it.
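
A few more details about round() are easy to check at the console. A negative digits value rounds to the left of the decimal point, and an exact halfway value such as 2.5 is rounded to the even neighbor, so round(2.5) gives 2 rather than 3. The related function signif(), shown for comparison, rounds to a number of significant digits instead of decimal places.

```r
# Negative digits round to the left of the decimal point (here, to hundreds)
round(1234.567, digits = -2)
## [1] 1200
# An exact halfway value goes to the even neighbor
round(2.5)
## [1] 2
round(3.5)
## [1] 4
# signif() keeps significant digits rather than decimal places
signif(106 / 162 * 100, 3)
## [1] 65.4
signif(1234.567, 2)
## [1] 1200
```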

If we try to evaluate an expression using the name of a variable that we haven’t defined yet, R will throw an error. For instance, in the code below, if we try to add 5 to a previously undeclared variable, we get an error.

y + 5
## Error in eval(expr, envir, enclos): object 'y' not found
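
To check whether a name has been defined before using it, base R’s exists() takes the name as a quoted string and returns TRUE or FALSE. In this sketch, my_undefined_variable is a deliberately unused, hypothetical name.

```r
total_games <- 162
# Names are passed to exists() as quoted strings
exists("total_games")
## [1] TRUE
exists("my_undefined_variable")
## [1] FALSE
```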

Exercises

  1. What does R return if you try to do a calculation with a variable that you have not yet assigned a value to? At the prompt, type: 1 + a

  2. Now assign the value 2 to a, like this: a <- 2. Then add 1 to a. What does R return?

  3. In R, what does the code 1 + TRUE return? (Make sure to use all caps for TRUE.)

  4. In R, what does the code 1 + FALSE return? (Make sure to use all caps for FALSE.)

  5. Save the value of 8/5 + 3^3 as a new variable y.

  6. Save the value of y/x as a new variable z.

  7. Compute the square root of x, y and z.

  8. Round these values to 3 decimal places.