Today’s Lecture

We’ll start using R and RStudio
Complete a simple exercise and simulation to learn R syntax and functionality

NOTES:
  • Before we look at any code: Remember that R is case-sensitive

  • R code is shown in grey boxes and R output is shown in white boxes; output lines begin with ##

  • R functions and operators introduced today are listed below.


| R Function | Description |
|------------|-------------|
| as.numeric | forces values to be treated as numeric |
| c | concatenates |
| data.frame | creates a data frame with input variables |
| head | shows first 6 obs. by default |
| length | outputs how many values are in a vector |
| rbind | row binds or stacks values |
| rep | replicates or repeats specified value or object |
| row.names | outputs the row names of a data set |
| sample | samples a vector |
| sqrt | calculates the square root |
| sum | sums values |
| summary | outputs numerical summary values |
| tail | shows last 6 obs. by default |

| R Operator | Description |
|------------|-------------|
| <- | assign |
| (…) | round parens are used for function inputs |
| […] | square brackets are used for subsetting |
| {…} | curly brackets are used for loops and functions |
| %in% | finds elements in or belonging to |
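To make the last operator concrete, here is a small sketch using made-up row numbers (the names hand and rows are hypothetical). %in% returns one TRUE or FALSE for each element on its left, and that result can go inside [ ] to subset:

```r
# hypothetical row numbers of two drawn cards, and the rows we want to check
hand <- c(3, 17)
rows <- 1:6

# one TRUE/FALSE per element of rows: is this row number in hand?
rows %in% hand           # FALSE FALSE TRUE FALSE FALSE FALSE

# use the TRUE/FALSE result inside [ ] to keep only the matching rows
rows[rows %in% hand]     # 3
```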

R is a very fancy calculator.


Feel free to use it this way. The code and output below show:
  • a calculation saved as object x

  • a calculation saved as object y

  • x and y output to the console

  • the sum of x and y output to the console and not saved


x <- 458 + 563 * 298
y <- sqrt(1798)
x
## [1] 168232
y
## [1] 42.40283
x + y
## [1] 168274.4
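One thing worth checking when using R as a calculator is the order of operations. In the calculation of x above, the multiplication happens before the addition. A quick sketch comparing it with a parenthesized version:

```r
# R follows the usual order of operations: ^ first, then * and /, then + and -
458 + 563 * 298      # multiplication first: 458 + 167774 = 168232
(458 + 563) * 298    # parentheses force the addition first: 1021 * 298 = 304258
```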

First Lecture Questions:

Copy and paste the following code.
  1. What is the value of C?

  2. Why does D <- a - B result in an error?

# Just like in Excel, 9^2 is the square of 9 = 81.
# Type the name of an object after you create it to print its value to the
# console, OR examine the object in the Global Environment pane.

A <- 9^2
A
## [1] 81
# Just like in Excel 5*5 = 5^2 = 25
B <- 5 * 5
B
## [1] 25
C <- A/B
D <- a - B
## Error in eval(expr, envir, enclos): object 'a' not found

R has many built-in example data sets

‘cars’ is the name of a dataset in R.
The ‘summary’ command in R outputs numeric summary values for each variable in a data set.
The ‘plot’ command plots a scatterplot of the two variables in the data set.
We will make much prettier plots soon.

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
plot(cars)
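The summary above covers both variables at once. As a small sketch, you can also look at the first rows with head and summarize a single variable by naming it with $ (the $ operator is explained later in this lecture; mean is an extra base R function not in today's table):

```r
# cars is built into R, so no setup is needed
head(cars)              # first 6 rows: speed and stopping distance (dist)
summary(cars$speed)     # summary of just one variable, selected with $
mean(cars$dist)         # mean stopping distance, matching the summary above
```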


‘pressure’ is another data set in R.
Notice that for the plot below, the R code is hidden in the output document.
We will go over how to create a document like this HTML file (or other formats) on Thursday.
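The code for the pressure plot is hidden, so the following may not be exactly what was used. A minimal sketch that draws a similar scatterplot from the built-in pressure data set (temperature in degrees Celsius and vapor pressure of mercury in millimeters):

```r
# pressure has two columns: temperature and pressure
head(pressure)
plot(pressure)   # scatterplot of pressure (y axis) against temperature (x axis)
```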


Creating a data set that represents a deck of playing cards.

This demo is intended to show you some R coding syntax.
In lecture 2, we will download data from different sources.
For now, let’s create a data set from scratch:
  • Each card in a deck of playing cards has a suit, a face value, and a game value.

  • Game value depends on the game you are playing, e.g. Blackjack, Poker, etc.


First Playing Card Variable: Suit

Notes about the ‘Suit’ variable
  • Suit is a text, or ‘string’, variable.

  • String values must be in quotes, e.g. “Hello” is a string.

  • Strings can be character or factor variables

  • We will talk about character and factor variables later in the course.

  • The values of this string variable are clubs, diamonds, hearts, and spades (all lowercase in our code, since R is case-sensitive).

  • Recall that there are 13 cards of each suit.

  • Below is the R command to create an object named Suit.

  • Our Suit variable includes 13 replicates of each suit.


After we create our object, Suit, we print that to the console to look at it.

# note that this will be printed to the console screen but not saved.
rep("clubs", 13)
##  [1] "clubs" "clubs" "clubs" "clubs" "clubs" "clubs" "clubs" "clubs" "clubs"
## [10] "clubs" "clubs" "clubs" "clubs"

To create an object, like a variable, we assign it a name using the <- operator
The c(…) function allows us to concatenate multiple objects, one after another
NOTE: Spaces are not required in R commands. They are included below for clarity

Suit <- c(rep("clubs", 13), rep("diamonds", 13), rep("hearts", 13), rep("spades", 
    13))

# print our new object, the character variable, Suit, to console
Suit
##  [1] "clubs"    "clubs"    "clubs"    "clubs"    "clubs"    "clubs"   
##  [7] "clubs"    "clubs"    "clubs"    "clubs"    "clubs"    "clubs"   
## [13] "clubs"    "diamonds" "diamonds" "diamonds" "diamonds" "diamonds"
## [19] "diamonds" "diamonds" "diamonds" "diamonds" "diamonds" "diamonds"
## [25] "diamonds" "diamonds" "hearts"   "hearts"   "hearts"   "hearts"  
## [31] "hearts"   "hearts"   "hearts"   "hearts"   "hearts"   "hearts"  
## [37] "hearts"   "hearts"   "hearts"   "spades"   "spades"   "spades"  
## [43] "spades"   "spades"   "spades"   "spades"   "spades"   "spades"  
## [49] "spades"   "spades"   "spades"   "spades"
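There are usually several ways to do the same thing in R. As a sketch of one alternative, the each argument of rep repeats every element a given number of times, which builds the same Suit vector in a single call:

```r
# Suit as built above
Suit <- c(rep("clubs", 13), rep("diamonds", 13), rep("hearts", 13),
          rep("spades", 13))

# each = 13 repeats every element 13 times before moving to the next one
Suit2 <- rep(c("clubs", "diamonds", "hearts", "spades"), each = 13)

identical(Suit, Suit2)   # TRUE: 13 clubs, then 13 diamonds, and so on
```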

Some Important Notes, particularly for new R users:
  • Again, <- is used to assign a name to an object (just like we did above with A, B, and C)

  • The rep function replicates the value or object specified.

  • The value replicated can be a number, a set of numbers, a text string in quotes, etc.

  • The second entry in the rep function after the comma indicates how many times the object is replicated or repeated.

  • The c(…) function is concatenate. This concatenates or puts one after another all of the objects in parentheses.

  • As shown above, to examine an object we created, we can type its name.

  • If the object is huge:

    • You can use the head function to look at the first six observations or
    • You can use the tail function to look at the last six observations

head(Suit)
## [1] "clubs" "clubs" "clubs" "clubs" "clubs" "clubs"
tail(Suit)
## [1] "spades" "spades" "spades" "spades" "spades" "spades"
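head and tail also take an optional second argument: the number of values to show. A short sketch, with length confirming that one deck has 52 cards:

```r
Suit <- c(rep("clubs", 13), rep("diamonds", 13), rep("hearts", 13),
          rep("spades", 13))

head(Suit, 3)   # first 3 values instead of the default 6
tail(Suit, 2)   # last 2 values
length(Suit)    # 52 cards in one deck
```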

If you need help working with an R function, type a question mark before the function name, e.g. type ?rep in the console.

Second Playing Card Variable: Face


FaceValues <- c("two", "three", "four", "five", "six", "seven", "eight", "nine", 
    "ten", "jack", "queen", "king", "ace")

Face <- rep(FaceValues, 4)
Face
##  [1] "two"   "three" "four"  "five"  "six"   "seven" "eight" "nine"  "ten"  
## [10] "jack"  "queen" "king"  "ace"   "two"   "three" "four"  "five"  "six"  
## [19] "seven" "eight" "nine"  "ten"   "jack"  "queen" "king"  "ace"   "two"  
## [28] "three" "four"  "five"  "six"   "seven" "eight" "nine"  "ten"   "jack" 
## [37] "queen" "king"  "ace"   "two"   "three" "four"  "five"  "six"   "seven"
## [46] "eight" "nine"  "ten"   "jack"  "queen" "king"  "ace"
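Note the difference from Suit: rep(FaceValues, 4) repeats the whole 13-value vector 4 times. Because Suit and Face line up position by position, each suit gets each face value exactly once. A quick sketch that checks this for hearts:

```r
Suit <- c(rep("clubs", 13), rep("diamonds", 13), rep("hearts", 13),
          rep("spades", 13))
FaceValues <- c("two", "three", "four", "five", "six", "seven", "eight",
                "nine", "ten", "jack", "queen", "king", "ace")
Face <- rep(FaceValues, 4)

# keep only the Face values at positions where Suit is "hearts"
Face[Suit == "hearts"]                          # two through ace
identical(Face[Suit == "hearts"], FaceValues)   # TRUE
```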

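Card Values for Playing Blackjack

In Blackjack, number cards are worth their face value; jacks, queens, and kings are worth 10; and an ace is worth either 11 or 1. Value1 counts the ace as 11 and Value2 counts it as 1.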

Value1 <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11)

Value2 <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 1)

# Print Value2
Value2
##  [1]  2  3  4  5  6  7  8  9 10 10 10 10  1
# What if these vectors had 1000 values and we only wanted to replace a few?
# Value3 is a copy of Value1
Value3 <- Value1
Value3[Value3 > 10] <- 1

# Print Value3
Value3
##  [1]  2  3  4  5  6  7  8  9 10 10 10 10  1
# The line Value3[Value3 > 10] <- 1 sets every value in Value3 greater than 10 to 1.

# [ ] brackets are used to subset vectors, matrices, data sets, etc.

# more on this in upcoming lectures
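To see what happens inside the [ ] brackets, it helps to print the comparison on its own. The sketch below shows the TRUE/FALSE vector that Value1 > 10 produces and confirms that Value3 ends up identical to Value2 (which is an extra base R function not in today's table):

```r
Value1 <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11)
Value2 <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 1)

Value1 > 10          # one TRUE/FALSE per element; only the 13th (ace) is TRUE
which(Value1 > 10)   # 13: the position of the TRUE value

Value3 <- Value1
Value3[Value3 > 10] <- 1    # replace only the positions where the test is TRUE
identical(Value3, Value2)   # TRUE
```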

Just as we did with ‘Face’, once we have the values we replicate them 4 times, once per suit:

GameValues1 <- rep(Value1, 4)

Lecture Question 3

  1. What is the correct R code to create GameValues2 so that Value2 is replicated 4 times?

# Complete the command below
# GameValues2 <- ?

# Check your work by typing GameValues2 to examine it OR look at it in the Global
# Environment.

Create a Blackjack Deck and name it GameDeck


Lecture Question 4

GameDeck <- data.frame(Suit, Face, GameValues1)
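data.frame lines the three vectors up as columns, so each row is one card. A quick sketch to confirm the size and column names (dim and names are base R functions not in today's table):

```r
Suit <- c(rep("clubs", 13), rep("diamonds", 13), rep("hearts", 13),
          rep("spades", 13))
Face <- rep(c("two", "three", "four", "five", "six", "seven", "eight",
              "nine", "ten", "jack", "queen", "king", "ace"), 4)
GameValues1 <- rep(c(2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11), 4)

GameDeck <- data.frame(Suit, Face, GameValues1)

dim(GameDeck)     # 52 rows (cards) and 3 columns (variables)
names(GameDeck)   # "Suit" "Face" "GameValues1": taken from the object names
tail(GameDeck)    # last six cards, all spades
```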

Blackjack in casinos is often played with 6 or more decks to make it harder to count cards.

Question for HW 2: What does the rbind function do in the R code below?
  • Hint: If you’re not sure what any command does, you can run it and then examine the output (which may be too large to view clearly).

  • OR you can examine the result in the Global Environment pane.

  • OR you can type ?rbind in the R console (works for most functions) and look at the help file.

  • As with anything in R, there are lots of ways to do this.


GameDeck6 <- rbind(GameDeck, GameDeck, GameDeck, GameDeck, GameDeck, GameDeck)

Now we have our Casino Deck, so we could…write the R code to play a game and begin our career as an online gambling website…oh wait…that’s been done.
  • If this was strictly a coding class we would complete the game.

  • If you think about all of the possible outcomes, the algorithm takes a while to code. (I’ve done it.)


Instead, let’s work with (manage) these data and get information from them.

What is the probability that a player will score exactly 21 on the first hand?
  • We will do a simple simulation to answer this question.

  • Simulations can answer questions like this by repeating a random process many times and summarizing the results.

  • The simulation below is NOT optimized for efficiency.

  • We’ll talk about writing more efficient code this semester


First some setup coding:

# Create an id variable for sampling from the row numbers;
# as.numeric specifies that the id values should be treated as numeric

GameDeck6$Row <- as.numeric(row.names(GameDeck6))
head(GameDeck6)
##    Suit  Face GameValues1 Row
## 1 clubs   two           2   1
## 2 clubs three           3   2
## 3 clubs  four           4   3
## 4 clubs  five           5   4
## 5 clubs   six           6   5
## 6 clubs seven           7   6
# Create an empty storage vector called All_Scores
All_Scores <- NULL

Notes about row names
  • Every data frame in R has an index, i.e., the row names

  • Row names can be accessed with the row.names function

  • By default, row names are the row numbers unless otherwise specified

  • The code above uses the row numbers to create a new variable named Row

  • $ is used to specify a variable in a data set.

  • We also created an EMPTY storage vector named All_Scores by assigning it NULL
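One detail behind as.numeric above: row.names returns the row names as text (character strings), even when they look like numbers. A sketch with a small hypothetical data frame, df (class is a base R function not in today's table):

```r
# small hypothetical data frame for illustration
df <- data.frame(Card = c("two", "three", "four"))

row.names(df)                   # "1" "2" "3": stored as text
class(row.names(df))            # "character"
as.numeric(row.names(df))       # 1 2 3: now usable as numbers

# the same pattern used for GameDeck6$Row above
df$Row <- as.numeric(row.names(df))
df
```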


Now for the mini-simulation:
This ‘for’ loop repeats these steps 10000 times:
  1. Draws two random cards (as row numbers) using the sample function and saves them as hand2

    • NOTE: In a real casino:
      • The second card is discarded if it is identical to the first card.
      • Today we’re ignoring that, but it can be done with if…then statements
  2. Sums the Game Values for the two drawn cards and saves the sum as Player_Score

  3. Puts the Player_Score in our storage vector, All_Scores, using the c function

  4. Moves on to the next repetition; the for loop updates the counter i automatically, so we do not change i ourselves


for (i in 1:10000) {
    
    # draw 2 different row numbers (cards) from the 6-deck shoe
    hand2 <- sample(GameDeck6$Row, 2, replace = FALSE)
    
    # add up the game values of the two rows that were drawn
    Player_Score <- sum(GameDeck6$GameValues1[GameDeck6$Row %in% hand2])
    
    # append this hand's score to the storage vector
    All_Scores <- c(All_Scores, Player_Score)
}
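As a preview of the more efficient code mentioned earlier, here is one possible sketch, not the lecture's method, that skips the row numbers and samples the 312 game values directly. It assumes the same six-deck shoe with aces counted as 11; replicate, set.seed, and range are base R functions not in today's table, the name All_Scores2 is hypothetical, and the seed value is an arbitrary choice so the results can be reproduced.

```r
# rebuild the six-deck game values: 13 values per suit, 4 suits, 6 decks
Value1 <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11)
GameValues6 <- rep(rep(Value1, 4), 6)   # 312 cards

set.seed(1)   # arbitrary seed, only for reproducibility

# replicate() evaluates the expression 10000 times and collects the results;
# sample() draws 2 different cards (no replacement) from the 312
All_Scores2 <- replicate(10000, sum(sample(GameValues6, 2)))

length(All_Scores2)   # 10000
range(All_Scores2)    # scores must lie between 4 (two twos) and 22 (two aces)
```

Because replicate collects all 10000 results at once, the vector is not grown one value at a time with c, and sampling the values directly avoids the %in% lookup on every pass.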

Let’s examine our data
First: the quick and ‘not-pretty’ way to plot our scores:

hist(All_Scores)

# We can make it a little better by modifying the title, axis labels, and color:

# main is the title option
# xlab is the x axis label
# col specifies the bar color

hist(All_Scores, main = "Histogram of Blackjack Scores on Hand 1", xlab = "Blackjack Scores", 
    col = "lightgreen")


  • We can easily make this a little prettier AND interactive

  • We will look at other packages and commands later; for today we’ll use a smart command, hchart, from the highcharter package

  • The code below loads the highcharter package; remove the # before install.packages to install it the first time


Notes about R Packages:
  • R packages only need to be installed once (using install.packages)

  • If you reinstall R (to update it), you must reinstall packages.

  • R packages need to be loaded every time you start a new R session

  • In future lectures, there will be a list of packages used at top of document.

  • Installing, loading, and using packages may work slightly differently on a Mac.

  • When something doesn’t work, let me or a TA know.

  • We will help you interpret error messages, which are sometimes unclear.


# install.packages('highcharter')
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
hchart(hist(All_Scores), color = "lightgreen")


Using the hchart function as a ‘wrapper’ for our histogram makes it interactive, but customizing it has a learning curve.
  • We will learn other interactive options

Now for a simple question:
  • If we play Blackjack, what is the probability that we will score exactly 21 with our first two cards?

  • To answer this we can use the length function and the [ ] brackets to subset our data.

  • length tells us how many observations are in a vector

  • We KNOW the length (number of obs.) of our simulated data, All_Scores, is 10000


length(All_Scores)
## [1] 10000

  • Now we ALSO want to know how many of our scores were EXACTLY 21.

  • What is the length (number of obs.) in the subset of our All_Scores vector of all the observations that are 21?


# == tests whether values are equal ('is equal to'); here it selects the 21s

length(All_Scores[All_Scores == 21])
## [1] 484
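Another way to count: a comparison like All_Scores == 21 returns TRUE/FALSE values, and sum counts each TRUE as 1, so mean gives the proportion directly (mean is a base R function not in today's table). A sketch on a small made-up vector, scores:

```r
# small hypothetical vector of scores
scores <- c(21, 15, 21, 20, 12)

scores == 21                   # TRUE FALSE TRUE FALSE FALSE
sum(scores == 21)              # 2: each TRUE counts as 1, each FALSE as 0
length(scores[scores == 21])   # 2: the subsetting approach gives the same count
mean(scores == 21)             # 0.4: the proportion of scores equal to 21
```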

Using these values to answer our question
  • Dividing that subset length by the length of the whole vector gives us the probability.

  • The code below also shows how to round a calculated value using the round function.

  • The second value in the round function (after the comma) is the number of decimal places.

  • Note: this could also be done in one line.


# Divide number of values that are 21 by total number of values
prob21 <- length(All_Scores[All_Scores == 21])/length(All_Scores)

# round value prob21 to have 2 decimal places
round(prob21, 2)
## [1] 0.05
# OR (not saved as an object)
round(length(All_Scores[All_Scores == 21])/length(All_Scores), 2)
## [1] 0.05
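Because this question is simple, we can check the simulation against an exact calculation. The sketch below assumes the same setup as the loop: six decks (312 cards), aces worth 11, and two cards drawn without replacement. With those values, exactly 21 requires one ace plus one card worth 10 (a ten, jack, queen, or king). The names n_cards, n_aces, n_tens, and p21 are illustrative.

```r
n_cards <- 6 * 52   # 312 cards in a six-deck shoe
n_aces  <- 6 * 4    # 24 aces
n_tens  <- 6 * 16   # 96 cards worth 10: tens, jacks, queens, kings

# The ace can come first or second, so add the two orders;
# the second draw is out of n_cards - 1 because there is no replacement
p21 <- (n_aces / n_cards) * (n_tens / (n_cards - 1)) +
       (n_tens / n_cards) * (n_aces / (n_cards - 1))

round(p21, 4)   # 0.0475
```

The exact value, about 0.047, is close to the simulated 484 out of 10000 (0.0484).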

In HW 2, you will be using an All_Scores variable like this to find the probability that a Dealer scores 17 or more on the first hand and has to ‘Stick’.

End of Lecture 1