Before we look at any code: Remember that R is case-sensitive
R Code is shown in Grey Boxes and R Output is shown in White Boxes
R functions and operators introduced today are listed below.
R Function | Description |
---|---|
as.numeric | forces values to be treated as numeric |
c | concatenates |
data.frame | creates a data frame with input variables |
head | shows first 6 obs. by default |
length | outputs how many values are in a vector |
rbind | row binds or stacks values |
rep | replicates or repeats specified value or object |
row.names | outputs the row names of a data set |
sample | samples a vector |
sqrt | calculates the square root |
sum | sums values |
summary | outputs numerical summary values |
tail | shows last 6 obs. by default |
R Operator | Description |
---|---|
<- | assign |
(…) | round parens are used for function inputs |
[…] | square brackets are used for subsetting |
{…} | curly brackets are used for loops and functions |
%in% | finds elements in or belonging to |
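A quick sketch of the operators in the table above. The vector `v` and the object `total` are made-up names used only for illustration; they are not part of the lecture data.

```r
# v is a hypothetical vector used only for illustration
v <- c(10, 20, 30, 40)

# square brackets subset: keep the second and third values
v[2:3]

# %in% asks, element by element, whether each value on the left
# appears anywhere in the vector on the right
c(20, 25) %in% v

# curly brackets group the statements that belong to a loop
total <- 0
for (value in v) {
  total <- total + value
}
total
```

Here 20 is in `v` but 25 is not, so `%in%` returns one TRUE and one FALSE.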
a calculation saved as object x
a calculation saved as object y
x and y output to the console
the sum of x and y output to the console and not saved
x <- 458 + 563 * 298
y <- sqrt(1798)
x
## [1] 168232
y
## [1] 42.40283
x + y
## [1] 168274.4
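The value of x depends on the order of operations. As a small sketch, the code below repeats the calculation and then adds round parens to change the order. The name `x_grouped` is made up for this example.

```r
# multiplication happens before addition, so this is 458 + (563 * 298)
x <- 458 + 563 * 298
x

# round parens force the addition to happen first
x_grouped <- (458 + 563) * 298
x_grouped
```

The two results differ, which is why it helps to add parens whenever the intended order is not obvious.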
What is the value of C?
Why does D <- a - B result in an Error?
# Just like in Excel, 9^2 is the square of 9 = 81.
# Type the name of an object after you create it to print its value to the console,
# OR examine the object in the Global Environment.
A <- 9^2
A
## [1] 81
# Just like in Excel 5*5 = 5^2 = 25
B <- 5 * 5
B
## [1] 25
C <- A/B
D <- a - B
## Error in eval(expr, envir, enclos): object 'a' not found
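Because R is case-sensitive, two names that differ only in capitalization are two different objects. The sketch below uses made-up names (`CardTotal` and `cardtotal`) and two base R tools, exists and tryCatch, to show this without stopping the script.

```r
# CardTotal is a hypothetical object created only for this example
CardTotal <- 9^2

# exists() reports whether an object with exactly that name has been created
exists("CardTotal")
exists("cardtotal")

# tryCatch() captures the error message instead of halting the code
msg <- tryCatch(
  cardtotal - 1,
  error = function(e) conditionMessage(e)
)
msg
```

The captured message names the lowercase object, which was never created.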
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
plot(cars)
Each card in a deck of playing cards has a suit, a face value, and a game value.
Game value depends on the game you are playing, e.g. Blackjack, Poker, etc.
Suit is a text or ‘string’ variable.
String values must be in quotes, e.g. “Hello” is a string.
Strings can be character or factor variables
We will talk about character and factor variables later in the course.
The values of this string variable are Clubs, Diamonds, Hearts, Spades.
Recall that there are 13 cards of each suit.
Below is the R command to create an object named Suit.
Our Suit variable includes 13 replicates of each suit.
# note that this will be printed to the console screen but not saved.
rep("clubs", 13)
## [1] "clubs" "clubs" "clubs" "clubs" "clubs" "clubs" "clubs" "clubs" "clubs"
## [10] "clubs" "clubs" "clubs" "clubs"
Suit <- c(rep("clubs", 13), rep("diamonds", 13), rep("hearts", 13), rep("spades",
13))
# print our new object, the character variable, Suit, to console
Suit
## [1] "clubs" "clubs" "clubs" "clubs" "clubs" "clubs"
## [7] "clubs" "clubs" "clubs" "clubs" "clubs" "clubs"
## [13] "clubs" "diamonds" "diamonds" "diamonds" "diamonds" "diamonds"
## [19] "diamonds" "diamonds" "diamonds" "diamonds" "diamonds" "diamonds"
## [25] "diamonds" "diamonds" "hearts" "hearts" "hearts" "hearts"
## [31] "hearts" "hearts" "hearts" "hearts" "hearts" "hearts"
## [37] "hearts" "hearts" "hearts" "spades" "spades" "spades"
## [43] "spades" "spades" "spades" "spades" "spades" "spades"
## [49] "spades" "spades" "spades" "spades"
Again, <- is used to assign a name to an object (just like we did above with A, B, and C)
The rep function replicates the value or object specified.
The value replicated can be a number, a set of numbers, a text string in quotes, etc.
The second entry in the rep function after the comma indicates how many times the object is replicated or repeated.
The c(…) function is concatenate. This concatenates or puts one after another all of the objects in parentheses.
As shown above, to examine an object we created, we can type its name.
If the object is huge, we can use head and tail to print only the first or last 6 values:
head(Suit)
## [1] "clubs" "clubs" "clubs" "clubs" "clubs" "clubs"
tail(Suit)
## [1] "spades" "spades" "spades" "spades" "spades" "spades"
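The rep, c, head, and tail functions described above have a few options worth trying. The sketch below is one possible variation, not the lecture's method: `suits` and `Suit_alt` are made-up names, and the `each` and `n` arguments are standard options of rep and head/tail.

```r
# times (the second argument) repeats the whole vector
rep(c("red", "black"), times = 2)

# each repeats every element before moving to the next one
rep(c("red", "black"), each = 2)

# the same 52 suit values built with one rep() call instead of four
suits <- c("clubs", "diamonds", "hearts", "spades")
Suit_alt <- rep(suits, each = 13)
length(Suit_alt)

# head() and tail() accept n to change how many values are shown
head(Suit_alt, n = 3)
tail(Suit_alt, n = 2)
```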
The object ‘FaceValues’ is the vector of card face values, another string variable.
The object ‘Face’ repeats that vector ‘FaceValues’ 4 times, once for each suit.
Then we type Face to examine the vector
FaceValues <- c("two", "three", "four", "five", "six", "seven", "eight", "nine",
"ten", "jack", "queen", "king", "ace")
Face <- rep(FaceValues, 4)
Face
## [1] "two" "three" "four" "five" "six" "seven" "eight" "nine" "ten"
## [10] "jack" "queen" "king" "ace" "two" "three" "four" "five" "six"
## [19] "seven" "eight" "nine" "ten" "jack" "queen" "king" "ace" "two"
## [28] "three" "four" "five" "six" "seven" "eight" "nine" "ten" "jack"
## [37] "queen" "king" "ace" "two" "three" "four" "five" "six" "seven"
## [46] "eight" "nine" "ten" "jack" "queen" "king" "ace"
Value1 and Value2 are created because an Ace can either be worth 1 or 11 points.
There are many ways to achieve this, but this is one option.
Notice the only difference between Value1 and Value2 is that 11 becomes 1
Another (more advanced but not necessary) way to code this is also shown as Value3
Value1 <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11)
Value2 <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 1)
# Print Value2
Value2
## [1] 2 3 4 5 6 7 8 9 10 10 10 10 1
# What if these vectors had 1000 values and we only wanted to replace a few?
# Value3 is copy of Value1
Value3 <- Value1
Value3[Value3 > 10] <- 1
# Print Value3
Value3
## [1] 2 3 4 5 6 7 8 9 10 10 10 10 1
# The second line says that all values in Value3 > 10 are set to 1.
# [ ] brackets are used to subset vectors, matrices, data sets, etc.
# more on this in upcoming lectures
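To see why the subset assignment works, it helps to look at each piece separately. This is a small sketch using the same Value1 vector; the name `is_ace` is made up for the example.

```r
Value1 <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11)

# the comparison returns one TRUE/FALSE per element
is_ace <- Value1 > 10
is_ace

# which() gives the positions where the condition is TRUE
which(is_ace)

# putting the logical vector inside [ ] keeps only the TRUE elements
Value1[is_ace]

# assigning into that subset changes only those elements
Value3 <- Value1
Value3[is_ace] <- 1
Value3
```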
GameValues1 <- rep(Value1, 4)
Hint: Use GameValues1 as an example.
Note: In order for the code to run, you must remove # which makes it a comment.
# Complete the command below
# GameValues2 <- ?
# Check your work by typing GameValues2 to examine it OR look at it in the
# Global Environment.
Notice that GameDeck only includes GameValues1
How would you change the data.frame command below to add the variable we created, GameValues2?
GameDeck <- data.frame(Suit, Face, GameValues1)
Hint: If you’re not sure what any command does, you can run it and then examine the output (which may be too large to view clearly).
OR you can examine the result in the Global Environment pane.
OR you can type ?rbind in the R console (works for most functions) and look at the help file.
As with anything in R, there are lots of ways to do this.
GameDeck6 <- rbind(GameDeck, GameDeck, GameDeck, GameDeck, GameDeck, GameDeck)
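To see what rbind does without printing a large data set, here is a sketch on a tiny made-up data frame named `small`. The second half shows one of the other ways to stack many copies, using the base R functions replicate and do.call; it is an alternative, not the approach used in this lecture.

```r
# small is a hypothetical two-row data frame used only for illustration
small <- data.frame(Face = c("king", "ace"), Value = c(10, 11))

# rbind stacks the rows of its inputs, so two copies give four rows
stacked <- rbind(small, small)
nrow(stacked)

# one alternative for many copies: build a list of copies and bind them all
copies <- replicate(6, small, simplify = FALSE)
stacked6 <- do.call(rbind, copies)
nrow(stacked6)
```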
If this was strictly a coding class we would complete the game.
If you think about all of the possible outcomes, the algorithm takes a while to code. (I’ve done it.)
We will do a simple simulation to answer this question.
Simulations can be used to answer hypothetical questions from large data sets.
The simulation below is NOT optimized for efficiency
We’ll talk about writing more efficient code this semester
# Create an id variable for sampling from the row numbers.
# as.numeric specifies the id values should now be treated as numeric.
GameDeck6$Row <- as.numeric(row.names(GameDeck6))
head(GameDeck6)
## Suit Face GameValues1 Row
## 1 clubs two 2 1
## 2 clubs three 3 2
## 3 clubs four 4 3
## 4 clubs five 5 4
## 5 clubs six 6 5
## 6 clubs seven 7 6
# Create an empty storage vector called All_Scores
All_Scores <- NULL
Every data set in R has an index variable, i.e., the row names
Row names can be accessed with the row.names function
By default, row names are the row numbers unless otherwise specified
The code above uses the row numbers to create a new variable named Row
$ is used to specify a variable in a data set.
We also created an EMPTY storage vector named All_Scores by assigning NULL to it
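The points above can be checked on a tiny made-up data frame. In this sketch `small`, `rn`, and `Row2` are names invented for the example; seq_len and nrow are base R functions shown as one other way to number rows.

```r
small <- data.frame(Face = c("king", "ace", "two"), Value = c(10, 11, 2))

# row.names() returns text, even when the names look like numbers
rn <- row.names(small)
class(rn)

# as.numeric() converts that text to numbers we can sample from
small$Row <- as.numeric(rn)
class(small$Row)

# $ pulls a single variable out of the data set as a vector
small$Value

# another way to number the rows without going through row.names
small$Row2 <- seq_len(nrow(small))
small
```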
Draws two random cards using sample function and saves them as hand2
Sums the Game Values for the two drawn cards and saves the sum as Player_Score
Puts the Player_Score in our storage vector, All_Scores, using c function
Repeats automatically: for (i in 1:10000) advances i on its own, so no counter update is needed inside the loop
for (i in 1:10000) {
  hand2 <- sample(GameDeck6$Row, 2, replace = FALSE)
  Player_Score <- sum(GameDeck6$GameValues1[GameDeck6$Row %in% hand2])
  All_Scores <- c(All_Scores, Player_Score)
}
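Since the simulation above is not optimized, here is one possible sketch of a faster variant. It creates the storage vector at full size before the loop and indexes the game values directly by position. The names `deck_values`, `n_sims`, `scores`, and `hand` are made up, and set.seed is added only so reruns give the same draws; none of this is the lecture's own code.

```r
# rebuild the six-deck game values so this block runs on its own
Value1 <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11)
deck_values <- rep(rep(Value1, 4), 6)   # 52 cards x 6 decks = 312 values

set.seed(123)                 # arbitrary seed, only for reproducibility
n_sims <- 10000
scores <- numeric(n_sims)     # storage created at full size before the loop

for (i in seq_len(n_sims)) {
  # draw two different card positions, then add up their game values
  hand <- sample(length(deck_values), 2, replace = FALSE)
  scores[i] <- sum(deck_values[hand])
}

length(scores)
range(scores)
```

Growing a vector with c inside a loop copies it on every pass, so filling a vector of known length is usually quicker for large simulations.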
hist(All_Scores)
# We can make it a little better by modifying the title, axis labels, and color:
# main is the title option, xlab is the x axis label, col specifies bar color
hist(All_Scores, main = "Histogram of Blackjack Scores on Hand 1", xlab = "Blackjack Scores",
col = "lightgreen")
We can easily make this a little prettier AND interactive
We will look at other packages and commands but for today,
We’ll use a smart command, hchart, within the highcharter package
The code shown below installs and loads the highcharter package
R packages only need to be installed once (using install.packages)
If you reinstall R (to update it), you must reinstall packages.
R packages need to be loaded every time you start a new R session
In future lectures, there will be a list of packages used at top of document.
Installing, loading, and using packages may work slightly differently on a Mac.
When something doesn’t work, let me or a TA know.
We will help you interpret error messages, which are sometimes unclear.
# install.packages('highcharter')
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
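If you are not sure whether a package is already installed, one option is to check before loading it. This is a sketch using the base R function requireNamespace; the variable `pkg` is a made-up name, and the code only prints a reminder rather than installing anything.

```r
pkg <- "highcharter"

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise
if (!requireNamespace(pkg, quietly = TRUE)) {
  message("Package '", pkg, "' is not installed; run install.packages('", pkg, "') once.")
} else {
  # character.only = TRUE tells library() that pkg holds the package name as text
  library(pkg, character.only = TRUE)
}
```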
hchart(hist(All_Scores), color = "lightgreen")
If we play Blackjack, what is the probability that we will score exactly 21 in our first two cards?
To answer this we can use the length function and the [ ] brackets to subset our data.
length tells us how many observations are in a vector
We KNOW the length (number of obs.) of our simulated data, All_Scores, is 10000
length(All_Scores)
## [1] 10000
Now we ALSO want to know how many of our scores were EXACTLY 21.
What is the length (number of obs.) in the subset of our All_Scores vector of all the observations that are 21?
# == means 'equal to' in a subset
length(All_Scores[All_Scores == 21])
## [1] 484
Dividing that subset length by the length of the whole vector gives us the probability.
The code below also shows how to round a calculated value using the round function.
The number after the comma in the round function indicates the number of decimal places.
Note: this could also be done in one line
# Divide number of values that are 21 by total number of values
prob21 <- length(All_Scores[All_Scores == 21])/length(All_Scores)
# round value prob21 to have 2 decimal places
round(prob21, 2)
## [1] 0.05
# OR (not saved as an object)
round(length(All_Scores[All_Scores == 21])/length(All_Scores), 2)
## [1] 0.05
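Two follow-ups, offered as a sketch. First, because TRUE counts as 1 and FALSE as 0, sum and mean can count matches without subsetting; `scores` here is a small made-up vector, not the simulated All_Scores. Second, the simulated answer can be compared with an exact calculation, assuming six standard decks (312 cards) where a two-card 21 means one ace plus one ten-valued card; `exact21` is a made-up name.

```r
# scores is a small made-up vector, not the simulated All_Scores
scores <- c(21, 15, 21, 18, 20)

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
sum(scores == 21)

# and mean() gives the proportion of matches directly
mean(scores == 21)

# exact check for six decks: 24 aces paired with 96 ten-valued cards
# (tens, jacks, queens, kings), out of all two-card hands from 312 cards
exact21 <- (24 * 96) / choose(312, 2)
round(exact21, 4)
```

The exact value is a little under 0.05, in line with the simulated proportion of 484 out of 10000.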