This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.
plot(cars)
mean(cars$speed)
## [1] 15.4
mean(cars$dist)
## [1] 42.98
max(cars$dist)
## [1] 120
max(cars$speed)
## [1] 25
4+1
## [1] 5
5-2
## [1] 3
2^2
## [1] 4
sqrt(25)
## [1] 5
2^5
## [1] 32
# Natural log of 2.72 (close to e, so the result is close to 1)
log(2.72)
## [1] 1.000632
log10(5)
## [1] 0.69897
log10(10)
## [1] 1
log10(100)
## [1] 2
# Computing logarithms in other bases with the base argument
log(10, base=5)
## [1] 1.430677
log(10, base=2)
## [1] 3.321928
log(1000, base=10)
## [1] 3
Question_1: Compute the log base 5 of 10 and the log base 10 of 10.
log(10,base=5)
## [1] 1.430677
log(10,base=10)
## [1] 1
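Passing base to log() gives the same result as the change-of-base formula, log_b(x) = ln(x) / ln(b), where log() without a base is the natural log. A quick check, as a sketch (the variable names are only for illustration):

```r
# Change of base: log_b(x) = log(x) / log(b), with log() as the natural log
via_base_arg <- log(10, base = 5)
via_formula  <- log(10) / log(5)
all.equal(via_base_arg, via_formula)  # TRUE
# Common bases also have their own helper functions
log10(1000)  # 3
log2(8)      # 3
```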
Computing some offensive metrics in baseball
# Batting average (BA) = hits / at bats, here 29 hits in 112 at bats
BA=29/112
BA
## [1] 0.2589286
Batting_Average=round(BA,digits=3)
Batting_Average
## [1] 0.259
Question_2: What is the batting average of a player who has 42 hits in 212 at bats?
BA=42/212
Batting_Average=round(BA,digits=3)
Batting_Average
## [1] 0.198
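Since the same calculation is repeated for each player, it can be wrapped in a small function. batting_average() below is a hypothetical helper written for this notebook, not part of base R; it simply divides hits by at bats and rounds to 3 decimals:

```r
# Hypothetical helper: batting average = hits / at bats, rounded to 3 decimals
batting_average <- function(hits, at_bats) {
  round(hits / at_bats, digits = 3)
}
batting_average(29, 112)   # 0.259
batting_average(42, 212)   # 0.198
# Division is element-wise, so several players can be handled at once
batting_average(c(29, 42), c(112, 212))
```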
Question_3: Compute the OBP for a player with the following general stats: AB=565, H=156, BB=65, HBP=3, SF=7
#On Base Percentage
#OBP=(H+BB+HBP)/(AB+BB+HBP+SF)
#Let us compute the OBP for a player with the following general stats
#AB=565,H=156,BB=65,HBP=3,SF=7
OBP=(156+65+3)/(565+65+3+7)
OBP
## [1] 0.35
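The on-base percentage formula can also be wrapped in a function so the stats are passed by name. obp() is a hypothetical helper for illustration; it implements the standard definition OBP = (H + BB + HBP) / (AB + BB + HBP + SF):

```r
# Hypothetical helper implementing the standard OBP formula:
# OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
obp <- function(H, BB, HBP, AB, SF) {
  (H + BB + HBP) / (AB + BB + HBP + SF)
}
# Named arguments make it hard to mix up the inputs
obp(H = 156, BB = 65, HBP = 3, AB = 565, SF = 7)  # 0.35
```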
Comparison operators test whether one value is less than, greater than, or equal to another.
3 == 8 # Does 3 equal 8?
## [1] FALSE
3 != 8 # Is 3 different from 8?
## [1] TRUE
3 <= 8 # Is 3 less than or equal to 8?
## [1] TRUE
3 > 4 # Is 3 greater than 4?
## [1] FALSE
The logical operators are & for logical AND, | for logical OR, and ! for NOT. These are some examples:
# Logical Disjunction (or)
FALSE | FALSE # False OR False
## [1] FALSE
# Logical Conjunction (and)
TRUE & FALSE #True AND False
## [1] FALSE
# Negation
! FALSE # Not False
## [1] TRUE
# Combination of statements
2 < 3 | 1 == 5 # 2<3 is True, 1==5 is False, True OR False is True
## [1] TRUE
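The operators &, | and ! also work element-wise on logical vectors, so a complete truth table can be built in one step. A small sketch (the vectors a and b are just illustrative names):

```r
# Element-wise truth table for &, | and !
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)
data.frame(a, b, and = a & b, or = a | b, not_a = !a)
```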
Assigning Values to Variables
In R, create a variable and assign it a value using <- as follows:
Total_Bases <- 6 + 5
Total_Bases*3
## [1] 33
To see the variables that are currently defined, use ls (as in “list”)
ls()
## [1] "BA" "Batting_Average" "OBP" "Total_Bases"
To delete a variable, use rm (as in “remove”)
rm(Total_Bases)
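One way to confirm that rm() worked is exists(), which reports whether a variable with a given name is currently defined. A short check:

```r
Total_Bases <- 6 + 5
exists("Total_Bases")  # TRUE
rm(Total_Bases)
exists("Total_Bases")  # FALSE
```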
Vectors
The basic type of object in R is a vector, which is an ordered list of values of the same type. You can create a vector using the c() function (as in “concatenate”).
pitches_by_innings <- c(12, 15, 10, 20, 10)
pitches_by_innings
## [1] 12 15 10 20 10
testing_vector <- c(54, 27, 17, 9, 7)
testing_vector
## [1] 54 27 17 9 7
strikes_by_innings <- c(9, 12, 6, 14, 9)
strikes_by_innings
## [1] 9 12 6 14 9
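Because every element of a vector must have the same type, c() converts mixed inputs to a single common type. The sketch below shows this coercion (the names mixed and num_logical are illustrative only):

```r
# Mixing types in c() coerces everything to one common type
mixed <- c(1, "two", TRUE)
mixed               # "1" "two" "TRUE"
class(mixed)        # "character"
# Logical values become 1 (TRUE) and 0 (FALSE) next to numbers
num_logical <- c(1, TRUE, FALSE)
num_logical         # 1 1 0
class(num_logical)  # "numeric"
```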
Question_4: Define two vectors, runs_per_9innings and hits_per_9innings, each with five elements.
runs_per_9innings <- c(15, 13, 10, 9, 18)
hits_per_9innings <- c(12, 10, 15, 9, 13)
runs_per_9innings
## [1] 15 13 10 9 18
hits_per_9innings
## [1] 12 10 15 9 13
Some functions create vectors with regular patterns, such as repeated elements or evenly spaced sequences.
# rep() repeats a value: here 2, five times
rep(2, 5)
## [1] 2 2 2 2 2
rep(1,4)
## [1] 1 1 1 1
# consecutive numbers
1:5
## [1] 1 2 3 4 5
2:10
## [1] 2 3 4 5 6 7 8 9 10
# sequence from 1 to 10 with a step of 2
seq(1, 10, by=2)
## [1] 1 3 5 7 9
seq(2,13,by=3)
## [1] 2 5 8 11
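rep() and seq() take a few more arguments that are handy for building patterns: times repeats a whole vector, each repeats every element in turn, and length.out asks seq() for a fixed number of evenly spaced values. For example:

```r
rep(c(1, 2), times = 3)    # 1 2 1 2 1 2
rep(c(1, 2), each = 3)     # 1 1 1 2 2 2
seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
seq(10, 1, by = -3)        # counting down: 10 7 4 1
```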
Many functions and operators, like + or -, work element-wise on vectors, applying to every element.
# add vectors
pitches_by_innings+strikes_by_innings
## [1] 21 27 16 34 19
# compare vectors
pitches_by_innings == strikes_by_innings
## [1] FALSE FALSE FALSE FALSE FALSE
# find length of vector
length(pitches_by_innings)
## [1] 5
# find minimum value in vector
min(pitches_by_innings)
## [1] 10
# find average value in vector
mean(pitches_by_innings)
## [1] 13.4
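Element-wise arithmetic makes per-inning rates a one-liner. As an illustration (strike_rate is a name chosen here, not from the original), dividing strikes by pitches gives the share of strikes in each inning, while dividing the totals gives the overall rate:

```r
pitches_by_innings <- c(12, 15, 10, 20, 10)
strikes_by_innings <- c(9, 12, 6, 14, 9)
# Element-wise division: share of pitches that were strikes in each inning
strike_rate <- strikes_by_innings / pitches_by_innings
round(strike_rate, 2)  # 0.75 0.80 0.60 0.70 0.90
# Overall strike rate across all innings
sum(strikes_by_innings) / sum(pitches_by_innings)
```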
pitches_by_innings
## [1] 12 15 10 20 10
# If you want to get the first element:
pitches_by_innings[1]
## [1] 12
#Question_5: Get the first element of hits_per_9innings.
hits_per_9innings[1]
## [1] 12
If you want to get the last element of pitches_by_innings without explicitly typing the number of elements of pitches_by_innings, make use of the length function, which calculates the length of a vector:
pitches_by_innings[length(pitches_by_innings)]
## [1] 10
#Question_6: Get the last element of hits_per_9innings.
hits_per_9innings[length(hits_per_9innings)]
## [1] 13
You can also extract multiple values from a vector. For instance, to get the 2nd through 4th values, use:
pitches_by_innings[c(2, 3, 4)]
## [1] 15 10 20
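Indexing also accepts negative positions (to drop elements) and logical conditions (to keep only the matching elements), and which() returns the positions where a condition holds. A few examples:

```r
pitches_by_innings <- c(12, 15, 10, 20, 10)
pitches_by_innings[-1]                       # drop the first element: 15 10 20 10
pitches_by_innings[pitches_by_innings > 12]  # keep values above 12: 15 20
which(pitches_by_innings == 10)              # positions holding 10: 3 5
tail(pitches_by_innings, 1)                  # last element: 10
```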
Vectors can also hold strings or logical values:
player_positions <- c("catcher", "pitcher", "infielders", "outfielders")
player_positions
## [1] "catcher" "pitcher" "infielders" "outfielders"
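Logical vectors are especially useful because TRUE counts as 1 and FALSE as 0 in arithmetic, so sum() counts the matches and mean() gives their proportion. A small sketch (busy_inning is an illustrative name):

```r
pitches_by_innings <- c(12, 15, 10, 20, 10)
busy_inning <- pitches_by_innings > 12  # FALSE TRUE FALSE TRUE FALSE
sum(busy_inning)   # 2 innings with more than 12 pitches
mean(busy_inning)  # 0.4, the proportion of such innings
```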
Data Frames
In statistical applications, data is often stored as a data frame, which is like a spreadsheet, with rows as observations and columns as variables.
To manually create a data frame, use the data.frame() function.
data.frame(bonus = c(2, 3, 1),#in millions
active_roster = c("yes", "no", "yes"),
salary = c(1.5, 2.5, 1))#in millions
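Once the data frame is stored in a variable, its columns can be pulled out with $ and its rows filtered with a logical condition inside [ , ]. The name roster below is only an illustrative choice:

```r
roster <- data.frame(bonus = c(2, 3, 1),                    # in millions
                     active_roster = c("yes", "no", "yes"),
                     salary = c(1.5, 2.5, 1))               # in millions
roster$salary                            # one column as a vector: 1.5 2.5 1.0
mean(roster$bonus)                       # 2
roster[roster$active_roster == "yes", ]  # only the active players (rows 1 and 3)
nrow(roster); ncol(roster)               # 3 rows, 3 columns
```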
How to Make a Random Sample
To randomly select a sample, use the function sample(). The following code selects 5 numbers between 1 and 10 at random, without replacement (no number is picked twice), so your output will differ from the one shown:
sample(1:10, size=5)
## [1] 9 8 1 3 5
Create a data frame with the first 10 capital letters and number them 1 to 10:
bar <- data.frame(var1 = LETTERS[1:10], var2 = 1:10)
# Check data frame
bar
Suppose you want to select a random sample of size 5. First, define a variable n with the size of the sample, i.e. 5
n <- 5
Now, select a sample of size 5 from the vector with 1 to 10 (the number of rows in bar). Use the function nrow() to find the number of rows in bar instead of manually entering that number.
Use : to create a vector with all the integers between 1 and the number of rows in bar.
samplerows <- sample(1:nrow(bar), size=n)
# print sample rows
samplerows
## [1] 10 6 3 7 8
The variable samplerows contains the rows of bar which make a random sample from all the rows in bar.
# extract rows
barsample <- bar[samplerows, ]
# print sample
print(barsample)
## var1 var2
## 10 J 10
## 6 F 6
## 3 C 3
## 7 G 7
## 8 H 8
# The same kind of sample in one step (a new random draw, so the rows will differ)
bar[sample(1:nrow(bar), n), ]
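Because sample() draws at random, each run gives different rows. Calling set.seed() with the same number before sampling makes the draw reproducible, and replace = TRUE allows the same value to be picked more than once. A sketch (the seed 42 and the variable names are arbitrary choices):

```r
set.seed(42)
first_draw <- sample(1:10, size = 5)
set.seed(42)
second_draw <- sample(1:10, size = 5)
identical(first_draw, second_draw)  # TRUE: same seed, same sample
# Sampling with replacement allows repeats, e.g. simulating 10 coin flips
sample(c("H", "T"), size = 10, replace = TRUE)
```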
Using Tables
The table() command counts how many times each value occurs. Its simplest usage is table(x), where x is a categorical variable.
x <- c("Yes","No","No","Yes","Yes")
table(x)
## x
## No Yes
## 2 3
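To turn counts into proportions, the result of table() can be passed to prop.table(); single counts can be read by name with [[ ]]. For example:

```r
x <- c("Yes", "No", "No", "Yes", "Yes")
counts <- table(x)
prop.table(counts)  # No 0.4, Yes 0.6
counts[["Yes"]]     # 3
```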
Numerical measures of center and spread
sals <- c(12, .4, 5, 2, 50, 8, 3, 1, 4, 0.25)
# the average
mean(sals)
## [1] 8.565
# the variance
var(sals)
## [1] 225.5145
# the standard deviation
sd(sals)
## [1] 15.01714
# the median
median(sals)
## [1] 3.5
# Tukey's five-number summary, useful for boxplots
# five numbers: min, lower hinge, median, upper hinge, max
fivenum(sals)
## [1] 0.25 1.00 3.50 8.00 50.00
# summary statistics
summary(sals)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.250 1.250 3.500 8.565 7.250 50.000
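var() returns the sample variance, which divides the sum of squared deviations from the mean by n - 1, and sd() is its square root. Recomputing both by hand is a useful check (manual_var and n_sals are illustrative names):

```r
sals <- c(12, .4, 5, 2, 50, 8, 3, 1, 4, 0.25)
n_sals <- length(sals)
# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((sals - mean(sals))^2) / (n_sals - 1)
all.equal(manual_var, var(sals))       # TRUE
all.equal(sqrt(manual_var), sd(sals))  # TRUE
# The outlier 50 pulls the mean well above the median
mean(sals); median(sals)               # 8.565 and 3.5
```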
Mode Function
In R we can write our own functions. As a first example, the function below computes the mode (the most frequent value) of a vector of observations x.
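# Function to find the mode, i.e. the most frequent value
getMode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # count each value; return the first one with the highest count
}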
# Most frequent value in pitches_by_innings
getMode(pitches_by_innings)
## [1] 10
#Question_7: Find the most frequent value of hits_per_9innings.
getMode(hits_per_9innings) # every value occurs once, so the first value is returned
## [1] 12
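getMode() returns only the first value with the highest count, so when several values are tied it hides the others. A hypothetical variant, getModes(), returns every value tied for the highest count instead:

```r
# Hypothetical variant of getMode() that returns every value tied for the highest count
getModes <- function(x) {
  ux <- unique(x)                   # distinct values, in order of first appearance
  counts <- tabulate(match(x, ux))  # how many times each distinct value occurs
  ux[counts == max(counts)]
}
getModes(c(12, 15, 10, 20, 10))  # 10
getModes(c(12, 10, 15, 9, 13))   # 12 10 15 9 13, since all appear once
```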
#Question_8: Summarize the following survey with the table() command.
#What is your favorite day of the week to watch baseball? A total of 10 fans submitted this survey.
#Saturday, Saturday, Sunday, Monday, Saturday, Tuesday, Sunday, Friday, Friday, Monday
game_day <- c("Saturday", "Saturday", "Sunday", "Monday", "Saturday", "Tuesday", "Sunday", "Friday", "Friday", "Monday")
table(game_day)
## game_day
## Friday Monday Saturday Sunday Tuesday
## 2 2 3 2 1
#Question_9: What is the most frequent answer recorded in the survey? Use the getMode function to compute results.
getMode(game_day)
## [1] "Saturday"
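The most frequent answer can also be read straight from the table: which.max() on the counts gives the position of the largest count, and names() returns its label. Note that on ties which.max() picks the first value in the table's alphabetical order, whereas getMode() picks the first value in order of appearance. A sketch:

```r
game_day <- c("Saturday", "Saturday", "Sunday", "Monday", "Saturday",
              "Tuesday", "Sunday", "Friday", "Friday", "Monday")
counts <- table(game_day)
names(which.max(counts))         # "Saturday"
sort(counts, decreasing = TRUE)  # Saturday first, with 3 votes
```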