R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document will be generated that includes both the content and the output of any R code chunks embedded in the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Exercise 1

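library(ggplot2)  # provides ggplot() and geom_bar(), used in the plots below; read.csv() is base R and needs no extra package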
courage <- read.csv("courage.csv")
scores <- select(courage, home_pts, away_pts)
slice(scores, 1:5)
##   home_pts away_pts
## 1        0        1
## 2        1        0
## 3        3        1
## 4        0        1
## 5        3        1

Exercise 2

The mean number of goals per game was about 2.86 (2.858974).

courage <- mutate(courage, total_goals = home_pts + away_pts)
mean(courage$total_goals)
## [1] 2.858974

Exercise 3

A streak length of 2 means that there were two consecutive hits followed by one miss. A streak length of 0 means that a miss occurred with no hits immediately before it, that is, 0 hits and then 1 miss.

source("https://kelrenmor.github.io/STA101-002/labs/functions/calc_streak.R") 
courage_streak <- data.frame(length=calc_streak(courage$winner))
ggplot(courage_streak, aes(x=length)) + 
  geom_bar()
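
The streak definition in Exercise 3 can be illustrated with a small stand-alone sketch. The helper `streak_lengths_sketch()` and its `hit` argument are hypothetical names introduced here for illustration; this is not the sourced `calc_streak()` function, whose code is not shown in this report, although the logic below does reproduce the first streak lengths printed in Exercise 7 (4, 2, 0, 0, 1) from the start of `sim_games`.

```r
# Hypothetical helper, not the sourced calc_streak(): counts the consecutive
# hits before each miss, plus the (possibly empty) run of hits at the end.
streak_lengths_sketch <- function(outcomes, hit = "H") {
  is_hit <- as.integer(outcomes == hit)    # 1 for a hit, 0 for a miss
  padded <- c(0L, is_hit, 0L)              # treat both ends as misses
  miss_positions <- which(padded == 0L)    # positions of every (padded) miss
  diff(miss_positions) - 1L                # number of hits between misses
}

example_outcomes <- c("H", "H", "M", "M", "H")
streak_lengths_sketch(example_outcomes)
# returns 2 0 1
```

Here the first streak has length 2 (two hits, then a miss), the second has length 0 (a miss with no hit before it), and the last has length 1 (one hit at the end of the sequence).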

Exercise 4

The distribution of the Courage’s winning streak lengths from 2017-2019 is right-skewed. The typical streak length was 1. There are no distinct outliers. The longest winning streak was 6 games.

ggplot(courage_streak, aes(x=length)) + 
  geom_bar()

Exercise 5

set.seed(032602)
die_outcomes <- c("1", "2", "3", "4", "5", "6")
sample(1:6, size = 100, replace = TRUE, prob = c(.1, .1, .1, .1, .1, .5))
##   [1] 6 6 6 6 5 6 4 2 4 5 6 5 6 6 6 6 3 5 6 6 6 6 5 4 6 3 6 6 3 6 1 1 6 4 1 6 6
##  [38] 5 4 6 6 5 5 1 6 6 2 6 6 6 3 1 6 2 6 2 6 5 1 6 3 6 1 6 6 4 6 2 5 6 1 4 6 1
##  [75] 6 6 4 6 6 4 6 5 5 6 6 6 2 6 1 6 5 3 6 6 6 6 6 4 4 1
sim_die_roll <- sample(1:6, size = 100, replace = TRUE, prob = c(.1, .1, .1, .1, .1, .5))
sim_die_roll
##   [1] 6 2 6 5 5 3 6 4 2 6 4 1 2 6 4 6 2 2 6 5 6 6 4 6 3 6 4 2 2 1 4 6 4 6 6 6 6
##  [38] 6 6 2 6 6 6 6 6 6 6 3 5 6 4 6 6 6 6 5 6 6 1 3 4 1 6 4 2 6 4 6 2 6 5 6 1 6
##  [75] 6 2 4 6 2 3 2 6 3 6 3 6 6 6 3 5 6 5 6 6 5 1 1 4 6 6
table(sim_die_roll)
## sim_die_roll
##  1  2  3  4  5  6 
##  7 13  8 13  9 50

Exercise 6

set.seed(032602)
game_outcomes <-c("H","M")
sim_games <- sample(game_outcomes, size = 78, replace = TRUE, prob=c(.68, .32))
sim_games
##  [1] "H" "H" "H" "H" "M" "H" "H" "M" "M" "M" "H" "M" "H" "H" "H" "H" "H" "M" "H"
## [20] "H" "H" "H" "M" "H" "H" "H" "H" "H" "H" "H" "M" "M" "H" "M" "M" "H" "H" "M"
## [39] "H" "H" "H" "M" "M" "M" "H" "H" "M" "H" "H" "H" "H" "M" "H" "M" "H" "M" "H"
## [58] "M" "M" "H" "H" "H" "M" "H" "H" "H" "H" "M" "M" "H" "M" "H" "H" "M" "H" "H"
## [77] "H" "H"
table(sim_games)
## sim_games
##  H  M 
## 52 26

Exercise 7

source("https://kelrenmor.github.io/STA101-002/labs/functions/calc_streak.R") 
sim_streak<-data.frame(length=calc_streak(sim_games))
sim_streak
##    length
## 1       4
## 2       2
## 3       0
## 4       0
## 5       1
## 6       5
## 7       4
## 8       7
## 9       0
## 10      1
## 11      0
## 12      2
## 13      3
## 14      0
## 15      0
## 16      2
## 17      4
## 18      1
## 19      1
## 20      1
## 21      0
## 22      3
## 23      4
## 24      0
## 25      1
## 26      2
## 27      4

Exercise 8

The distribution of simulated streak lengths is right-skewed. The most common streak length is 0 (8 of the 27 streaks). The longest streak length is 7.

ggplot(sim_streak, aes(x=length)) + geom_bar()
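
As a numerical companion to the bar plot, the description in Exercise 8 can be checked by tabulating the streak lengths. This is only a sketch: the vector `sim_lengths` is a name introduced here, and its values are copied by hand from the `sim_streak` output printed in Exercise 7 so that the block runs on its own.

```r
# Streak lengths copied from the sim_streak printout in Exercise 7
# (sim_lengths is a hypothetical stand-alone copy of sim_streak$length).
sim_lengths <- c(4, 2, 0, 0, 1, 5, 4, 7, 0, 1, 0, 2, 3, 0, 0,
                 2, 4, 1, 1, 1, 0, 3, 4, 0, 1, 2, 4)

table(sim_lengths)    # how many streaks there are of each length
median(sim_lengths)   # returns 1
max(sim_lengths)      # returns 7
```

The table shows that 0 is the most frequent length (8 streaks), while the median is 1, so which value counts as "typical" depends on whether the mode or the median is used.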

Exercise 9

The distributions of the Courage’s streak lengths and the simulated team’s streak lengths are not similar. Both are right-skewed, but the typical streak length is 1 for the Courage and 0 for the simulated team, and the longest streaks are 6 and 7, respectively. This simulation does not give evidence that the “streaking team” model fits the Courage’s winning pattern.
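
A single simulated season is only one draw, so one way to make this comparison less dependent on chance is to repeat the simulation many times. The sketch below is an illustration under stated assumptions, not part of the original analysis: it reuses the 78 games and the 0.68 hit probability from Exercise 6, while the helper `longest_streak()`, the seed, and the choice of 1000 repetitions are arbitrary and introduced here.

```r
# Hypothetical helper: length of the longest run of hits in one season.
longest_streak <- function(outcomes, hit = "H") {
  runs <- rle(outcomes == hit)               # run-length encoding of hit/miss
  hit_runs <- runs$lengths[runs$values]      # keep only the runs of hits
  if (length(hit_runs) == 0) 0L else max(hit_runs)
}

set.seed(1)  # arbitrary seed, chosen only for reproducibility of this sketch

# Simulate 1000 independent seasons of 78 games with P(hit) = 0.68
sim_longest <- replicate(1000, {
  games <- sample(c("H", "M"), size = 78, replace = TRUE, prob = c(0.68, 0.32))
  longest_streak(games)
})

table(sim_longest)        # distribution of the longest streak across seasons
mean(sim_longest >= 6)    # share of seasons with a longest streak of 6 or more
```

The last line compares the simulated seasons with the Courage’s longest observed streak of 6 from Exercise 4; a large share would suggest that a streak of that length is unremarkable under the independent-games model.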