title: "WPA#3"
author: "Rebekka Herz"
date: "28. April 2015"
output: html_document
first.five <- letters[1:5]
first.five
## [1] "a" "b" "c" "d" "e"
length(letters)
## [1] 26
last.five <- letters[22:26]
last.five
## [1] "v" "w" "x" "y" "z"
letters[seq(2, 26, 2)]
## [1] "b" "d" "f" "h" "j" "l" "n" "p" "r" "t" "v" "x" "z"
letters
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
which(letters == "t")
## [1] 20
which(letters == "a")
## [1] 1
which(letters == "e")
## [1] 5
which(letters == "i")
## [1] 9
which(letters == "o")
## [1] 15
which(letters == "u")
## [1] 21
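The five separate vowel lookups above can also be done in a single call. A minimal sketch, assuming the vowels are collected in a helper vector (the names `vowels` and `vowel.positions` are my own, not part of the assignment):

```r
# Hypothetical helper vector holding the five vowels
vowels <- c("a", "e", "i", "o", "u")

# %in% marks every letter that is a vowel with TRUE;
# which() turns those TRUEs into positions in the alphabet
vowel.positions <- which(letters %in% vowels)
vowel.positions
# positions 1, 5, 9, 15 and 21, matching the individual results above
```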
Last night, 10 pirates held a contest to see whose parrot could eat the most seeds in a minute. I stored the seed counts in a vector called seeds and the names of the pirates in a vector called pirates:
seeds <- c(6, 10, 243, 12, 43, 20, 34, 18, 24, 20)
pirates <- c("Emmanuel", "Alissa", "Lucia", "Marcel", "Florian", "Nadiia", "Yvonne", "Florina", "Xu", "Zoe")
which(seeds > 30)
## [1] 3 5 7
hungry <- seeds > 30
hungry
## [1] FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
mean(hungry)
## [1] 0.3
hungrier <- seeds < 40
hungrier
## [1] TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
sd(hungrier)
## [1] 0.421637
min(seeds)
## [1] 6
sort(seeds, decreasing = TRUE)
## [1] 243 43 34 24 20 20 18 12 10 6
which(seeds == 243)
## [1] 3
pirates[3]
## [1] "Lucia"
pirates[which(seeds == 243)]
## [1] "Lucia"
# Lucia is the pirate whose parrot ate the most seeds.
pirates[which(seeds == 6)]
## [1] "Emmanuel"
# Emmanuel is the pirate whose parrot ate the fewest seeds.
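Looking up the winner and the loser does not require typing the extreme values (243 and 6) by hand. A minimal sketch using `which.max()` and `which.min()`, which return the position of the largest and smallest element; the names `winner` and `loser` are my own:

```r
seeds <- c(6, 10, 243, 12, 43, 20, 34, 18, 24, 20)
pirates <- c("Emmanuel", "Alissa", "Lucia", "Marcel", "Florian",
             "Nadiia", "Yvonne", "Florina", "Xu", "Zoe")

# which.max()/which.min() give the index of the largest/smallest value
winner <- pirates[which.max(seeds)]
loser  <- pirates[which.min(seeds)]
winner
loser
```

Note that both functions return only the first position when there is a tie, so `pirates[seeds == max(seeds)]` is the safer choice if several parrots could share the top score.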
evenhungrier <- seeds > 50
evenhungrier
## [1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
which(evenhungrier)
## [1] 3
pirates[which(evenhungrier)]
## [1] "Lucia"
# Lucia was the only pirate whose parrot ate more than 50 seeds.
median(seeds)
## [1] 20
# 243 is an extreme outlier, but the median is robust to outliers, so the regular median() function works here.
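The seed counts contain one very large value (243), so it is worth checking how much it affects the mean compared with the median. A minimal sketch; the name `seeds.trimmed` is my own and simply drops that one value:

```r
seeds <- c(6, 10, 243, 12, 43, 20, 34, 18, 24, 20)

# With the extreme value included
mean(seeds)     # pulled upwards by 243
median(seeds)   # middle of the sorted values

# Hypothetical comparison vector without the extreme value
seeds.trimmed <- seeds[seeds != 243]
mean(seeds.trimmed)
median(seeds.trimmed)
```

The mean drops from 43 to about 20.8 once 243 is removed, while the median stays at 20 in both cases.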
Last weekend I asked five pirates what their top three favorite TV shows are. Here are the results:
katharina <- c("Breaking Bad", "Game of Thrones", "The Simpsons")
tani <- c("Orange is the New Black", "Game of Thrones", "House of Cards")
alexander <- c("The Walking Dead", "Game of Thrones", "True Detective")
sarah <- c("Mad Men", "My Little Pony", "House of Cards")
rebekka <- c("Game of Thrones", "Gotham", "Breaking Bad")
responses <- c(katharina, tani, alexander, sarah, rebekka)
responses
## [1] "Breaking Bad" "Game of Thrones"
## [3] "The Simpsons" "Orange is the New Black"
## [5] "Game of Thrones" "House of Cards"
## [7] "The Walking Dead" "Game of Thrones"
## [9] "True Detective" "Mad Men"
## [11] "My Little Pony" "House of Cards"
## [13] "Game of Thrones" "Gotham"
## [15] "Breaking Bad"
favs <- unique(responses)
favs
## [1] "Breaking Bad" "Game of Thrones"
## [3] "The Simpsons" "Orange is the New Black"
## [5] "House of Cards" "The Walking Dead"
## [7] "True Detective" "Mad Men"
## [9] "My Little Pony" "Gotham"
table(responses)
## responses
## Breaking Bad Game of Thrones Gotham
## 2 4 1
## House of Cards Mad Men My Little Pony
## 2 1 1
## Orange is the New Black The Simpsons The Walking Dead
## 1 1 1
## True Detective
## 1
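The counts from `table()` come back in alphabetical order, which makes the most popular show hard to spot. A minimal sketch that sorts the table by count; the name `show.counts` is my own:

```r
katharina <- c("Breaking Bad", "Game of Thrones", "The Simpsons")
tani <- c("Orange is the New Black", "Game of Thrones", "House of Cards")
alexander <- c("The Walking Dead", "Game of Thrones", "True Detective")
sarah <- c("Mad Men", "My Little Pony", "House of Cards")
rebekka <- c("Game of Thrones", "Gotham", "Breaking Bad")
responses <- c(katharina, tani, alexander, sarah, rebekka)

# Sort the frequency table so the most mentioned show comes first
show.counts <- sort(table(responses), decreasing = TRUE)
show.counts

# Name of the most frequently mentioned show
names(show.counts)[1]
```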
# 1.) using the %in% operator
"Orange is the New Black" %in% favs
## [1] TRUE
# Yes, at least one pirate chose this show.
# 2.) using sum() and the logical ==
sum(responses == "Orange is the New Black")
## [1] 1
# This means that one person named "Orange is the New Black" as one of their favorite shows.
sum(responses == "House of Cards")
## [1] 2
mean(responses == "House of Cards")
## [1] 0.1333333
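# 13.3 % of all 15 responses
# 2 out of 5 pirates named "House of Cards", which is 40 %. The 13.3 % above is the share of all 15 responses (2/15), not of the 5 pirates; dividing the count by the number of pirates, sum(responses == "House of Cards") / 5, gives 0.4.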
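One way to get the share of pirates (rather than the share of responses) who named "House of Cards" is to check each pirate's answers separately. A minimal sketch, assuming the five answer vectors are collected in a list; the names `answers`, `likes.hoc` and `share.hoc` are my own:

```r
katharina <- c("Breaking Bad", "Game of Thrones", "The Simpsons")
tani <- c("Orange is the New Black", "Game of Thrones", "House of Cards")
alexander <- c("The Walking Dead", "Game of Thrones", "True Detective")
sarah <- c("Mad Men", "My Little Pony", "House of Cards")
rebekka <- c("Game of Thrones", "Gotham", "Breaking Bad")

# Hypothetical list with one element per pirate
answers <- list(katharina = katharina, tani = tani, alexander = alexander,
                sarah = sarah, rebekka = rebekka)

# One TRUE/FALSE per pirate: did they name the show?
likes.hoc <- sapply(answers, function(x) "House of Cards" %in% x)
likes.hoc

# Mean of a logical vector = proportion of TRUEs, here per pirate
share.hoc <- mean(likes.hoc)
share.hoc
```

Because every pirate names a show at most once, this gives the same result as dividing the total count by the number of pirates.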
intersect(rebekka, tani)
## [1] "Game of Thrones"
# Both Rebekka and Tani like "Game of Thrones".
setdiff(alexander, katharina)
## [1] "The Walking Dead" "True Detective"
"The Walking Dead" %in% alexander
## [1] TRUE
"The Walking Dead" %in% katharina
## [1] FALSE
"True Detective" %in% katharina
## [1] FALSE
# Since "The Walking Dead" and "True Detective" are not among the shows that Katharina mentioned, Alexander might recommend them to her.
The “Birthday Problem” is a famous statistics riddle. The question is: How many people do you need in a room for the probability to be greater than 0.50 that at least two of them have the same birthday? For example, if there are just 2 people, the probability is quite low, while if there are 366 people (with 365 possible birthdays), the probability is 1.0. How many people do you need for the probability to be just a bit larger than 0.50? Later on in the course we’ll solve this question, but for now, let’s test a small example of 30 people. Find a way to simulate a single room (Hint: use sample()) with 30 people. Now find a way to test if any two people in the room have the same birthday (Hint: Use a combination of unique() and length(), or the function duplicated()). Try running your code a few times and get an idea of the probability.
# I will have to think this through properly and will continue to work on this problem.
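A possible starting point for the birthday exercise, following the two hints in the task. This is only a sketch under stated assumptions: birthdays are coded as day numbers 1 to 365, every day is equally likely, and the names `n.people`, `room`, `shared` and `estimate` as well as the 10,000 repetitions are my own choices:

```r
set.seed(1)     # assumption: fixed seed so the run is reproducible
n.people <- 30

# Simulate one room: 30 birthdays drawn with replacement from 365 days
room <- sample(1:365, size = n.people, replace = TRUE)

# Test 1: duplicated() flags every birthday that has appeared before
any(duplicated(room))

# Test 2: fewer unique birthdays than people means at least one match
length(unique(room)) < n.people

# Repeat the experiment many times and take the share of rooms with a match
shared <- replicate(10000, {
  r <- sample(1:365, size = n.people, replace = TRUE)
  any(duplicated(r))
})
estimate <- mean(shared)
estimate
```

Both tests always agree for the same room. With 30 people the estimate should come out at roughly 0.7, so a shared birthday is already more likely than not.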