title: “WPA#3”

author: “Rebekka Herz”

date: “28. April 2015”

output: html_document

Create two vectors called first.five and last.five that give you the first and last 5 letters of the alphabet respectively. Hint: Use the function length() to create last.five.

first.five <- letters[1:5]
first.five

## [1] "a" "b" "c" "d" "e"

length(letters)

## [1] 26

last.five <- letters[(length(letters) - 4):length(letters)]
last.five

## [1] "v" "w" "x" "y" "z"
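The last five letters sit at positions 22 to 26, so a length()-based index has to start at length(letters) - 4. A minimal sketch of that index follows, including a precedence pitfall: `:` binds more tightly than `-`, so the parentheses matter. The names n, wrong and right are only illustrative.

```r
n <- length(letters)      # 26

wrong <- n - 4:n          # parsed as n - (4:n), i.e. 22, 21, ..., 0
right <- (n - 4):n        # 22, 23, 24, 25, 26

length(wrong)             # 23 values, not 5
letters[right]            # the last five letters
```

tail(letters, 5) would return the same five letters without any index arithmetic.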

Create a vector called every.second that gives you every second letter of the alphabet (starting with “b”)

every.second <- letters[seq(2, 26, 2)]
every.second

##  [1] "b" "d" "f" "h" "j" "l" "n" "p" "r" "t" "v" "x" "z"
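As an alternative to seq(), a logical index is recycled along the vector, so a two-element pattern of FALSE, TRUE picks every second letter. This is only a sketch of that idea; the name every.second.alt is made up for the comparison.

```r
# c(FALSE, TRUE) is recycled 13 times over the 26 letters:
# drop "a", keep "b", drop "c", keep "d", ...
every.second.alt <- letters[c(FALSE, TRUE)]
every.second.alt

# Same result as the seq()-based index
identical(every.second.alt, letters[seq(2, 26, 2)])
```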

In what position does the letter “t” occur in the alphabet? (a is in position 1, b is in position 2…)

letters

##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"

which(letters == "t")

## [1] 20

In which positions do the five vowels (a, e, i, o, u) occur in the alphabet? Hint: Use both %in% and which()

which(letters == "a")

## [1] 1

which(letters == "e")

## [1] 5

which(letters == "i")

## [1] 9

which(letters == "o")

## [1] 15

which(letters == "u")

## [1] 21
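The five separate calls can be combined into one, as the hint suggests. A short sketch: %in% tests every letter against the set of vowels at once, and which() turns the resulting logical vector into positions. The name vowels is introduced here for illustration.

```r
vowels <- c("a", "e", "i", "o", "u")

# TRUE wherever a letter is one of the vowels
is.vowel <- letters %in% vowels

# Positions of the TRUE values
vowel.positions <- which(is.vowel)
vowel.positions
```

Note that letters == vowels would not work here: == compares element by element with recycling, while %in% checks set membership.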

Last night 10 pirates had a contest to see whose parrot could eat the most seeds in a minute. I stored the values below in a vector called seeds and the names of the pirates in a vector called pirates

Using logical vectors, determine what percent of the birds ate more than 30 seeds.

seeds <- c(6, 10, 243, 12, 43, 20, 34, 18, 24, 20)
pirates <- c("Emmanuel", "Alissa", "Lucia", "Marcel", "Florian", "Nadiia", "Yvonne", "Florina", "Xu", "Zoe")

which(seeds > 30)

## [1] 3 5 7

hungry <- seeds > 30
hungry

##  [1] FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE

mean(hungry)

## [1] 0.3
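mean() of a logical vector works as a proportion because TRUE counts as 1 and FALSE as 0. A short sketch that spells this out and scales the result to a percentage (percent.hungry is an illustrative name):

```r
seeds <- c(6, 10, 243, 12, 43, 20, 34, 18, 24, 20)
hungry <- seeds > 30

sum(hungry)                    # number of birds above 30 seeds
sum(hungry) / length(hungry)   # the same value as mean(hungry)

percent.hungry <- 100 * mean(hungry)
percent.hungry
```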

What is the standard deviation of the data for those birds that ate less than 40 seeds?

hungrier <- seeds < 40 
hungrier

##  [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

# Use the logical vector to select the seed counts, then take their sd
sd(seeds[hungrier])

## [1] 8.815571

What is the name of the pirate whose bird ate the most seeds? What is the name of the pirate whose bird ate the fewest seeds?

min(seeds)

## [1] 6

sort(seeds, decreasing = TRUE)

##  [1] 243  43  34  24  20  20  18  12  10   6

which(seeds == 243)

## [1] 3

pirates[3]

## [1] "Lucia"

pirates[which(seeds == 243)]

## [1] "Lucia"

# Lucia is the name of the pirate whose parrot ate the most seeds

pirates[which(seeds == 6)]

## [1] "Emmanuel"

# Emmanuel is the name of the pirate whose parrot ate the fewest seeds
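The values 243 and 6 above were read off the sorted output and typed in by hand. A sketch that avoids hard-coding them uses which.max() and which.min(), which return the position of the largest and smallest value (the first one, if there are ties):

```r
seeds <- c(6, 10, 243, 12, 43, 20, 34, 18, 24, 20)
pirates <- c("Emmanuel", "Alissa", "Lucia", "Marcel", "Florian",
             "Nadiia", "Yvonne", "Florina", "Xu", "Zoe")

most <- pirates[which.max(seeds)]     # owner of the bird that ate the most
fewest <- pirates[which.min(seeds)]   # owner of the bird that ate the fewest
most
fewest

# Equivalent form that would also return all names in case of ties
pirates[seeds == max(seeds)]
```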

What are the names of the owners of birds that ate more than 50 seeds?

evenhungrier <- seeds > 50
evenhungrier

##  [1] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

which(evenhungrier)

## [1] 3

pirates[which(evenhungrier)]

## [1] "Lucia"

# Lucia was the only pirate whose parrot ate more than 50 seeds.

What is the mean and median of the values of seeds that are not outliers?

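# 243 is far above every other value, so I treat it as the only outlier
# (the cutoff of 100 is my own choice)
no.outliers <- seeds[seeds < 100]
mean(no.outliers)

## [1] 20.77778

median(no.outliers)

## [1] 20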
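What counts as an outlier depends on the rule chosen. One common convention, assumed here rather than taken from the exercise, flags values more than 1.5 times the interquartile range below the first quartile or above the third. A sketch with the seeds data (q, fence and is.outlier are illustrative names):

```r
seeds <- c(6, 10, 243, 12, 43, 20, 34, 18, 24, 20)

q <- quantile(seeds, c(0.25, 0.75))   # first and third quartile
fence <- 1.5 * IQR(seeds)             # allowed distance beyond the quartiles

# TRUE for values outside [Q1 - fence, Q3 + fence]
is.outlier <- seeds < q[1] - fence | seeds > q[2] + fence
seeds[is.outlier]

# Summary statistics of the remaining values
mean(seeds[!is.outlier])
median(seeds[!is.outlier])
```

Under this rule only 243 is flagged; 43 stays in. A different rule, for example one based on standard deviations, could draw the line elsewhere.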

Last weekend I asked five pirates what their top three favorite TV shows are. Here are the results:

katharina <- c("Breaking Bad", "Game of Thrones", "The Simpsons")
tani <- c("Orange is the New Black", "Game of Thrones", "House of Cards")
alexander <- c("The Walking Dead", "Game of Thrones", "True Detective")
sarah <- c("Mad Men", "My Little Pony", "House of Cards")
rebekka <- c("Game of Thrones", "Gotham", "Breaking Bad")

Create a vector called responses that contains all of the responses from the survey in one vector.

responses <- c(katharina, tani, alexander, sarah, rebekka)
responses

##  [1] "Breaking Bad"            "Game of Thrones"        
##  [3] "The Simpsons"            "Orange is the New Black"
##  [5] "Game of Thrones"         "House of Cards"         
##  [7] "The Walking Dead"        "Game of Thrones"        
##  [9] "True Detective"          "Mad Men"                
## [11] "My Little Pony"          "House of Cards"         
## [13] "Game of Thrones"         "Gotham"                 
## [15] "Breaking Bad"

Create a vector called favs that gives all unique shows that people chose.

favs <- unique(responses)
favs

##  [1] "Breaking Bad"            "Game of Thrones"        
##  [3] "The Simpsons"            "Orange is the New Black"
##  [5] "House of Cards"          "The Walking Dead"       
##  [7] "True Detective"          "Mad Men"                
##  [9] "My Little Pony"          "Gotham"

Create a table showing how often each of the shows was listed as a favorite.

table(responses)

## responses
##            Breaking Bad         Game of Thrones                  Gotham 
##                       2                       4                       1 
##          House of Cards                 Mad Men          My Little Pony 
##                       2                       1                       1 
## Orange is the New Black            The Simpsons        The Walking Dead 
##                       1                       1                       1 
##          True Detective 
##                       1

Did anyone choose the show “Orange is the New Black”? Write code that answers this question by returning a single logical value of TRUE/FALSE or 0/1. Test this in two ways: once using %in%, and once using only the function sum() and the logical == operator.

# 1.) using the %in% operator

"Orange is the New Black" %in% favs

## [1] TRUE

# Yes, at least one person chose this show.


# 2.) using sum() and the logical ==

sum(responses == "Orange is the New Black")

## [1] 1

# This means that one person chose "Orange is the New Black" as one of their favorite shows.
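sum() returns a count, which happens to be 1 here but would be 2 or more if several people had listed the show. A small sketch of turning the count into a single TRUE/FALSE; the comparison with any() is an addition for illustration and not part of the exercise hint.

```r
responses <- c("Breaking Bad", "Game of Thrones", "The Simpsons",
               "Orange is the New Black", "Game of Thrones", "House of Cards",
               "The Walking Dead", "Game of Thrones", "True Detective",
               "Mad Men", "My Little Pony", "House of Cards",
               "Game of Thrones", "Gotham", "Breaking Bad")

# A count above zero means at least one person chose the show
chosen <- sum(responses == "Orange is the New Black") > 0
chosen

# any() gives the same answer directly from the logical vector
any(responses == "Orange is the New Black")
```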

How many times did people say “House of Cards” was one of their favorites? Try using logical indexing and sum()

sum(responses == "House of Cards")

## [1] 2

What percent of the responses was “House of Cards”? Does this percentage reflect the percentage of people who like “House of Cards” or not?

mean(responses == "House of Cards")

## [1] 0.1333333

# 13.3% of the responses were "House of Cards".

# This is not the share of people who like the show: each person gave three responses, so 2 of 15 responses (13.3%) corresponds to 2 of 5 people (40%).
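To check the share of people rather than the share of responses, each person's vector can be tested separately. A sketch, assuming the five vectors from the survey; the list people and the vector likes.hoc are names made up for this example.

```r
katharina <- c("Breaking Bad", "Game of Thrones", "The Simpsons")
tani <- c("Orange is the New Black", "Game of Thrones", "House of Cards")
alexander <- c("The Walking Dead", "Game of Thrones", "True Detective")
sarah <- c("Mad Men", "My Little Pony", "House of Cards")
rebekka <- c("Game of Thrones", "Gotham", "Breaking Bad")

# One list element per person
people <- list(katharina = katharina, tani = tani, alexander = alexander,
               sarah = sarah, rebekka = rebekka)

# One TRUE/FALSE per person: did they list "House of Cards"?
likes.hoc <- sapply(people, function(shows) "House of Cards" %in% shows)
likes.hoc

# Proportion of people, as opposed to proportion of responses
mean(likes.hoc)
```

The two proportions differ because the denominator changes from 15 responses to 5 people.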

Which TV shows do both rebekka and tani like?

intersect(rebekka, tani)

## [1] "Game of Thrones"

# Both rebekka and tani like "Game of Thrones".

Which tv show(s) do you think alexander might recommend to katharina? In other words, which show(s) does alexander like that katharina didn’t mention?

setdiff(alexander, katharina)

## [1] "The Walking Dead" "True Detective"

"The Walking Dead" %in% alexander

## [1] TRUE

"The Walking Dead" %in% katharina

## [1] FALSE

"True Detective" %in% katharina

## [1] FALSE

# Since "The Walking Dead" and "True Detective" are not among the shows that Katharina mentioned, Alexander might recommend them to her.

The “Birthday Problem” is a famous statistics riddle. The question is: How many people do you need to have in a room, for the probability to be greater than 0.50 that at least two people in the room have the same birthday? For example, if there are just 2 people, the probability is quite low, while if there are 366 people, the probability is 1.0. How many people do you need for the probability to be just a bit larger than 0.50? Later on in the course we’ll solve this question, but for now, let’s test a small example of 30 people. Find a way to simulate a single room (Hint: use sample()) with 30 people. Now find a way to test if any two people in the room have the same birthday (Hint: Use a combination of unique() and length(), or the function duplicated()). Try running your code a few times and get an idea of the probability.

# I will have to think this through properly and will continue to work on this problem.
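One possible way to approach the simulation, following the hints in the exercise: draw 30 birthdays with sample() and check for repeats with duplicated(). This is a sketch under the assumption of 365 equally likely birthdays (no leap years); the helper room.has.match, the seed, and the number of repetitions are all choices made for this example.

```r
set.seed(1)  # arbitrary seed, only so that the run is repeatable

# Hypothetical helper: TRUE if at least two of n people share a birthday
room.has.match <- function(n) {
  # Birthdays as day numbers 1..365, drawn with replacement
  birthdays <- sample(1:365, size = n, replace = TRUE)
  any(duplicated(birthdays))
}

# A single room of 30 people
room.has.match(30)

# Many rooms: the proportion of TRUE estimates the probability
p.hat <- mean(replicate(10000, room.has.match(30)))
p.hat

# Exact probability under the same assumptions, for comparison:
# 1 minus the chance that all 30 birthdays are different
p.exact <- 1 - prod((365 - 0:29) / 365)
p.exact
```

The check length(unique(birthdays)) < n would work in place of any(duplicated(birthdays)). For 30 people the exact value is about 0.71, and the simulated estimate should land close to it.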