Each roll has 6 possible outcomes. Two rolls have \(6 \times 6 = 36\) possible outcomes, and three rolls have \(6 \times 6 \times 6 = 216\). In general, n rolls have \(6^n\) possible outcomes.
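The counting rule can be checked by brute force for small n. The following is a minimal sketch, not part of the original solution; the helper name `count_outcomes` is made up for illustration. It enumerates every sequence of n rolls with `expand.grid` and compares the row count with \(6^n\).

```r
# Enumerate all sequences of n die rolls and count them.
# count_outcomes is a hypothetical helper name used only in this sketch.
count_outcomes <- function(n) {
  rolls <- expand.grid(rep(list(1:6), n))  # one row per possible sequence
  nrow(rolls)
}

enumerated <- sapply(1:3, count_outcomes)  # counts for 1, 2 and 3 rolls
by_formula <- 6^(1:3)                      # counts predicted by 6^n

enumerated
by_formula
all(enumerated == by_formula)
```

Enumeration grows as \(6^n\), so it is only practical for small n; the formula is what scales.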
Rolling a die twice gives 36 possible outcomes. Exactly 2 of them give a sum of 3: (1,2) and (2,1).
P(sum_of_three) = \(\frac{2}{36}\) = \(\frac{1}{18}\)
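The same result can be obtained by listing the sample space, which is a useful check when the favorable outcomes are harder to count by hand. This is a small sketch; the variable names are illustrative and do not come from the original.

```r
# All 36 ordered outcomes of two die rolls.
dice <- expand.grid(first = 1:6, second = 1:6)

# Sum of each outcome, and how many outcomes sum to 3.
sums <- dice$first + dice$second
favorable <- sum(sums == 3)

# Probability = favorable outcomes / total outcomes.
p_sum_three <- favorable / nrow(dice)
p_sum_three
```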
Instead of comparing each person's birthday with those of the other 24 people, we first calculate the probability that none of the 25 people share a birthday. We then subtract this probability from 1 to get the probability that at least 2 people share a birthday.
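Written out for n people, the complement argument looks as follows. It assumes 365 equally likely birthdays that are independent across people; the calculation relies on this assumption, although the text does not state it.

\[
P(\text{no shared birthday}) = \frac{365}{365} \cdot \frac{364}{365} \cdots \frac{365 - (n - 1)}{365} = \prod_{k=0}^{n-1} \frac{365 - k}{365}
\]

\[
P(\text{at least one shared birthday}) = 1 - \prod_{k=0}^{n-1} \frac{365 - k}{365}
\]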
n <- 25
b <- 365
# 1 - P(no shared birthday) = P(at least two people share a birthday)
p_shared <- 1 - prod((b:(b - n + 1)) / b)
p_shared
## [1] 0.5686997
The probability that at least 2 of 25 strangers in a room share a birthday is 56.87%.
Now let us compute the probability for 50 strangers.
n <- 50
b <- 365
# 1 - P(no shared birthday) = P(at least two people share a birthday)
p_shared <- 1 - prod((b:(b - n + 1)) / b)
p_shared
## [1] 0.9703736
The probability that at least 2 of 50 strangers in a room share a birthday is 97.04%.
Doubling the group from 25 to 50 people raises the probability by about 40 percentage points, from 56.87% to 97.04%.
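Since the same expression is evaluated for two group sizes, it can be wrapped in a function and applied to several sizes at once. The sketch below is one way to do that; `p_shared_birthday` is a hypothetical name, and the function assumes n is at least 1 and that all b days are equally likely. Base R's `pbirthday()` from the stats package is used as a cross-check.

```r
# Probability that at least two of n people share a birthday,
# assuming b equally likely birthdays (n must be >= 1).
p_shared_birthday <- function(n, b = 365) {
  if (n > b) return(1)             # more people than days: a match is certain
  1 - prod((b:(b - n + 1)) / b)    # 1 - P(all n birthdays are distinct)
}

group_sizes <- c(10, 23, 25, 50)
p_by_size <- sapply(group_sizes, p_shared_birthday)
round(p_by_size, 4)

# Cross-check against the built-in stats::pbirthday().
p_builtin <- sapply(group_sizes, pbirthday)
all.equal(p_by_size, p_builtin)
```

The values for 25 and 50 people match the 56.87% and 97.04% computed above.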
I used the tm package to create a corpus of the words in the document. I tried removing punctuation, but the document is not entirely plain text, and some of the quotation marks could not be removed. I tried various encoding options but still could not remove all of the punctuation.
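The leftover characters in the term list look like typographic (curly) quotation marks, which are not part of the ASCII punctuation set. One possible workaround, sketched here in base R on a made-up sample string rather than on the assignment file, is to delete those characters explicitly before building the corpus. The helper name `strip_curly_quotes` is hypothetical, and whether this resolves the problem for the actual file depends on how the file is encoded and read.

```r
# Remove left/right single and double curly quotes from a character vector.
# strip_curly_quotes is a hypothetical helper for this sketch.
strip_curly_quotes <- function(x) {
  for (q in c("\u2018", "\u2019", "\u201c", "\u201d")) {
    x <- gsub(q, "", x, fixed = TRUE)  # literal match, no regex classes involved
  }
  x
}

# Made-up sample text containing curly quotes and a curly apostrophe.
sample_text <- "\u201cIt\u2019s a test,\u201d she said."
cleaned <- strip_curly_quotes(sample_text)
cleaned
```

Applied to `doc` before `Corpus(VectorSource(doc))`, a step like this would leave only ASCII punctuation for `removePunctuation` to handle, provided the file was read with the correct encoding.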
library("tm")
## Warning: package 'tm' was built under R version 3.2.2
## Loading required package: NLP
## Warning: package 'NLP' was built under R version 3.2.2
# read the document
doc <- paste(readLines("assign6.sample.txt"), collapse=" ")
## Warning in readLines("assign6.sample.txt"): incomplete final line found on
## 'assign6.sample.txt'
# create corpus
corpus <- Corpus(VectorSource(doc))
# convert all letters to lowercase, then remove numbers and punctuation, since we are only dealing with words
corpus.p <- tm_map(corpus, content_transformer(tolower))
corpus.p <- tm_map(corpus.p, content_transformer(removeNumbers))
corpus.p <- tm_map(corpus.p, content_transformer(removePunctuation))
# find frequency of each word
dtm <- DocumentTermMatrix(corpus.p)
docTermMatrix <- inspect(dtm)
## <<DocumentTermMatrix (documents: 1, terms: 564)>>
## Non-/sparse entries: 564/0
## Sparsity : 0%
## Maximal term length: 18
## Weighting : term frequency (tf)
##
## Terms
## Docs â<U+0080><U+0098>these â<U+0080> â<U+0080><U+009C>a â<U+0080><U+009C>for â<U+0080><U+009C>iâ<U+0080><U+0099>ve â<U+0080><U+009C>it â<U+0080><U+009C>itâ<U+0080><U+0099>s â<U+0080><U+009C>no â<U+0080><U+009C>right
## 1 1 1 1 1 1 1 2 1 1
## Terms
## Docs â<U+0080><U+009C>that â<U+0080><U+009C>the â<U+0080><U+009C>they â<U+0080><U+009C>threestrikesâ<U+0080> â<U+0080><U+009C>we â<U+0080><U+009C>weâ<U+0080><U+0099>re â<U+0080><U+009C>yes
## 1 1 2 1 1 2 1 1
## Terms
## Docs about abundance abundant abuse abysmal according across act acting
## 1 6 1 1 3 1 1 1 1 1
## Terms
## Docs administrationâ<U+0080><U+0099>s after aging agreed alabama alabamaâ<U+0080> alabamaâ<U+0080><U+0099>s
## 1 1 5 1 1 4 1 1
## Terms
## Docs alabaster almost also although among analyst and angel anything
## 1 1 2 1 1 3 1 38 1 2
## Terms
## Docs appalled appetite approval april arbuthnot are argues arise armed
## 1 1 1 1 1 1 9 1 1 1
## Terms
## Docs asked assaults assistant attention attorney autopsy average aware
## 1 2 1 1 1 1 1 1 1
## Terms
## Docs back backward bad banned basic basics bathtub beaten because been
## 1 1 1 2 1 1 2 1 1 1 9
## Terms
## Docs before began beginning believe bentley better beyond bigger birth
## 1 2 1 1 1 2 3 1 1 1
## Terms
## Docs blind bodies botched both box budget build building built buried but
## 1 1 1 1 1 1 2 1 1 1 1 9
## Terms
## Docs called calls cam came cameras can candidate capacity capita care case
## 1 1 1 1 2 1 2 1 2 1 1 1
## Terms
## Docs caution chairman challenging change changeâ<U+0080> changing charlotte
## 1 1 1 1 2 1 2 1
## Terms
## Docs child choices citizensâ<U+0080><U+0099> civil clean clinical colby cologne coming
## 1 2 1 1 1 1 1 2 1 1
## Terms
## Docs commissioner committee commodity conditions congress constitutional
## 1 2 1 1 6 1 1
## Terms
## Docs contact contraband convicted conviction corners corrections court
## 1 1 1 1 2 1 10 1
## Terms
## Docs courts created crime crimes criminals crisis culture curb currency
## 1 1 1 1 3 1 1 1 1 1
## Terms
## Docs custodial damning dangerously daughter dealing death december
## 1 1 1 1 1 1 1 1
## Terms
## Docs defendants deliberate department departmentâ<U+0080><U+0099>s deprivation designed
## 1 1 1 7 1 1 1
## Terms
## Docs disparages document doing donâ<U+0080><U+0099>t double down drowned drug drugs
## 1 1 1 1 2 1 1 1 2 1
## Terms
## Docs dynamiteâ<U+0080> elderly employees enough environment equal even examiner
## 1 1 1 2 1 1 2 2 1
## Terms
## Docs exchanged eyes faced faces failed family far favorsâ<U+0080> fearful
## 1 1 1 1 1 1 1 1 1 1
## Terms
## Docs federal female few filled finally findings fix food for former
## 1 5 2 1 1 1 1 2 1 30 1
## Terms
## Docs forward fresh from gave general george get getting give going good
## 1 1 1 3 1 1 1 4 1 1 1 1
## Terms
## Docs gov government governor governorâ<U+0080> grave great group guard guards
## 1 1 4 1 1 1 1 1 2 2
## Terms
## Docs guidelines guntoting had half happened harassed has have health
## 1 1 1 6 1 1 1 7 9 2
## Terms
## Docs helped her here hereâ<U+0080> highest highly him hire hired his home how
## 1 1 3 2 1 1 1 2 1 1 1 1 2
## Terms
## Docs iâ<U+0080><U+0099>ve ignoring important improve improved included includes
## 1 1 1 1 2 1 1 1
## Terms
## Docs including indifference indigent inhumane initiative inmate inmates
## 1 1 1 1 1 2 1 4
## Terms
## Docs inside instead institute institutions intervention interview into
## 1 2 1 1 1 1 2 3
## Terms
## Docs investigate investigating investigation investigations issued itâ<U+0080>
## 1 1 1 3 1 2 1
## Terms
## Docs itâ<U+0080><U+0099>s items its jail january jocelyn judiciary julia june just
## 1 4 2 4 1 2 1 1 2 1 6
## Terms
## Docs justice kim lack larger larry last law lawyer least legal legislator
## 1 6 1 1 1 1 2 1 1 2 1 1
## Terms
## Docs legislature less levels liberal life like likely live living locked
## 1 3 1 1 1 2 5 1 2 1 1
## Terms
## Docs long longtime look low lowlevel make makeup male management many
## 1 1 1 1 1 1 1 1 1 1 2
## Terms
## Docs marginally marked marsha matter may medical mental met middle million
## 1 1 1 1 1 1 2 2 1 1 4
## Terms
## Docs minimal misconduct money moneyâ<U+0080> monica montgomery month months more
## 1 1 1 2 1 1 1 1 3 6
## Terms
## Docs morrison most mother moved much murder named nation national near
## 1 1 2 1 1 2 1 1 2 1 2
## Terms
## Docs need needs never new nonviolent not now number odds offenders
## 1 3 2 1 1 1 3 3 1 1 2
## Terms
## Docs offenses officer officers officersâ<U+0080> officials often once one only
## 1 1 1 5 1 1 1 1 2 5
## Terms
## Docs open organization organize original other others out over overhaul
## 1 1 2 1 1 2 2 1 2 1
## Terms
## Docs overturned own page paper parole part past people per percent
## 1 1 1 1 1 1 1 1 1 1 1
## Terms
## Docs periodâ<U+0080> personally perspective places plan policies policy
## 1 1 1 1 1 2 2 3
## Terms
## Docs political practices premature pressing primary primitive prison
## 1 1 1 1 1 1 1 11
## Terms
## Docs prisonâ<U+0080><U+0099>s prisoners prisons problem problems procedures programs
## 1 2 7 6 1 2 1 1
## Terms
## Docs project prominence promising prompt property psychologist question
## 1 1 1 1 1 1 1 1
## Terms
## Docs quit raise rampant raped rate recent recently recruiting rectify
## 1 1 1 2 2 1 1 2 1 1
## Terms
## Docs reform relatives released releasing remained remains repeat replaced
## 1 2 1 2 1 1 3 1 1
## Terms
## Docs report reports represents republican request rescinding resellable
## 1 4 2 1 2 1 1 1
## Terms
## Docs review rights robbery robert rodney routinely row rules running said
## 1 1 1 1 1 1 1 1 1 2 22
## Terms
## Docs same samuels say says scrutinizing secondhighest secure see seen sell
## 1 1 1 2 2 1 1 1 1 1 1
## Terms
## Docs senate senator sending senior sent sentence sentencing series serious
## 1 1 1 1 1 1 1 2 2 1
## Terms
## Docs served services serving session several sex sexual sexualized she
## 1 2 1 1 1 1 4 3 1 9
## Terms
## Docs show showed showering sick since situation six soft solution some
## 1 1 1 1 1 3 2 4 1 1 2
## Terms
## Docs sometimes son spending split spots stacy staffing state stateâ<U+0080><U+0099>s
## 1 2 1 1 1 1 1 2 4 2
## Terms
## Docs step stephen stepped stetson still stillborn stockades strip strong
## 1 1 1 1 1 6 1 1 1 1
## Terms
## Docs stuff stupidâ<U+0080> support system systemâ<U+0080> take tampons telephone texas
## 1 1 1 2 2 1 1 1 1 1
## Terms
## Docs than that thatâ<U+0080><U+0099>s the their them then there they thing things think
## 1 7 18 2 74 1 1 1 6 5 1 1 1
## Terms
## Docs third this thisâ<U+0080> thomas those three tied toilet top toxic track
## 1 1 3 1 3 1 1 1 1 2 1 1
## Terms
## Docs tracked transparentâ<U+0080> treatment troubled trying tutwiler tutwilerâ<U+0080>
## 1 1 1 2 1 1 14 1
## Terms
## Docs two unconstitutional uncovered unfolding uniforms use using very
## 1 1 1 1 1 1 2 1 2
## Terms
## Docs violations wanted wants ward warden was washington watched way
## 1 1 1 1 3 1 9 2 1 1
## Terms
## Docs weâ<U+0080><U+0099>re week weighing well were what where whether which while who
## 1 1 1 1 2 4 2 2 2 1 1 9
## Terms
## Docs whose wideranging will with without woman women wood work worked
## 1 1 1 1 8 1 1 6 1 1 1
## Terms
## Docs working worse would year yearâ<U+0080><U+0099>s years you
## 1 1 1 1 3 1 6 1
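# denominator for the probabilities: length() counts the distinct terms in the matrix,
# not the total number of words (sum(docTermMatrix) would give that total)
totalWords <- length(docTermMatrix)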
totalWords
## [1] 564
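# probability function: treats the two words as independent, so P(x and y) = P(x) * P(y)
p_of_XandY <- function(x, y) {
  # a word that is not a term in the matrix has probability 0
  # (indexing a missing column name would raise an error rather than return NA)
  terms <- colnames(docTermMatrix)
  if (!(x %in% terms) || !(y %in% terms)) {
    return(0)
  }
  p_x <- docTermMatrix[1, x] / totalWords
  p_y <- docTermMatrix[1, y] / totalWords
  return(p_x * p_y)
}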
# test cases
p_of_XandY("the","you")
## [1] 0.0002326342
p_of_XandY("and","are")
## [1] 0.001075147
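The function above divides each count by the number of distinct terms. A common alternative is to divide by the total number of word occurrences, so that the individual word probabilities sum to 1. The sketch below illustrates that variant on a small made-up token vector, not on the assignment document; all names in it are hypothetical, and it keeps the same independence assumption, P(x and y) = P(x) * P(y).

```r
# Made-up tokens standing in for a cleaned document.
tokens <- c("the", "cat", "and", "the", "dog", "and", "the", "bird")

counts <- table(tokens)        # occurrences of each distinct word
total_tokens <- sum(counts)    # total number of word occurrences

# Relative frequency of a single word; 0 if the word never occurs.
p_word <- function(w) {
  if (!(w %in% names(counts))) return(0)
  as.numeric(counts[[w]]) / total_tokens
}

# Joint probability under the independence assumption.
p_joint_independent <- function(x, y) p_word(x) * p_word(y)

p_the_and <- p_joint_independent("the", "and")
p_missing <- p_joint_independent("the", "zebra")
p_the_and
p_missing
```

With this denominator the results for the assignment document would differ from the values printed above, so the two versions should not be mixed.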