Load all the libraries or functions that you will use for the rest of the assignment. It is helpful to define your libraries and functions at the top of a report, so that others know what they need for the report to compile correctly.
library(Rling)  # assumed source of LL.collostr(), which is used below
Let’s compare the usage of ‘business plan’ vs. ‘action plan’ in the TIME Magazine Corpus.
business = c(83, 24245, 49935, (100000000 - 83 - 24245 - 49935))
action = c(114, 24214, 20143, (100000000 - 114 - 24214 - 20143))
business_action = as.data.frame(rbind(business, action))
colnames(business_action) = c("a", "b", "c", "d")
business_action
Calculate the attraction for your bigrams.
## [1] 0.1659403 0.5627684
Attraction is higher for ‘action’ (0.56) than for ‘business’ (0.17), so ‘action’ is more than three times as strongly attracted to ‘plan’ as ‘business’ is.
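The code behind these attraction values is not shown in the report. As a minimal sketch, assuming attraction is defined as a / (a + c) expressed as a percentage (an assumption, although it does reproduce the printed output), the computation could look like this:

```r
# Contingency counts exactly as defined earlier in the report
business <- c(83, 24245, 49935, 100000000 - 83 - 24245 - 49935)
action   <- c(114, 24214, 20143, 100000000 - 114 - 24214 - 20143)
business_action <- as.data.frame(rbind(business, action))
colnames(business_action) <- c("a", "b", "c", "d")

# Assumed definition: attraction = a / (a + c) * 100
attraction <- business_action$a / (business_action$a + business_action$c) * 100
print(attraction)
```

The first element belongs to ‘business plan’ and the second to ‘action plan’, following the row order of the data frame.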
Calculate the reliance for your bigrams.
## [1] 0.3411707 0.4685959
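Reliance is higher for ‘action’ (0.47) than for ‘business’ (0.34), so ‘plan’ is more reliant on ‘action’ than on ‘business’, although the gap is smaller than it is for attraction.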
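The reliance code is likewise not shown. A minimal sketch, assuming reliance is defined as a / (a + b) expressed as a percentage (an assumption that matches the printed output), might be:

```r
# Contingency counts exactly as defined earlier in the report
business <- c(83, 24245, 49935, 100000000 - 83 - 24245 - 49935)
action   <- c(114, 24214, 20143, 100000000 - 114 - 24214 - 20143)
business_action <- as.data.frame(rbind(business, action))
colnames(business_action) <- c("a", "b", "c", "d")

# Assumed definition: reliance = a / (a + b) * 100
reliance <- business_action$a / (business_action$a + business_action$b) * 100
print(reliance)
```

Note that a + b is identical in both rows (24328), which is what one would expect if it is the total frequency of the shared word ‘plan’.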
Calculate the LL values for your bigrams.
# Expected frequency of each bigram if the two words were independent
aExp = (business_action$a + business_action$b) * (business_action$a + business_action$c) /
  (business_action$a + business_action$b + business_action$c + business_action$d)
LL = LL.collostr(business_action$a, business_action$b, business_action$c, business_action$d)
# Sign the statistic: negative when the bigram is rarer than expected (repulsion)
LL1 = ifelse(business_action$a < aExp, -LL, LL)
LL1
## [1] 177.3637 499.1375
Because both log-likelihood values are positive, there is mutual attraction between ‘business’ and ‘plan’ as well as between ‘action’ and ‘plan’. The association is stronger for ‘action plan’, whose value (499.14) is about 2.8 times that of ‘business plan’ (177.36).
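As a cross-check that does not depend on LL.collostr(), the standard log-likelihood (G²) statistic for a 2×2 table can be computed in base R. This is only a sketch: the helper name g2_manual is hypothetical, and it is an assumption that LL.collostr() computes the same statistic, although the result agrees with the values printed above.

```r
# Contingency counts exactly as defined earlier in the report
business <- c(83, 24245, 49935, 100000000 - 83 - 24245 - 49935)
action   <- c(114, 24214, 20143, 100000000 - 114 - 24214 - 20143)
business_action <- as.data.frame(rbind(business, action))
colnames(business_action) <- c("a", "b", "c", "d")

# Hypothetical helper: G2 = 2 * sum(observed * log(observed / expected)).
# Cells with a count of zero would yield NaN and need special handling.
g2_manual <- function(a, b, c, d) {
  n <- a + b + c + d
  observed <- cbind(a, b, c, d)
  expected <- cbind((a + b) * (a + c), (a + b) * (b + d),
                    (c + d) * (a + c), (c + d) * (b + d)) / n
  2 * rowSums(observed * log(observed / expected))
}

with(business_action, {
  aExp <- (a + b) * (a + c) / (a + b + c + d)
  LL_manual <- g2_manual(a, b, c, d)
  # Same sign convention as in the report: negative means repulsion
  LL1_manual <<- ifelse(a < aExp, -LL_manual, LL_manual)
})
print(LL1_manual)
```

The helper is vectorised over rows, so each bigram gets its own G² value from a single call.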
Calculate the PMI for your bigrams.
## [1] 3.68640 9.86739
The PMI value for ‘action plan’ (9.87) is about 2.7 times that for ‘business plan’ (3.69), so ‘plan’ is more strongly associated with a preceding ‘action’ than with a preceding ‘business’. By this measure too, ‘action plan’ is the stronger collocation.
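The PMI code is not shown in the report. One computation that reproduces the printed values is the squared natural log of the observed-to-expected ratio, log(a / aExp)^2; treating this as the formula actually used is an assumption. For comparison, the sketch also computes the more common unsquared base-2 form, whose values differ from those printed above.

```r
# Contingency counts exactly as defined earlier in the report
business <- c(83, 24245, 49935, 100000000 - 83 - 24245 - 49935)
action   <- c(114, 24214, 20143, 100000000 - 114 - 24214 - 20143)
business_action <- as.data.frame(rbind(business, action))
colnames(business_action) <- c("a", "b", "c", "d")

# Expected bigram frequency under independence
aExp <- with(business_action, (a + b) * (a + c) / (a + b + c + d))

# Assumed formula: squared natural log of observed / expected
PMI <- log(business_action$a / aExp)^2
print(PMI)

# Common textbook form for comparison: log2(observed / expected), not squared
PMI_log2 <- log2(business_action$a / aExp)
print(PMI_log2)
```

Because the assumed formula squares the log, it cannot be negative, so on its own it does not distinguish attraction from repulsion; the comparison of a with aExp still does.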
Calculate the OR for your bigrams.
## [1] 1.924335 3.151136
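The log odds ratio is positive for both bigrams and higher for ‘action plan’ (3.15) than for ‘business plan’ (1.92), so it also indicates that ‘plan’ is more strongly associated with ‘action’ than with ‘business’.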
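The odds-ratio code is not shown either. A minimal sketch, assuming the reported figure is the natural log of (a × d) / (b × c) (an assumption that matches the printed output):

```r
# Contingency counts exactly as defined earlier in the report
business <- c(83, 24245, 49935, 100000000 - 83 - 24245 - 49935)
action   <- c(114, 24214, 20143, 100000000 - 114 - 24214 - 20143)
business_action <- as.data.frame(rbind(business, action))
colnames(business_action) <- c("a", "b", "c", "d")

# Assumed definition: log odds ratio = log((a * d) / (b * c))
# A value above 0 suggests attraction; below 0, repulsion.
logOR <- with(business_action, log((a * d) / (b * c)))
print(logOR)
```

A zero in any cell would make this ratio zero or undefined, so tables with empty cells would need a correction (for example, adding 0.5 to each cell) before taking the log.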
Given the statistics you have calculated above, what is the relation of your bigrams? Write a short summary of the results, making sure to answer the following:
Are they related?
Do they attract or repel each other?
Are there differences between the separate bigrams?