Load the Libraries + Functions

Load all the libraries and functions that you will use for the rest of the assignment. It is helpful to define your libraries and functions at the top of a report, so that others know what they need for the report to compile correctly.

Create the Data

  • One of the corpora here: https://corpus.byu.edu/corpora.asp
  • Collect two bigrams that you can compare using the association measures listed (does not have to be X-Y, Z-Y, but that would help you compare them)
  • Create a dataframe like the one from lecture of those bigrams

Let’s compare the usage of ‘business plan’ vs. ‘action plan’ in the TIME Magazine Corpus.
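As a rough illustration of how such a dataframe row can be assembled, the sketch below derives the four cells of a 2x2 contingency table from raw frequencies. It is written in Python, not taken from the original report. The function name `contingency`, the cell layout (a, b, c, d), and all counts are assumptions made for illustration; the numbers are invented and are not TIME Magazine Corpus figures.

```python
def contingency(bigram_freq, w1_freq, w2_freq, total):
    """Return the four cells of a 2x2 table for a bigram 'w1 w2'.

    Assumed layout (hypothetical, one common convention):
      a = w1 followed by w2
      b = w1 followed by any other word
      c = w2 preceded by any other word
      d = all remaining bigram tokens
    """
    a = bigram_freq
    b = w1_freq - bigram_freq
    c = w2_freq - bigram_freq
    d = total - a - b - c
    if min(a, b, c, d) < 0:
        raise ValueError("frequencies are inconsistent")
    return {"a": a, "b": b, "c": c, "d": d}


# Invented counts for illustration only.
table = contingency(bigram_freq=50, w1_freq=1000, w2_freq=500, total=1_000_000)
print(table)
```

One such row per bigram (here it would be one for ‘business plan’ and one for ‘action plan’) is enough to compute every measure in the sections that follow.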

Attraction and Reliance

Calculate the attraction for your bigrams.

## [1] 0.1659403 0.5627684

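We see that ‘action’ is much more attracted to ‘plan’ than ‘business’ is: the attraction value for ‘action plan’ (0.563) is about 3.4 times that for ‘business plan’ (0.166).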

Calculate the reliance for your bigrams.

## [1] 0.3411707 0.4685959

We see that ‘plan’ is more reliant on ‘action’ than on ‘business’.
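The attraction and reliance calculations can be sketched as follows. This is a minimal Python sketch that assumes the common definitions attraction = a / (a + c) × 100 and reliance = a / (a + b) × 100, where a is the bigram count, b counts the first word without the second, and c counts the second word without the first. The report does not show its formulas, so these definitions, the function names, and the counts are assumptions.

```python
def attraction(a, c):
    """Percentage of second-word tokens that follow the first word.

    Assumed definition: a / (a + c) * 100.
    """
    return a / (a + c) * 100


def reliance(a, b):
    """Percentage of first-word tokens that are followed by the second word.

    Assumed definition: a / (a + b) * 100.
    """
    return a / (a + b) * 100


# Invented counts for illustration only (not corpus figures).
a, b, c = 50, 950, 450
print(attraction(a, c))
print(reliance(a, b))
```

Under these assumed definitions the two measures share a numerator and differ only in which word's total frequency forms the denominator, which is why a bigram can score high on one and low on the other.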

Log Likelihood

Calculate the LL values for your bigrams.

## [1] 177.3637 499.1375

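Both log likelihood values are positive. Assuming the signed version of the statistic, where the sign shows whether the bigram occurs more or less often than expected, this indicates mutual attraction between ‘business’ and ‘plan’ as well as between ‘action’ and ‘plan’. The value for ‘action plan’ (499.1) is about 2.8 times that for ‘business plan’ (177.4), so the evidence of association is considerably stronger for ‘action plan’. Note that log likelihood reflects strength of association, not how frequent (popular) a bigram is.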
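For reference, a signed log likelihood of the kind interpreted above can be sketched as below. This Python sketch assumes the standard G² statistic for a 2x2 table, 2 Σ O · ln(O / E), with a sign attached according to whether the observed bigram count is above or below its expected value. The function name, the signing convention, and the counts are assumptions and not the report's own code.

```python
import math


def signed_log_likelihood(a, b, c, d):
    """G^2 for a 2x2 table, negated when the bigram is rarer than expected."""
    n = a + b + c + d
    observed = [a, b, c, d]
    # Expected counts under independence: row total * column total / n.
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # A cell with an observed count of 0 contributes 0 to the sum.
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    sign = 1 if a >= expected[0] else -1
    return sign * g2


# Invented counts for illustration only (not corpus figures).
print(signed_log_likelihood(50, 950, 450, 998550))
```

The unsigned G² is never negative, so a "positive" result is only informative about attraction when a sign has been attached in this way.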

Pointwise Mutual Information

Calculate the PMI for your bigrams.

## [1] 3.68640 9.86739

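Based on the PMI values, ‘action plan’ (9.87) is more strongly associated than ‘business plan’ (3.69). Because PMI is the logarithm of the ratio of observed to expected co-occurrence, the two values should not be compared as a simple ratio: the gap between them means that ‘action plan’ exceeds its chance expectation by a far larger factor than ‘business plan’ does, not merely by a factor of 3. PMI measures strength of association, so it does not by itself show that ‘action plan’ is the more frequent bigram.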
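To make the logarithmic scale concrete, here is a small Python sketch of PMI as log2(observed / expected). It assumes a base-2 logarithm and the a, b, c, d cell layout (a = bigram count); the report states neither, so both are assumptions, as are the function names. The first call uses invented counts, and the second plugs in the two PMI values reported above.

```python
import math


def pmi(a, b, c, d):
    """Pointwise mutual information, log2(observed / expected). Requires a > 0."""
    n = a + b + c + d
    expected_a = (a + b) * (a + c) / n
    return math.log2(a / expected_a)


def excess_ratio(pmi_high, pmi_low):
    """Factor by which one observed/expected ratio exceeds another (base-2 PMI)."""
    return 2 ** (pmi_high - pmi_low)


# Invented counts for illustration only (not corpus figures).
print(pmi(50, 950, 450, 998550))

# PMI values as reported in this section, compared on the ratio scale.
print(excess_ratio(9.86739, 3.68640))
```

If the reported values are indeed base-2, the gap of about 6.2 corresponds to an observed-to-expected ratio on the order of 70 times larger for ‘action plan’, which is why reading the two PMI values as "about 3 times" understates the difference.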

Odds Ratio

Calculate the OR for your bigrams.

## [1] 1.924335 3.151136

Both log odds ratios are positive, which indicates attraction, and the larger value for ‘action plan’ (3.15 vs. 1.92) again indicates a stronger association than for ‘business plan’.
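A log odds ratio of this kind can be sketched as below. This is a Python sketch that assumes the natural logarithm of (a · d) / (b · c), with 0.5 added to every cell to guard against zero counts. Whether the report applied that correction is not stated, so the `smoothing` parameter, the function name, and the counts are assumptions.

```python
import math


def log_odds_ratio(a, b, c, d, smoothing=0.5):
    """Natural log of the odds ratio for a 2x2 table.

    `smoothing` is added to every cell (assumed convention) so that
    a zero count does not produce a division by zero or log(0).
    """
    a, b, c, d = (x + smoothing for x in (a, b, c, d))
    return math.log((a * d) / (b * c))


# Invented counts for illustration only (not corpus figures).
print(log_odds_ratio(50, 950, 450, 998550))
```

A value above 0 means the bigram occurs more often than independence would predict, a value below 0 means the words repel each other, and 0 means no association.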

Interpret your results

Given the statistics you have calculated above, what is the relation of your bigrams? Write a short summary of the results, making sure to answer the following:

  • Are they related?

    • Based on the attraction and reliance values, both pairs (‘action’ with ‘plan’ and ‘business’ with ‘plan’) appear to be related.
  • Do they attract or repel each other?

    • Both pairs of words attract each other, as the positive log likelihood and log odds ratio values confirm.
  • Are there differences between the separate bigrams?

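    • Yes. Across the log likelihood, PMI, and odds ratio values, ‘action plan’ shows a consistently stronger association than ‘business plan’.
    • The attraction value for ‘action plan’ is about 3.4 times that for ‘business plan’ (0.563 vs. 0.166), and its log likelihood value is about 2.8 times larger (499.1 vs. 177.4). These measures describe strength of association, not raw frequency.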