This document presents some basic descriptive statistics for a corpus of Annual Information Forms (AIFs) that I selected at random (n = 132).

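library(readtext)
# Read every file matching the AIF* pattern; document variables are parsed
# from the underscore-separated file names.
text_df <- readtext("~/Google Drive File Stream/My Drive/R/Projects/Work package 2/Frames/AIF*",
                    encoding = "UTF-8",
                    docvarsfrom = "filenames",
                    docvarnames = c("num", "type", "company", "date"),
                    dvsep = "_")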

This step reduces the corpus to the words that are meaningful for the analysis. It removes stopwords (using the list that ships with the quanteda package) together with any additional words we choose to exclude. It also removes punctuation, white space, numbers, URLs, and other symbols, and converts all text to lowercase.
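The cleaning described above can be sketched with quanteda roughly as follows. This is a minimal illustration under stated assumptions, not the code used for this report: the toy texts and the `extra_words` vector are hypothetical stand-ins for the real corpus and for the custom exclusion list.

```r
library(quanteda)

# Toy stand-in for the AIF texts (hypothetical; the real texts come from text_df)
toy <- c(doc1 = "The Company manages 3 pipelines; see https://example.com for details.",
         doc2 = "Risk factors include climate and carbon regulation!")

# Hypothetical list of extra words to drop on top of the stopword list
extra_words <- c("company", "see")

# Tokenise while dropping punctuation, symbols, numbers and URLs
# (white-space separators are removed by default)
toks <- tokens(corpus(toy),
               remove_punct   = TRUE,
               remove_symbols = TRUE,
               remove_numbers = TRUE,
               remove_url     = TRUE)

# Lowercase, then remove English stopwords and the extra words
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, pattern = c(stopwords("en"), extra_words))

as.list(toks)
```

Lowercasing before `tokens_remove()` is a choice made here for readability; `tokens_remove()` matches case-insensitively by default, so the order should not change the result.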

Descriptive statistical analysis of words-of-interest

The preprocessing step also stems words (reducing each word to its root form) using Martin Porter’s stemming algorithm, which is available through the quanteda package.
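As a small illustration of what stemming does, the sketch below runs quanteda's `tokens_wordstem()` on a made-up phrase. The phrase is hypothetical, and passing `language = "english"` explicitly is an assumption about the setting used here. The resulting stems have the same truncated shape as the features listed in the frequency tables of this document (for example `insur`, `compani`, `manag`).

```r
library(quanteda)

# Hypothetical example phrase, not taken from the corpus
toks <- tokens(c(doc1 = "insurance companies managing financial risks"))

# Reduce each token to its stem with the Snowball English stemmer
stemmed <- tokens_wordstem(toks, language = "english")

as.list(stemmed)
```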

Frequencies grouped by year
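The output in this section has the columns returned by `textstat_frequency()` from the quanteda.textstats package (`feature`, `frequency`, `rank`, `docfreq`, `group`). A table of that shape could be produced roughly as sketched below. The toy corpus and its `year` document variable are hypothetical, and the unquoted `groups = year` form assumes quanteda 3 or later.

```r
library(quanteda)
library(quanteda.textstats)

# Toy corpus with a hypothetical "year" document variable
toy <- corpus(c(a = "risk risk pipeline",
                b = "risk carbon",
                c = "climate risk climate"),
              docvars = data.frame(year = c(2001, 2001, 2002)))

dfmat <- dfm(tokens(toy))

# Up to the 500 most frequent features within each year;
# rank and docfreq are computed within each group
freq_by_year <- textstat_frequency(dfmat, n = 500, groups = year)
freq_by_year
```

With real data, `n = 500` would explain why the first listing stops at row 500 for a single year; that reading is an inference from the output, not something stated in the source.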

##           feature frequency rank docfreq group
## 1          system       581    1       4  2001
## 2           insur       554    2       4  2001
## 3         pipelin       521    3       2  2001
## 4         product       520    4       4  2001
## 5            oper       492    5       4  2001
## 6           manag       431    6       4  2001
## 7            fund       421    7       4  2001
## 8            unit       398    8       4  2001
## 9           magna       388    9       1  2001
## 10        financi       381   10       4  2001
## 11        compani       353   11       4  2001
## 12        general       336   12       4  2001
## 13           busi       321   13       4  2001
## 14           life       308   14       3  2001
## 15         includ       306   15       4  2001
## 16         market       301   16       4  2001
## 17          share       294   17       4  2001
## 18            ppc       280   18       1  2001
## 19        manulif       274   19       1  2001
## 20         invest       272   20       4  2001
## 21        partner       270   21       4  2001
## 22         provid       269   22       4  2001
## 23           will       260   23       4  2001
## 24    partnership       247   24       4  2001
## 25          asset       223   25       4  2001
## 26           year       221   26       4  2001
## 27        develop       219   27       4  2001
## 28            new       205   28       4  2001
## 29     manufactur       202   29       2  2001
## 30            oil       202   29       4  2001
## 31          class       198   31       4  2001
## 32          trust       197   32       3  2001
## 33         requir       195   33       4  2001
## 34          group       194   34       4  2001
## 35      distribut       192   35       4  2001
## 36      agreement       186   36       4  2001
## 37         custom       180   37       4  2001
## 38        automot       178   38       2  2001
## 39          limit       174   39       4  2001
## 40       director       172   40       4  2001
## 41            one       171   41       4  2001
## 42         servic       168   42       4  2001
## 43           also       167   43       4  2001
## 44          capit       166   44       4  2001
## 45          addit       166   44       4  2001
## 46       interest       150   46       4  2001
## 47          incom       150   46       4  2001
## 48       canadian       149   48       4  2001
## 49          crude       146   49       3  2001
## 50       approxim       142   50       4  2001
## 51          state       141   51       4  2001
## 52           vote       141   51       4  2001
## 53          facil       140   53       3  2001
## 54           sale       138   54       4  2001
## 55           cash       137   55       4  2001
## 56         vehicl       136   56       3  2001
## 57            oem       135   57       1  2001
## 58          secur       134   58       4  2001
## 59          offic       132   59       4  2001
## 60           cost       127   60       4  2001
## 61           term       124   61       4  2001
## 62           time       124   61       4  2001
## 63          north       124   61       4  2001
## 64         inform       123   64       4  2001
## 65       individu       123   64       3  2001
## 66          engin       121   66       2  2001
## 67     subsidiari       119   67       4  2001
## 68        increas       118   68       4  2001
## 69            end       116   69       4  2001
## 70        respect       116   69       4  2001
## 71        certain       114   71       4  2001
## 72          prior       113   72       4  2001
## 73          regul       112   73       4  2001
## 74        employe       111   74       4  2001
## 75          relat       111   74       4  2001
## 76         design       111   74       4  2001
## 77            two       110   77       4  2001
## 78         produc       108   78       4  2001
## 79        complet       107   79       4  2001
## 80            own       106   80       4  2001
## 81             u.       106   80       4  2001
## 82       supplier       105   82       1  2001
## 83           plan       104   83       4  2001
## 84        program       103   84       4  2001
## 85        ontario       102   85       3  2001
## 86        reinsur       102   85       1  2001
## 87         polici       101   87       4  2001
## 88           base       101   87       4  2001
## 89           rate        99   89       4  2001
## 90          divis        99   89       3  2001
## 91           mean        98   91       4  2001
## 92           meet        98   91       4  2001
## 93          offer        98   91       4  2001
## 94         presid        98   91       4  2001
## 95       unithold        98   91       2  2001
## 96         person        96   96       4  2001
## 97        pension        96   96       3  2001
## 98            law        95   98       4  2001
## 99       industri        94   99       4  2001
## 100        execut        94   99       4  2001
## 101        amount        94   99       4  2001
## 102        corpor        93  102       4  2001
## 103         price        93  102       4  2001
## 104           non        92  104       4  2001
## 105        direct        92  104       4  2001
## 106      consolid        92  104       4  2001
## 107     sharehold        92  104       4  2001
## 108          bbls        92  104       2  2001
## 109        result        91  109       4  2001
## 110        follow        90  110       4  2001
## 111       account        90  110       4  2001
## 112        intern        90  110       4  2001
## 113       assembl        90  110       1  2001
## 114        number        89  114       4  2001
## 115     administr        89  114       4  2001
## 116       continu        88  116       4  2001
## 117       princip        88  116       4  2001
## 118       annuiti        88  116       1  2001
## 119       control        87  119       4  2001
## 120          valu        87  119       4  2001
## 121          part        86  121       4  2001
## 122       subject        86  121       4  2001
## 123        holder        86  121       4  2001
## 124        approv        85  124       4  2001
## 125        mutual        85  124       2  2001
## 126          risk        84  126       4  2001
## 127         trade        84  126       4  2001
## 128     establish        84  126       4  2001
## 129      dividend        83  129       3  2001
## 130          issu        83  129       4  2001
## 131        averag        83  129       4  2001
## 132           net        83  129       4  2001
## 133        author        83  129       4  2001
## 134       america        82  134       3  2001
## 135     statement        81  135       4  2001
## 136          seat        81  135       1  2001
## 137        decoma        81  135       1  2001
## 138          well        80  138       4  2001
## 139      signific        80  138       4  2001
## 140         major        79  140       4  2001
## 141          tier        79  140       2  2001
## 142        period        78  142       4  2001
## 143           see        78  142       4  2001
## 144          note        77  144       4  2001
## 145        repres        77  144       4  2001
## 146         board        77  144       4  2001
## 147        compon        77  144       3  2001
## 148          koch        77  144       1  2001
## 149       mortgag        77  144       2  2001
## 150          hold        75  150       4  2001
## 151    throughput        74  151       2  2001
## 152        liabil        73  152       4  2001
## 153     transport        72  153       3  2001
## 154           tax        72  153       4  2001
## 155        applic        72  153       4  2001
## 156         right        71  156       4  2001
## 157          area        71  156       4  2001
## 158     portfolio        70  158       2  2001
## 159      competit        69  159       4  2001
## 160    regulatori        69  159       4  2001
## 161           use        69  159       4  2001
## 162          vice        69  159       4  2001
## 163        acquir        68  163       4  2001
## 164       current        67  164       4  2001
## 165           per        67  164       4  2001
## 166      american        67  164       4  2001
## 167          item        66  167       3  2001
## 168      acquisit        66  167       4  2001
## 169     technolog        65  169       4  2001
## 170        equiti        65  169       4  2001
## 171      calendar        64  171       3  2001
## 172        effect        64  171       4  2001
## 173        govern        64  171       4  2001
## 174       benefit        64  171       4  2001
## 175        report        63  175       4  2001
## 176         parti        63  175       4  2001
## 177        fiscal        62  177       3  2001
## 178        associ        62  177       4  2001
## 179         locat        62  177       4  2001
## 180        materi        61  180       4  2001
## 181         activ        61  180       4  2001
## 182        believ        61  180       4  2001
## 183        suppli        61  180       4  2001
## 184      properti        60  184       4  2001
## 185         volum        60  184       4  2001
## 186         futur        60  184       4  2001
## 187         chang        60  184       4  2001
## 188       quarter        60  184       4  2001
## 189         stock        60  184       4  2001
## 190      outstand        60  184       4  2001
## 191          sinc        60  184       4  2001
## 192        profit        60  184       4  2001
## 193       payment        60  184       4  2001
## 194       process        60  184       4  2001
## 195      subordin        60  184       3  2001
## 196         capac        59  196       4  2001
## 197         amend        59  196       4  2001
## 198       exchang        58  198       4  2001
## 199          date        58  198       4  2001
## 200           act        58  198       4  2001
## 201           day        58  198       4  2001
## 202      particip        58  198       4  2001
## 203        member        56  203       4  2001
## 204         avail        56  203       4  2001
## 205         chief        56  203       4  2001
## 206       purchas        56  203       4  2001
## 207        health        56  203       3  2001
## 208        recent        55  208       4  2001
## 209         refer        55  208       4  2001
## 210        revenu        55  208       4  2001
## 211        declar        55  208       3  2001
## 212        expens        55  208       4  2001
## 213       premium        55  208       1  2001
## 214        integr        55  208       4  2001
## 215         tesma        55  208       1  2001
## 216      contract        54  216       3  2001
## 217         third        54  216       4  2001
## 218      independ        54  216       4  2001
## 219         modul        54  216       1  2001
## 220         europ        54  216       2  2001
## 221           fee        53  221       4  2001
## 222         initi        53  221       4  2001
## 223          line        53  221       4  2001
## 224          held        53  221       4  2001
## 225        common        52  225       3  2001
## 226      transfer        52  225       4  2001
## 227        expect        52  225       4  2001
## 228        condit        52  225       4  2001
## 229      pursuant        52  225       4  2001
## 230         three        52  225       4  2001
## 231        reserv        52  225       3  2001
## 232      restrict        52  225       4  2001
## 233         steyr        52  225       1  2001
## 234    policyhold        52  225       1  2001
## 235       perform        51  235       4  2001
## 236        public        51  235       4  2001
## 237         deliv        51  235       4  2001
## 238        capabl        51  235       4  2001
## 239           gas        51  235       4  2001
## 240        accord        50  240       4  2001
## 241         exist        50  240       4  2001
## 242        enhanc        50  240       4  2001
## 243       support        50  240       4  2001
## 244      michigan        50  240       2  2001
## 245      strategi        49  245       3  2001
## 246          loss        49  245       4  2001
## 247          less        49  245       4  2001
## 248        truste        49  245       3  2001
## 249         level        48  249       4  2001
## 250     primarili        48  249       4  2001
## 251          paid        48  249       4  2001
## 252         month        48  249       4  2001
## 253         posit        47  253       4  2001
## 254          sell        47  253       4  2001
## 255          case        47  253       4  2001
## 256          five        46  256       4  2001
## 257   environment        46  256       4  2001
## 258           set        46  256       4  2001
## 259        aggreg        46  256       4  2001
## 260         power        46  256       4  2001
## 261       coverag        46  256       4  2001
## 262       western        46  256       3  2001
## 263           ltd        46  256       4  2001
## 264           mec        46  256       1  2001
## 265          hong        46  256       1  2001
## 266          kong        46  256       1  2001
## 267      transact        45  267       4  2001
## 268          made        45  267       4  2001
## 269        gather        45  267       2  2001
## 270         centr        45  267       3  2001
## 271         river        45  267       2  2001
## 272        segreg        45  267       2  2001
## 273      interior        45  267       1  2001
## 274      structur        44  274       4  2001
## 275      descript        44  274       4  2001
## 276       variabl        44  274       2  2001
## 277       consist        44  274       4  2001
## 278         feder        44  274       4  2001
## 279        affili        44  274       4  2001
## 280        propos        44  274       4  2001
## 281        growth        43  281       4  2001
## 282        purpos        43  281       4  2001
## 283        entitl        43  281       4  2001
## 284       condens        43  281       2  2001
## 285          data        42  285       4  2001
## 286      incorpor        42  285       4  2001
## 287      determin        42  285       4  2001
## 288          test        42  285       3  2001
## 289       centuri        42  285       1  2001
## 290        dollar        41  290       4  2001
## 291     constitut        41  290       4  2001
## 292        except        41  290       4  2001
## 293       various        41  290       4  2001
## 294       connect        41  290       4  2001
## 295          rang        41  290       4  2001
## 296         focus        41  290       3  2001
## 297       british        41  290       4  2001
## 298      european        41  290       1  2001
## 299        prefer        40  299       3  2001
## 300         order        40  299       4  2001
## 301        reason        40  299       4  2001
## 302        select        40  299       4  2001
## 303        return        40  299       4  2001
## 304          bank        40  299       4  2001
## 305        feeder        40  299       2  2001
## 306           iii        39  306       4  2001
## 307      indirect        39  306       4  2001
## 308         light        39  306       3  2001
## 309      maintain        39  306       4  2001
## 310          make        39  306       4  2001
## 311         sourc        39  306       4  2001
## 312       respons        39  306       4  2001
## 313        receiv        39  306       4  2001
## 314      columbia        39  306       3  2001
## 315  saskatchewan        39  306       3  2001
## 316        termin        39  306       4  2001
## 317       redempt        39  306       2  2001
## 318         plant        39  306       4  2001
## 319          ngls        39  306       1  2001
## 320        within        38  320       4  2001
## 321          basi        38  320       4  2001
## 322         reduc        38  320       4  2001
## 323    particular        38  320       4  2001
## 324       septemb        37  324       4  2001
## 325        ventur        37  324       3  2001
## 326     substanti        37  324       4  2001
## 327          real        37  324       4  2001
## 328         carri        37  324       4  2001
## 329       conduct        37  324       4  2001
## 330      exterior        37  324       1  2001
## 331       austria        37  324       1  2001
## 332       compris        36  332       4  2001
## 333          name        36  332       4  2001
## 334        global        36  332       2  2001
## 335     ownership        36  332       4  2001
## 336          bond        36  332       2  2001
## 337         claim        35  337       4  2001
## 338       contain        35  337       4  2001
## 339         oblig        35  337       4  2001
## 340          tool        35  337       3  2001
## 341         estat        35  337       2  2001
## 342        depend        35  337       4  2001
## 343         union        35  337       3  2001
## 344          util        35  337       4  2001
## 345        permit        35  337       4  2001
## 346         least        35  337       4  2001
## 347      chairman        35  337       4  2001
## 348          serv        35  337       4  2001
## 349         innov        35  337       2  2001
## 350       pembina        35  337       1  2001
## 351        factor        34  351       4  2001
## 352         natur        34  351       3  2001
## 353          abil        34  351       4  2001
## 354        combin        34  351       4  2001
## 355        togeth        34  351       4  2001
## 356          lead        34  351       3  2001
## 357        storag        34  351       3  2001
## 358         equal        34  351       3  2001
## 359        provis        34  351       4  2001
## 360         joint        34  351       4  2001
## 361      committe        33  361       4  2001
## 362       qualiti        33  361       4  2001
## 363       toronto        33  361       4  2001
## 364        expand        33  361       4  2001
## 365          bodi        33  361       2  2001
## 366          loan        33  361       4  2001
## 367       shipper        33  361       2  2001
## 368         japan        33  361       2  2001
## 369         event        32  369       4  2001
## 370         among        32  369       4  2001
## 371        regist        32  369       4  2001
## 372       appoint        32  369       3  2001
## 373          june        32  369       4  2001
## 374        option        32  369       4  2001
## 375          sold        32  369       4  2001
## 376        specif        32  369       4  2001
## 377        employ        32  369       4  2001
## 378          take        32  369       4  2001
## 379         sever        32  369       4  2001
## 380       calgari        32  369       3  2001
## 381          toll        32  369       2  2001
## 382       germani        32  369       1  2001
## 383       resourc        31  383       4  2001
## 384        consid        31  383       4  2001
## 385      standard        31  383       4  2001
## 386           mid        31  383       3  2001
## 387       success        31  383       3  2001
## 388         truck        31  383       3  2001
## 389        tariff        31  383       2  2001
## 390           pay        31  383       3  2001
## 391        payabl        31  383       4  2001
## 392       deposit        31  383       1  2001
## 393          door        31  383       1  2001
## 394 daimlerchrysl        31  383       1  2001
## 395        senior        30  395       4  2001
## 396        action        30  395       4  2001
## 397         indic        30  395       4  2001
## 398       protect        30  395       4  2001
## 399          debt        30  395       4  2001
## 400       similar        30  395       4  2001
## 401      investor        30  395       4  2001
## 402          work        30  395       4  2001
## 403        involv        30  395       4  2001
## 404          need        30  395       3  2001
## 405         allow        30  395       4  2001
## 406          long        30  395       4  2001
## 407        adjust        30  395       4  2001
## 408         agenc        30  395       1  2001
## 409         incur        30  395       3  2001
## 410        review        30  395       3  2001
## 411          asia        30  395       2  2001
## 412        broker        30  395       1  2001
## 413          york        30  395       2  2001
## 414      edmonton        30  395       2  2001
## 415          inch        30  395       1  2001
## 416       surplus        30  395       2  2001
## 417        financ        29  417       4  2001
## 418        remain        29  417       4  2001
## 419          larg        29  417       4  2001
## 420          earn        29  417       4  2001
## 421        improv        29  417       3  2001
## 422     construct        29  417       4  2001
## 423      function        29  417       4  2001
## 424          aris        29  417       4  2001
## 425        excess        29  417       4  2001
## 426       delawar        29  417       2  2001
## 427  relationship        28  427       3  2001
## 428        matter        28  427       4  2001
## 429          list        28  427       4  2001
## 430         small        28  427       4  2001
## 431         resid        28  427       4  2001
## 432       largest        28  427       4  2001
## 433        domest        28  427       3  2001
## 434          four        28  427       4  2001
## 435       special        28  427       4  2001
## 436       compens        28  427       4  2001
## 437          copi        28  427       4  2001
## 438        exceed        28  427       4  2001
## 439       channel        28  427       1  2001
## 440          trim        28  427       1  2001
## 441          ford        28  427       1  2001
## 442         agent        27  442       3  2001
## 443        tradit        27  442       3  2001
## 444        advers        27  442       4  2001
## 445       present        27  442       4  2001
## 446          past        27  442       4  2001
## 447        record        27  442       4  2001
## 448         enter        27  442       4  2001
## 449         becom        27  442       4  2001
## 450       segment        27  442       4  2001
## 451          high        27  442       4  2001
## 452         retir        27  442       3  2001
## 453       written        27  442       3  2001
## 454      document        27  442       4  2001
## 455         panel        27  442       1  2001
## 456       abandon        27  442       2  2001
## 457         audit        26  457       4  2001
## 458       discuss        26  457       4  2001
## 459       generat        26  457       4  2001
## 460           due        26  457       4  2001
## 461          type        26  457       3  2001
## 462      opportun        26  457       4  2001
## 463        either        26  457       4  2001
## 464          full        26  457       4  2001
## 465        balanc        26  457       4  2001
## 466       minimum        26  457       3  2001
## 467      deliveri        26  457       3  2001
## 468          save        26  457       3  2001
## 469          file        26  457       4  2001
## 470         separ        26  457       4  2001
## 471       univers        26  457       2  2001
## 472            ia        26  457       1  2001
## 473          peac        26  457       1  2001
## 474       qualifi        25  474       4  2001
## 475         defin        25  474       4  2001
## 476       request        25  474       4  2001
## 477      institut        25  474       2  2001
## 478        annual        25  474       4  2001
## 479      commerci        25  474       1  2001
## 480          forc        25  474       4  2001
## 481        safeti        25  474       3  2001
## 482         elect        25  474       4  2001
## 483        dealer        25  474       4  2001
## 484          duti        25  474       4  2001
## 485           bbl        25  474       2  2001
## 486     complianc        24  486       4  2001
## 487        assess        24  486       3  2001
## 488    expenditur        24  486       3  2001
## 489          form        24  486       4  2001
## 490    competitor        24  486       3  2001
## 491      platform        24  486       2  2001
## 492      guarante        24  486       2  2001
## 493       collect        24  486       4  2001
## 494       portion        24  486       4  2001
## 495        region        24  486       3  2001
## 496         forth        24  486       4  2001
## 497           bow        24  486       1  2001
## 498        mexico        24  486       1  2001
## 499         cosma        24  486       1  2001
## 500           mfc        24  486       1  2001
##        feature frequency rank docfreq group
## 126       risk        84  126       4  2001
## 3479      risk        98  115       5  2002
## 6969      risk        86  105       4  2003
## 10272     risk       129   95       5  2004
## 13898     risk       186   75       5  2005
## 17810     risk       266   43       5  2006
## 21747     risk       302   35       5  2007
## 25387     risk       379   32       7  2008
## 29894     risk       541   18       8  2009
## 34389     risk       514   21       7  2010
## 38939     risk       508   19       7  2011
## 43399     risk       484   16       6  2012
## 47576     risk       467   15       6  2013
## 51750     risk       605   16       8  2014
## 56622     risk       664   36       8  2015
## 62031     risk       536   30       8  2016
## 66974     risk       524   24       9  2017
## 72139     risk       636   19       9  2018
## 77507     risk       666   19       9  2019
## 82982     risk       649   15       7  2020
## 12336   climat         2 2034       1  2004
## 15645   climat         4 1739       2  2005
## 19797   climat         3 1949       2  2006
## 23958   climat         2 2152       2  2007
## 28685   climat         1 3200       1  2008
## 36508   climat         4 2077       2  2010
## 41005   climat         4 2031       2  2011
## 45113   climat         5 1672       3  2012
## 49260   climat         5 1643       3  2013
## 53615   climat         6 1829       3  2014
## 58441   climat         9 1809       3  2015
## 63542   climat        10 1508       3  2016
## 68282   climat        13 1299       4  2017
## 73063   climat        29  924       6  2018
## 78288   climat        37  792       6  2019
## 83517   climat        65  547       6  2020
## 12446   carbon         2 2034       1  2004
## 15497   carbon         5 1565       2  2005
## 19451   carbon         5 1584       2  2006
## 22995   carbon         8 1241       2  2007
## 27436   carbon         4 1986       2  2008
## 32689   carbon         2 2646       1  2009
## 36066   carbon         7 1626       3  2010
## 40867   carbon         5 1838       3  2011
## 44997   carbon         6 1525       3  2012
## 48930   carbon         8 1311       3  2013
## 53125   carbon        11 1345       4  2014
## 58818   carbon         6 2161       3  2015
## 63435   carbon        12 1385       3  2016
## 68143   carbon        16 1165       4  2017
## 73218   carbon        22 1078       5  2018
## 78636   carbon        21 1133       5  2019
## 83880   carbon        33  896       4  2020
## 2484    scienc         1 2349       1  2001
## 16164   scienc         2 2264       1  2005
## 19569   scienc         4 1751       2  2006
## 22915   scienc         9 1179       3  2007
## 26562   scienc        12 1187       5  2008
## 31527   scienc         7 1622       4  2009
## 35656   scienc        12 1267       5  2010
## 40133   scienc        13 1195       6  2011
## 44744   scienc         8 1328       4  2012
## 48770   scienc        10 1187       4  2013
## 52849   scienc        17 1101       7  2014
## 57712   scienc        24 1117       6  2015
## 63084   scienc        21 1071       6  2016
## 68049   scienc        19 1083       6  2017
## 73381   scienc        16 1253       7  2018
## 78657   scienc        20 1152       7  2019
## 84319   scienc        16 1332       5  2020
## 1635      pari         3 1576       1  2001
## 5104      pari         3 1668       1  2002
## 8038      pari         6 1129       1  2003
## 11962     pari         3 1735       2  2004
## 16693     pari         1 2775       1  2005
## 20649     pari         1 2773       1  2006
## 27382     pari         4 1986       1  2008
## 31790     pari         5 1878       2  2009
## 36486     pari         4 2077       2  2010
## 41602     pari         2 2620       1  2011
## 45855     pari         2 2424       1  2012
## 50036     pari         2 2411       1  2013
## 54217     pari         3 2411       2  2014
## 59173     pari         4 2544       3  2015
## 64601     pari         3 2544       3  2016
## 69521     pari         3 2513       3  2017
## 74840     pari         3 2666       3  2018
## 80260     pari         3 2721       3  2019
## 85496     pari         4 2491       3  2020
## 615    environ        18  611       3  2001
## 3998   environ        20  633       4  2002
## 7490   environ        17  624       3  2003
## 10823  environ        21  643       4  2004
## 14464  environ        27  639       5  2005
## 18329  environ        33  560       5  2006
## 22209  environ        36  495       5  2007
## 25815  environ        50  460       7  2008
## 30268  environ        69  389       7  2009
## 34750  environ        71  379       6  2010
## 39272  environ        73  352       6  2011
## 43862  environ        41  477       5  2012
## 48011  environ        44  450       5  2013
## 52165  environ        64  431       7  2014
## 57230  environ        61  644       7  2015
## 62583  environ        53  581       7  2016
## 67507  environ        51  557       7  2017
## 72601  environ        71  479       8  2018
## 78007  environ        66  519       7  2019
## 83571  environ        57  604       5  2020
## 12768    green         1 2537       1  2004
## 24360    green         1 2593       1  2007
## 44824    green         7 1424       2  2012
## 50590    green         1 2984       1  2013
## 55268    green         1 3447       1  2014
## 58644    green         7 2038       4  2015
## 63939    green         6 1913       3  2016
## 68695    green         7 1729       3  2017
## 73716    green        10 1579       5  2018
## 79309    green         8 1798       4  2019
## 85099    green         6 2116       3  2020
## 26686      ghg        10 1300       1  2008
## 46495      ghg         1 2974       1  2012
## 54697      ghg         2 2808       1  2014
## 70006      ghg         2 2900       1  2017
## 74160      ghg         6 1993       1  2018
## 79147      ghg        10 1619       3  2019
## 85137      ghg         6 2116       3  2020
## 532     energi        22  520       3  2001
## 3840    energi        29  467       4  2002
## 7665    energi        12  783       3  2003
## 10717   energi        27  527       4  2004
## 14191   energi        53  365       4  2005
## 17997   energi        83  228       4  2006
## 21873   energi       110  161       3  2007
## 25532   energi       131  174       6  2008
## 30138   energi       100  261       7  2009
## 34543   energi       150  174       6  2010
## 39117   energi       127  196       6  2011
## 43719   energi        60  332       6  2012
## 47889   energi        62  324       6  2013
## 51982   energi       113  246       8  2014
## 56979   energi       111  388       8  2015
## 62324   energi       110  323       7  2016
## 67238   energi       115  287       7  2017
## 72335   energi       166  215       7  2018
## 77681   energi       184  191       8  2019
## 83138   energi       217  170       6  2020
## 606   research        19  583       2  2001
## 3993  research        21  612       3  2002
## 7508  research        17  624       2  2003
## 10847 research        21  643       3  2004
## 14478 research        27  639       5  2005
## 18504 research        24  717       5  2006
## 22828 research        11 1068       4  2007
## 26113 research        27  738       6  2008
## 30743 research        25  847       4  2009
## 35242 research        25  852       4  2010
## 39815 research        23  872       4  2011
## 44216 research        20  810       4  2012
## 48500 research        16  905       5  2013
## 52636 research        25  888       5  2014
## 58111 research        14 1498       5  2015
## 63489 research        11 1453       5  2016
## 68469 research        10 1468       5  2017
## 73499 research        14 1340       5  2018
## 78754 research        17 1245       5  2019
## 84382 research        15 1380       4  2020

This plot shows the frequencies of the words of interest, including “Risk”.

This plot shows the same data but excludes “Risk”, because its sheer volume dwarfs the other words.

To check whether the word “Paris” in this corpus refers to the “Paris Agreement”, we can use keywords in context (KWIC). KWIC lets us inspect each occurrence of a word together with the words around it.
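A minimal sketch of such a check with quanteda's `kwic()`. The two sentences are hypothetical stand-ins for the AIF texts, and the object names are made up for the illustration. On the stemmed corpus the pattern would be "pari" rather than "paris".

```r
library(quanteda)

# Hypothetical toy sentences standing in for the AIF corpus
toy_txt <- c(
  doc1 = "The company supports the Paris Agreement on climate change.",
  doc2 = "Our subsidiary opened a new office in Paris last year."
)

toy_toks <- tokens(toy_txt, remove_punct = TRUE)

# Every match of "paris" with three tokens of context on each side
kw <- kwic(toy_toks, pattern = "paris", window = 3)
print(kw)

# Restrict the search to the two-word phrase to isolate "Paris Agreement"
kw_phrase <- kwic(toy_toks, pattern = phrase("paris agreement"), window = 3)
print(kw_phrase)
```

Comparing the two results shows how many mentions of the city are unrelated to the agreement.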

This step transforms the corpus into a matrix with one row per document and one column per term. Quanteda calls this a document-feature matrix (DFM); other packages, such as tm, call it a document-term matrix (DTM). When creating the matrix, we can choose whether to work with one word per column (unigrams), two-word phrases (bigrams), or three-word phrases (trigrams).

For demonstration purposes, I chose unigrams and bigrams. This step also trims the matrix to a minimum term frequency of 2 and a minimum document frequency of 2. This means that each term has to occur at least twice in the corpus and be present in at least 2 documents. This shrinks the matrix and makes it more manageable.

## Document-feature matrix of: 5 documents, 8,393 features (77.8% sparse) and 4 docvars.
##                                        features
## docs                                    inform fiscal year end content
##   8001070_AIF_DollaramaInc_25042014.txt     75     59   99  49       1
##   8001071_AIF_DollaramaInc_24042015.txt     78     63  101  46       1
##   8001072_AIF_DollaramaInc_22042016.txt    127     62  107  49       1
##   8001073_AIF_DollaramaInc_17042017.txt    115     66   89  28       1
##   8001074_AIF_DollaramaInc_20042018.txt    111     70   71  15       1
##                                        features
## docs                                    explanatori note forward look statement
##   8001070_AIF_DollaramaInc_25042014.txt           2   48      13   13        39
##   8001071_AIF_DollaramaInc_24042015.txt           2  101      14   14        41
##   8001072_AIF_DollaramaInc_22042016.txt           2   87      13   13        39
##   8001073_AIF_DollaramaInc_17042017.txt           2   87      14   13        36
##   8001074_AIF_DollaramaInc_20042018.txt           2  110      15   12        35
## [ reached max_nfeat ... 8,383 more features ]
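The trimming step can be sketched on toy data as follows. This assumes the trimming was done with quanteda's `dfm_trim()` using `min_termfreq = 2` and `min_docfreq = 2` (the original call is not shown); the texts and object names are hypothetical. Note that `min_termfreq` counts occurrences across the whole matrix, not within a single document.

```r
library(quanteda)

# Hypothetical toy documents, already lower-cased and cleaned
toy_txt <- c(
  d1 = "pipeline system pipeline energy",
  d2 = "energy system insurance",
  d3 = "insurance fund energy"
)
toy_toks <- tokens(toy_txt)

# Unigram document-feature matrix
dfm_uni <- dfm(toy_toks)

# Keep features that occur at least twice overall AND in at least two documents
dfm_trimmed <- dfm_trim(dfm_uni, min_termfreq = 2, min_docfreq = 2)
featnames(dfm_trimmed)

# Bigrams: adjacent token pairs joined by "_"
dfm_bi <- dfm(tokens_ngrams(toy_toks, n = 2))
featnames(dfm_bi)
```

Here "pipeline" occurs twice but only in one document, so it is dropped, and "fund" is dropped because it occurs only once.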

Because the method is probabilistic, we need to set a seed so that the results can be replicated.

set.seed(250)
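A small base R illustration of what the seed does: resetting it before a random draw reproduces the draw exactly. The variable names are only for this example.

```r
# Same seed, same random draw
set.seed(250)
first_draw <- sample(1:100, 5)

set.seed(250)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)
```

keyATM also accepts a seed through its `options` argument, e.g. `options = list(seed = 250)`, which may be the safer way to pin down the model fit itself.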

TOPIC MODELLING

KeyATM

As an example, I created a keyword dictionary for science skepticism.

Scien_Dict_KeyATM <- list(
  climate = c("climat", "green", "ghg", "environ", "pari", "polici", "chang", "interior", "natur", "emis"),
  science = c("scienc", "research", "certain", "pursuant", "fact"),
  energy = c("oil", "energi", "pipelin", "gas", "vehicl", "crude", "reserv"))
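Since the keywords have to match the stemmed vocabulary exactly, it can help to check them before fitting. Below is a rough sketch in base R; `vocab` is a hypothetical stand-in for the feature names of the trimmed matrix (for instance from `featnames()`), not the real vocabulary. The Porter stem of "emission" is "emiss", which would explain why a keyword spelled "emis" finds no match.

```r
Scien_Dict_KeyATM <- list(
  climate = c("climat", "green", "ghg", "environ", "pari", "polici",
              "chang", "interior", "natur", "emis"),
  science = c("scienc", "research", "certain", "pursuant", "fact"),
  energy  = c("oil", "energi", "pipelin", "gas", "vehicl", "crude", "reserv")
)

# Hypothetical vocabulary, assumed for illustration only
vocab <- c("climat", "green", "ghg", "environ", "pari", "polici", "chang",
           "interior", "natur", "emiss", "scienc", "research", "certain",
           "pursuant", "fact", "oil", "energi", "pipelin", "gas", "vehicl",
           "crude", "reserv")

# For each topic, list the keywords that do not occur in the vocabulary
missing_kw <- lapply(Scien_Dict_KeyATM, function(kw) setdiff(kw, vocab))
missing_kw <- Filter(length, missing_kw)  # keep only topics with a problem
print(missing_kw)
```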

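The output shows a base KeyATM model with the three keyword topics plus three extra topics without keywords (Other_1 to Other_3). The warning reports that the keyword “emis” was pruned because it does not appear in the documents. In the first table, which lists the top words per topic, a check mark flags a keyword of that topic and a bracketed number flags a keyword that belongs to another topic (here “vehicl”, a keyword of topic 3). The second table lists, for each topic, the index numbers of the documents in which that topic is most prominent.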

## Initializing the model...
## Warning in check_keywords(info$wd_names, keywords, options$prune): A keyword
## will be pruned because it does not appear in documents: emis
## Fitting the model. 1500 iterations...
## Creating an output object. It may take time...
##    1_climate 2_science    3_energy     Other_1  Other_2    Other_3
## 1       rate     insur pipelin [✓]       share     bank    product
## 2      share   compani     pembina     restaur    share manufactur
## 3  chang [✓]      life      system         tax committe     system
## 4       risk   financi     oil [✓]   agreement  financi      magna
## 5       busi   product        unit         net    audit vehicl [3]
## 6       seri     manag        oper     exchang   prefer    automot
## 7     corpor    invest partnership       incom     rate       oper
## 8    financi    market     general        oper    power      share
## 9     credit      fund     gas [✓] partnership     seri     profit
## 10   product      busi        fund      includ   servic     includ
##    1_climate 2_science 3_energy Other_1 Other_2 Other_3
## 1          5        46       10     114     105      27
## 2          6        45       11     113     106      28
## 3          4        44        8     112     104      29
## 4          3        43        9     111     103      26
## 5          2        47       12     110     102      32
## 6          7        48       13     115     107      33
## 7          1        49       14       7     101      34
## 8         93        50       15       6      75      30
## 9         92        51       77       5      70      35
## 10        91        52       16       4      74      36

This plot visualizes the ranking of the dictionary words in the corpus.

## # A tibble: 21 x 5
## # Groups:   Topic [3]
##    Word     WordCount `Proportion(%)` Ranking Topic    
##    <chr>        <int>           <dbl>   <int> <fct>    
##  1 chang         4821           0.229       1 1_climate
##  2 polici        4267           0.203       2 1_climate
##  3 natur         3031           0.144       3 1_climate
##  4 environ        943           0.045       4 1_climate
##  5 interior       325           0.015       5 1_climate
##  6 climat         199           0.009       6 1_climate
##  7 pari            59           0.003       7 1_climate
##  8 green           55           0.003       8 1_climate
##  9 ghg             37           0.002       9 1_climate
## 10 certain       5088           0.242       1 2_science
## # … with 11 more rows

Now the same exercise, but with bigrams (two-word phrases).

## Document-feature matrix of: 5 documents, 10 features (74.0% sparse) and 4 docvars.
##                                        features
## docs                                    inform_one one_bank bank_distribut
##   8001070_AIF_DollaramaInc_25042014.txt          0        0              0
##   8001071_AIF_DollaramaInc_24042015.txt          0        0              0
##   8001072_AIF_DollaramaInc_22042016.txt          0        0              0
##   8001073_AIF_DollaramaInc_17042017.txt          0        0              0
##   8001074_AIF_DollaramaInc_20042018.txt          0        0              0
##                                        features
## docs                                    distribut_notic notic_inform
##   8001070_AIF_DollaramaInc_25042014.txt               0            0
##   8001071_AIF_DollaramaInc_24042015.txt               0            0
##   8001072_AIF_DollaramaInc_22042016.txt               0            0
##   8001073_AIF_DollaramaInc_17042017.txt               0            0
##   8001074_AIF_DollaramaInc_20042018.txt               0            0
##                                        features
## docs                                    inform_inform inform_accompani
##   8001070_AIF_DollaramaInc_25042014.txt             2                0
##   8001071_AIF_DollaramaInc_24042015.txt             2                0
##   8001072_AIF_DollaramaInc_22042016.txt             2                0
##   8001073_AIF_DollaramaInc_17042017.txt             2                0
##   8001074_AIF_DollaramaInc_20042018.txt             2                0
##                                        features
## docs                                    accompani_copi inform_fiscal
##   8001070_AIF_DollaramaInc_25042014.txt              0             1
##   8001071_AIF_DollaramaInc_24042015.txt              0             1
##   8001072_AIF_DollaramaInc_22042016.txt              0             1
##   8001073_AIF_DollaramaInc_17042017.txt              0             1
##   8001074_AIF_DollaramaInc_20042018.txt              0             1
##                                        features
## docs                                    copi_document
##   8001070_AIF_DollaramaInc_25042014.txt             3
##   8001071_AIF_DollaramaInc_24042015.txt             3
##   8001072_AIF_DollaramaInc_22042016.txt             3
##   8001073_AIF_DollaramaInc_17042017.txt             0
##   8001074_AIF_DollaramaInc_20042018.txt             0

Bigram Frequencies

##                     feature frequency rank docfreq group
## 1           general_partner       179    1       2  2001
## 2                life_insur       155    2       1  2001
## 3           manulif_financi       147    3       1  2001
## 4                 crude_oil       142    4       3  2001
## 5                trust_unit       127    5       1  2001
## 6            pipelin_system       118    6       2  2001
## 7                unit_state        76    7       4  2001
## 8             insur_compani        71    8       2  2001
## 9                class_unit        67    9       1  2001
## 10           north_american        58   10       3  2001
## 11              vice_presid        58   10       4  2001
## 12            north_america        55   12       3  2001
## 13                 year_end        54   13       4  2001
## 14           board_director        54   13       4  2001
## 15    partnership_agreement        51   15       1  2001
## 16              mutual_fund        50   16       2  2001
## 17        financi_statement        49   17       4  2001
## 18                 tier_one        47   18       1  2001
## 19            insur_product        47   18       1  2001
## 20                hong_kong        46   20       1  2001
## 21              fiscal_year        44   21       3  2001
## 22              class_share        44   21       3  2001
## 23           distribut_cash        44   21       3  2001
## 24              third_parti        43   24       4  2001
## 25             design_engin        43   24       1  2001
## 26          manulif_centuri        42   26       1  2001
## 27             execut_offic        40   27       4  2001
## 28           class_subordin        40   27       1  2001
## 29            subordin_vote        40   27       1  2001
## 30             centuri_life        40   27       1  2001
## 31             manag_believ        39   31       3  2001
## 32             declar_trust        39   31       1  2001
## 33              segreg_fund        39   31       1  2001
## 34          manufactur_life        39   31       1  2001
## 35         british_columbia        38   35       3  2001
## 36              magna_steyr        38   35       1  2001
## 37          direct_indirect        37   37       4  2001
## 38               vote_share        37   37       3  2001
## 39           feeder_pipelin        37   37       2  2001
## 40                 see_item        37   37       1  2001
## 41            pipelin_asset        36   41       2  2001
## 42                net_incom        35   42       4  2001
## 43            individu_life        35   42       1  2001
## 44            limit_partner        33   44       1  2001
## 45             common_share        32   45       3  2001
## 46          product_develop        32   45       2  2001
## 47                ppc_share        32   45       1  2001
## 48       automot_manufactur        32   45       1  2001
## 49             chief_execut        31   49       4  2001
## 50             general_fund        31   49       1  2001
## 51               real_estat        30   51       2  2001
## 52                busi_unit        30   51       2  2001
## 53            group_pension        30   51       1  2001
## 54             presid_chief        29   54       4  2001
## 55             joint_ventur        29   54       3  2001
## 56                  one_two        29   54       1  2001
## 57            descript_busi        28   57       2  2001
## 58               fund_manag        28   57       2  2001
## 59         corpor_constitut        28   57       1  2001
## 60            stock_exchang        27   60       4  2001
## 61                 new_york        27   60       2  2001
## 62              life_health        27   60       1  2001
## 63              execut_vice        26   63       2  2001
## 64          manag_agreement        26   63       1  2001
## 65            item_descript        26   63       1  2001
## 66              two_automot        26   63       1  2001
## 67        limit_partnership        25   67       3  2001
## 68             koch_pipelin        25   67       1  2001
## 69             health_insur        24   69       1  2001
## 70                set_forth        24   69       4  2001
## 71              peac_system        24   69       1  2001
## 72              averag_toll        24   69       1  2001
## 73                bow_river        24   69       1  2001
## 74         insur_subsidiari        24   69       1  2001
## 75           financi_inform        23   75       4  2001
## 76                long_term        23   75       4  2001
## 77             insur_polici        23   75       3  2001
## 78         manufactur_facil        23   75       1  2001
## 79               oem_custom        23   75       1  2001
## 80          daihyaku_mutual        23   75       1  2001
## 81                five_year        22   81       4  2001
## 82                 one_copi        22   81       4  2001
## 83             pension_plan        22   81       3  2001
## 84                natur_gas        22   81       3  2001
## 85          variabl_annuiti        22   81       1  2001
## 86                time_time        21   86       4  2001
## 87         capit_expenditur        21   86       3  2001
## 88         consolid_financi        21   86       3  2001
## 89               wholli_own        21   86       3  2001
## 90             wealth_manag        21   86       1  2001
## 91           partner_affili        21   86       1  2001
## 92             holder_least        21   86       2  2001
## 93               insur_busi        21   86       1  2001
## 94            oper_structur        21   86       1  2001
## 95              seat_system        21   86       1  2001
## 96          pension_product        21   86       1  2001
## 97                 u._divis        21   86       1  2001
## 98          general_develop        20   98       4  2001
## 99            product_offer        20   98       2  2001
## 100          product_includ        20   98       2  2001
## 101            invest_manag        20   98       3  2001
## 102       regulatori_author        20   98       2  2001
## 103           outstand_unit        20   98       2  2001
## 104        consolid_automot        20   98       1  2001
## 105          individu_insur        20   98       1  2001
## 106               law_regul        19  106       4  2001
## 107               per_share        19  106       2  2001
## 108          western_system        19  106       1  2001
## 109              capac_bbls        19  106       2  2001
## 110            profit_share        19  106       3  2001
## 111        engin_manufactur        19  106       1  2001
## 112            modul_system        19  106       1  2001
## 113              group_life        19  106       1  2001
## 114          canadian_divis        19  106       1  2001
## 115            univers_life        19  106       1  2001
## 116          product_servic        18  116       2  2001
## 117               incom_tax        18  116       3  2001
## 118            one_supplier        18  116       1  2001
## 119               secur_law        18  116       3  2001
## 120             fund_invest        18  116       2  2001
## 121           america_europ        18  116       2  2001
## 122       distribut_channel        18  116       1  2001
## 123            develop_busi        17  123       3  2001
## 124            public_trade        17  123       2  2001
## 125            insur_market        17  123       1  2001
## 126            holder_class        17  123       2  2001
## 127        short_prospectus        17  123       4  2001
## 128                 tax_act        17  123       3  2001
## 129          cash_distribut        17  123       2  2001
## 130        mid_saskatchewan        17  123       1  2001
## 131           system_integr        17  123       2  2001
## 132              oper_group        17  123       2  2001
## 133         financi_reinsur        17  123       1  2001
## 134       properti_casualti        17  123       1  2001
## 135           manag_discuss        16  135       4  2001
## 136         discuss_analysi        16  135       4  2001
## 137          financi_servic        16  135       2  2001
## 138             new_product        16  135       3  2001
## 139        sharehold_equiti        16  135       2  2001
## 140             share_class        16  135       2  2001
## 141              vote_secur        16  135       2  2001
## 142            will_continu        16  135       4  2001
## 143            compani_oper        16  135       2  2001
## 144            oper_pipelin        16  135       2  2001
## 145            director_ppc        16  135       1  2001
## 146    saskatchewan_pipelin        16  135       1  2001
## 147     partner_partnership        16  135       1  2001
## 148           prior_thereto        16  135       2  2001
## 149            partner_will        16  135       1  2001
## 150          compon_assembl        16  135       1  2001
## 151            complet_seat        16  135       1  2001
## 152             non_automot        16  135       1  2001
## 153           broker_dealer        16  135       1  2001
## 154         premium_deposit        16  135       1  2001
## 155             state_insur        16  135       1  2001
## 156               u._dollar        15  156       2  2001
## 157          financi_condit        15  156       3  2001
## 158            manag_servic        15  156       3  2001
## 159              oper_incom        15  156       2  2001
## 160            market_share        15  156       2  2001
## 161               fund_fund        15  156       2  2001
## 162             nebc_system        15  156       1  2001
## 163             oil_condens        15  156       2  2001
## 164           meet_unithold        15  156       1  2001
## 165             system_mean        15  156       1  2001
## 166            invest_asset        15  156       1  2001
## 167        automot_industri        15  156       1  2001
## 168            prefer_share        14  168       3  2001
## 169            addit_inform        14  168       4  2001
## 170           materi_advers        14  168       3  2001
## 171              three_year        14  168       3  2001
## 172           toronto_stock        14  168       4  2001
## 173          own_subsidiari        14  168       3  2001
## 174         equiti_interest        14  168       2  2001
## 175            balanc_sheet        14  168       4  2001
## 176           insur_coverag        14  168       4  2001
## 177           princip_occup        14  168       4  2001
## 178              free_trade        14  168       1  2001
## 179               vote_meet        14  168       3  2001
## 180            pipelin_oper        14  168       2  2001
## 181            feder_system        14  168       1  2001
## 182          produc_shipper        14  168       2  2001
## 183               unit_issu        14  168       3  2001
## 184           product_volum        14  168       2  2001
## 185            person_proxi        14  168       2  2001
## 186              fund_asset        14  168       1  2001
## 187        financi_strength        14  168       1  2001
## 188      sharehold_dividend        14  168       1  2001
## 189           assembl_modul        14  168       1  2001
## 190           engin_assembl        14  168       1  2001
## 191            automot_oper        14  168       1  2001
## 192              magna_oper        14  168       1  2001
## 193            life_financi        14  168       1  2001
## 194           insur_annuiti        14  168       1  2001
## 195         annuiti_pension        14  168       1  2001
## 196          director_offic        13  196       4  2001
## 197           advers_effect        13  196       2  2001
## 198            item_general        13  196       1  2001
## 199            financi_year        13  196       4  2001
## 200         toronto_ontario        13  196       3  2001
## 201             price_trust        13  196       1  2001
## 202              sinc_prior        13  196       3  2001
## 203              carri_valu        13  196       1  2001
## 204           share_program        13  196       1  2001
## 205          pembina_system        13  196       1  2001
## 206                per_unit        13  196       2  2001
## 207            lake_pipelin        13  196       2  2001
## 208       system_throughput        13  196       2  2001
## 209             ppc_pipelin        13  196       1  2001
## 210          export_pipelin        13  196       2  2001
## 211               per_trust        13  196       1  2001
## 212             system_bbls        13  196       2  2001
## 213           river_pipelin        13  196       2  2001
## 214               non_resid        13  196       2  2001
## 215        partnership_will        13  196       1  2001
## 216           kilometr_mile        13  196       1  2001
## 217           mean_approxim        13  196       1  2001
## 218        approxim_pipelin        13  196       1  2001
## 219          least_outstand        13  196       1  2001
## 220        approxim_employe        13  196       2  2001
## 221         trade_agreement        13  196       1  2001
## 222          complet_vehicl        13  196       1  2001
## 223       american_european        13  196       1  2001
## 224         manufactur_oper        13  196       1  2001
## 225              non_tradit        13  196       1  2001
## 226            elliott_page        13  196       1  2001
## 227            reinsur_busi        13  196       1  2001
## 228        individu_annuiti        13  196       1  2001
## 229        commerci_mortgag        13  196       1  2001
## 230          bond_portfolio        13  196       1  2001
## 231       mortgag_portfolio        13  196       1  2001
## 232            insur_author        13  196       1  2001
## 233             among_thing        12  233       4  2001
## 234          incorpor_refer        12  233       4  2001
## 235             asset_manag        12  233       2  2001
## 236           weight_averag        12  233       2  2001
## 237           health_safeti        12  233       2  2001
## 238             facil_locat        12  233       2  2001
## 239             market_valu        12  233       3  2001
## 240           custom_servic        12  233       1  2001
## 241             provid_fund        12  233       2  2001
## 242       agreement_general        12  233       2  2001
## 243           approv_holder        12  233       2  2001
## 244         pembina_pipelin        12  233       1  2001
## 245     administr_agreement        12  233       1  2001
## 246 partnership_partnership        12  233       2  2001
## 247           approxim_bbls        12  233       2  2001
## 248            system_group        12  233       2  2001
## 249          revenu_pipelin        12  233       1  2001
## 250          pipelin_averag        12  233       2  2001
## 251        daili_throughput        12  233       2  2001
## 252               cold_lake        12  233       1  2001
## 253          transfer_class        12  233       1  2001
## 254             epl_pipelin        12  233       1  2001
## 255              own_direct        12  233       2  2001
## 256        signific_develop        12  233       1  2001
## 257             part_compon        12  233       1  2001
## 258            automot_sale        12  233       1  2001
## 259         exterior_system        12  233       1  2001
## 260            affin_market        12  233       1  2001
## 261              term_insur        12  233       1  2001
## 262     particip_policyhold        12  233       1  2001
## 263              insur_hold        12  233       1  2001
## 264         annuiti_product        12  233       1  2001
## 265               busi_oper        11  265       3  2001
## 266         canadian_dollar        11  265       4  2001
## 267            capit_requir        11  265       3  2001
## 268             busi_corpor        11  265       2  2001
## 269            public_offer        11  265       3  2001
## 270              tier_capit        11  265       1  2001
## 271          product_suppli        11  265       1  2001
## 272                 oil_gas        11  265       3  2001
## 273              incom_fund        11  265       2  2001
## 274            averag_daili        11  265       2  2001
## 275          consent_holder        11  265       2  2001
## 276              manag_oper        11  265       3  2001
## 277      safeti_environment        11  265       2  2001
## 278             oper_system        11  265       2  2001
## 279          system_consist        11  265       2  2001
## 280         northern_system        11  265       1  2001
## 281            bbls_compris        11  265       1  2001
## 282             light_sweet        11  265       2  2001
## 283            abandon_cost        11  265       2  2001
## 284             koch_valley        11  265       1  2001
## 285          valley_pipelin        11  265       1  2001
## 286           pipelin_limit        11  265       1  2001
## 287             heavi_blend        11  265       1  2001
## 288               asset_new        11  265       1  2001
## 289        throughput_capac        11  265       1  2001
## 290           trade_toronto        11  265       3  2001
## 291         automot_product        11  265       2  2001
## 292            system_deliv        11  265       2  2001
## 293       asset_partnership        11  265       1  2001
## 294             meet_person        11  265       2  2001
## 295         written_consent        11  265       1  2001
## 296           program_manag        11  265       1  2001
## 297            product_line        11  265       2  2001
## 298           mirror_system        11  265       1  2001
## 299              sport_util        11  265       1  2001
## 300         parti_administr        11  265       1  2001
## 301           group_benefit        11  265       1  2001
## 302        casualti_reinsur        11  265       1  2001
## 303             polici_loan        11  265       1  2001
## 304               term_life        11  265       1  2001
## 305        consolid_general        11  265       1  2001
## 306           human_resourc        10  306       3  2001
## 307             canadian_u.        10  306       2  2001
## 308          audit_committe        10  306       4  2001
## 309              share_note        10  306       1  2001
## 310          princip_amount        10  306       2  2001
## 311       independ_director        10  306       2  2001
## 312          recent_complet        10  306       3  2001
## 313            share_entitl        10  306       3  2001
## 314              oper_prior        10  306       2  2001
## 315         repres_approxim        10  306       1  2001
## 316                end_year        10  306       4  2001
## 317             senior_vice        10  306       2  2001
## 318       regulatori_requir        10  306       4  2001
## 319           advers_affect        10  306       3  2001
## 320             will_provid        10  306       4  2001
## 321              rate_agenc        10  306       1  2001
## 322             system_oper        10  306       3  2001
## 323            busi_develop        10  306       3  2001
## 324          chairman_chief        10  306       4  2001
## 325               unit_held        10  306       2  2001
## 326            asset_liabil        10  306       2  2001
## 327               fund_busi        10  306       2  2001
## 328           relat_pipelin        10  306       2  2001
## 329                non_oper        10  306       2  2001
## 330          wabasca_system        10  306       1  2001
## 331       fort_saskatchewan        10  306       1  2001
## 332           tran_mountain        10  306       2  2001
## 333               gas_plant        10  306       2  2001
## 334             deliv_crude        10  306       2  2001
## 335             sweet_crude        10  306       2  2001
## 336        busi_partnership        10  306       1  2001
## 337              light_sour        10  306       1  2001
## 338              sour_crude        10  306       1  2001
## 339               fund_will        10  306       2  2001
## 340             inch_diamet        10  306       1  2001
## 341             retir_incom        10  306       2  2001
## 342             addit_trust        10  306       1  2001
## 343            storag_capac        10  306       2  2001
## 344              incom_loss        10  306       2  2001
## 345            name_municip        10  306       4  2001
## 346           municip_resid        10  306       4  2001
## 347        govern_agreement        10  306       1  2001
## 348               new_asset        10  306       1  2001
## 349               unit_vote        10  306       1  2001
## 350           proxi_written        10  306       1  2001
## 351             unit_annual        10  306       2  2001
## 352             insur_regul        10  306       1  2001
## 353            magna_intern        10  306       1  2001
## 354             wheel_drive        10  306       1  2001
## 355           develop_engin        10  306       1  2001
## 356        compani_consolid        10  306       2  2001
## 357         magna_entertain        10  306       1  2001
## 358          entertain_corp        10  306       1  2001
## 359         vehicl_platform        10  306       1  2001
## 360        research_develop        10  306       1  2001
## 361           chang_compani        10  306       1  2001
## 362             util_vehicl        10  306       1  2001
## 363        steyr_powertrain        10  306       1  2001
## 364            prefer_secur        10  306       2  2001
## 365               share_mec        10  306       1  2001
## 366           austria_magna        10  306       1  2001
## 367            gmbh_germani        10  306       1  2001
## 368               busi_line        10  306       1  2001
## 369            accid_health        10  306       1  2001
## 370          health_reinsur        10  306       1  2001
## 371               cash_valu        10  306       1  2001
## 372            manulif_wood        10  306       1  2001
## 373              wood_logan        10  306       1  2001
## 374              agenc_forc        10  306       1  2001
## 375        insur_regulatori        10  306       1  2001
## 376           control_level        10  306       1  2001
## 377           factor_includ         9  377       4  2001
## 378        account_principl         9  377       3  2001
## 379             result_oper         9  377       4  2001
## 380        financi_institut         9  377       2  2001
## 381              corpor_act         9  377       2  2001
## 382                earn_per         9  377       2  2001
## 383          aggreg_princip         9  377       2  2001
## 384        agreement_provid         9  377       2  2001
## 385                one_vote         9  377       3  2001
## 386            benefici_own         9  377       3  2001
## 387            cash_redempt         9  377       1  2001
## 388              period_end         9  377       2  2001
## 389             also_provid         9  377       3  2001
## 390            financi_oper         9  377       2  2001
## 391              applic_law         9  377       4  2001
## 392              follow_set         9  377       4  2001
## 393        payment_dividend         9  377       2  2001
## 394            return_capit         9  377       3  2001
## 395             full_servic         9  377       1  2001
## 396        compens_committe         9  377       4  2001
## 397            product_sale         9  377       2  2001
## 398       preliminari_short         9  377       4  2001
## 399           general_manag         9  377       3  2001
## 400             recent_year         9  377       3  2001
## 401           system_includ         9  377       3  2001
## 402      manufactur_process         9  377       1  2001
## 403         agreement_manag         9  377       1  2001
## 404     partnership_general         9  377       2  2001
## 405              bonni_glen         9  377       1  2001
## 406             glen_system         9  377       1  2001
## 407               ppc_manag         9  377       1  2001
## 408               manag_ppc         9  377       1  2001
## 409        administr_expens         9  377       2  2001
## 410           gather_system         9  377       2  2001
## 411              bbls_crude         9  377       2  2001
## 412              oil_system         9  377       2  2001
## 413           miscibl_flood         9  377       1  2001
## 414        throughput_volum         9  377       2  2001
## 415                ppc_will         9  377       1  2001
## 416             storag_tank         9  377       2  2001
## 417         partner_general         9  377       1  2001
## 418               oil_light         9  377       1  2001
## 419     distribut_distribut         9  377       2  2001
## 420       general_administr         9  377       2  2001
## 421               inch_inch         9  377       1  2001
## 422           boost_station         9  377       1  2001
## 423             cash_reserv         9  377       1  2001
## 424            oper_consist         9  377       1  2001
## 425           subject_regul         9  377       1  2001
## 426            unit_kingdom         9  377       2  2001
## 427             manag_limit         9  377       1  2001
## 428           letter_intent         9  377       1  2001
## 429             item_corpor         9  377       1  2001
## 430          vehicl_product         9  377       1  2001
## 431           magna_automot         9  377       1  2001
## 432          approxim_magna         9  377       1  2001
## 433         servic_supplier         9  377       1  2001
## 434            european_oem         9  377       1  2001
## 435           compani_board         9  377       2  2001
## 436          automot_system         9  377       1  2001
## 437           vehicl_system         9  377       1  2001
## 438           innov_product         9  377       2  2001
## 439             engin_centr         9  377       1  2001
## 440         interior_system         9  377       1  2001
## 441             tesma_class         9  377       1  2001
## 442               mec_class         9  377       1  2001
## 443              tax_profit         9  377       1  2001
## 444         america_delawar         9  377       2  2001
## 445        control_interest         9  377       1  2001
## 446             insur_group         9  377       1  2001
## 447             non_control         9  377       1  2001
## 448               insur_law         9  377       1  2001
## 449         invest_platform         9  377       1  2001
## 450         individu_wealth         9  377       1  2001
## 451         appoint_actuari         9  377       1  2001
## 452        annuiti_contract         9  377       1  2001
## 453           common_invest         9  377       1  2001
## 454              whole_life         9  377       1  2001
## 455           manulif_north         9  377       1  2001
## 456          manulif_intern         9  377       1  2001
## 457             regul_insur         9  377       1  2001
## 458            fund_compani         9  377       1  2001
## 459           compani_ordin         9  377       1  2001
## 460           copi_document         8  460       3  2001
## 461            oper_financi         8  460       2  2001
## 462            market_secur         8  460       4  2001
## 463          offic_director         8  460       3  2001
## 464            includ_limit         8  460       4  2001
## 465             share_capit         8  460       3  2001
## 466            senior_manag         8  460       4  2001
## 467            initi_public         8  460       3  2001
## 468          control_direct         8  460       2  2001
## 469              oper_offic         8  460       2  2001
## 470        privat_placement         8  460       2  2001
## 471           develop_chang         8  460       1  2001
## 472             term_condit         8  460       3  2001
## 473                own_oper         8  460       3  2001
## 474          product_design         8  460       2  2001
## 475          fourth_quarter         8  460       2  2001
## 476          director_elect         8  460       2  2001
## 477         collect_bargain         8  460       1  2001
## 478           presid_financ         8  460       3  2001
## 479            matter_relat         8  460       3  2001
## 480         complet_financi         8  460       3  2001
## 481               oper_cash         8  460       2  2001
## 482     interest_subsidiari         8  460       1  2001
## 483                new_busi         8  460       2  2001
## 484          insur_industri         8  460       1  2001
## 485          presid_general         8  460       2  2001
## 486         ontario_ontario         8  460       1  2001
## 487           manag_product         8  460       1  2001
## 488           redempt_right         8  460       1  2001
## 489          product_provid         8  460       2  2001
## 490               will_also         8  460       4  2001
## 491           descript_fund         8  460       1  2001
## 492          system_pembina         8  460       1  2001
## 493                fund_ppc         8  460       1  2001
## 494          system_western         8  460       1  2001
## 495             bbls_averag         8  460       1  2001
## 496             oil_pipelin         8  460       2  2001
## 497               capit_ppc         8  460       1  2001
## 498               unit_fund         8  460       1  2001
## 499      distribut_unithold         8  460       2  2001
## 500           system_gather         8  460       2  2001
##                 feature frequency  rank docfreq group
## 92144      climat_chang         1  9075       1  2004
## 120572     climat_chang         3  3128       2  2005
## 164742     climat_chang         2  5466       2  2006
## 207205     climat_chang         2  5503       2  2007
## 256771     climat_chang         1 14348       1  2008
## 350232     climat_chang         4  3203       2  2010
## 406870     climat_chang         4  3089       2  2011
## 459634     climat_chang         5  1737       3  2012
## 504016     climat_chang         5  1709       3  2013
## 549185     climat_chang         6  1968       3  2014
## 608512     climat_chang         7  2862       3  2015
## 677044     climat_chang         8  1728       3  2016
## 741291     climat_chang         9  1326       4  2017
## 802963     climat_chang        18   544       6  2018
## 870875     climat_chang        28   263       6  2019
## 941053     climat_chang        43   139       6  2020
## 4             crude_oil       142     4       3  2001
## 24654         crude_oil       151     3       3  2002
## 55434         crude_oil       167     1       3  2003
## 82829         crude_oil       175     2       3  2004
## 117390        crude_oil       153     7       3  2005
## 159156        crude_oil       151     5       2  2006
## 201542        crude_oil       139     7       2  2007
## 242084        crude_oil        97    25       2  2008
## 292853        crude_oil        79    36       2  2009
## 346980        crude_oil        79    33       2  2010
## 403720        crude_oil        87    26       2  2011
## 457890        crude_oil        52    45       1  2012
## 502291        crude_oil        64    30       1  2013
## 547167        crude_oil       105    24       2  2014
## 605641        crude_oil       115    41       2  2015
## 675314        crude_oil       111    34       2  2016
## 739965        crude_oil       116    25       2  2017
## 802437        crude_oil       110    32       2  2018
## 870638        crude_oil       115    31       2  2019
## 940945        crude_oil       115    30       2  2020
## 360    research_develop        10   306       1  2001
## 25064  research_develop        10   346       1  2002
## 55741  research_develop        10   262       1  2003
## 83168  research_develop        12   300       1  2004
## 117776 research_develop        13   353       1  2005
## 159686 research_develop        11   485       1  2006
## 242768 research_develop        11   620       1  2008
## 293701 research_develop        11   768       1  2009
## 347683 research_develop        12   649       1  2010
## 404293 research_develop        13   537       1  2011
## 458551 research_develop        10   608       1  2012
## 503481 research_develop         7  1012       1  2013
## 547827 research_develop        13   602       1  2014
## 653585 research_develop         1 24060       1  2015
## 719875 research_develop         1 20389       1  2016
## 783541 research_develop         1 18977       1  2017
## 805568 research_develop         6  2592       1  2018
## 872353 research_develop         9  1535       1  2019
## 943006 research_develop         8  1817       1  2020
## 389924     exact_scienc         1 16015       1  2010
## 444678     exact_scienc         1 15465       1  2011
## 488479     exact_scienc         1 12091       1  2012
## 532668     exact_scienc         1 12092       1  2013
## 590610     exact_scienc         1 16953       1  2014
## 655317     exact_scienc         1 24060       1  2015
## 721498     exact_scienc         1 20389       1  2016
## 785453     exact_scienc         1 18977       1  2017
## 850936     exact_scienc         1 20852       1  2018
## 920197     exact_scienc         1 21455       1  2019
## 987402     exact_scienc         1 20911       1  2020

This plot shows the yearly frequencies of the bigrams of interest (climat_chang, crude_oil, research_develop, and exact_scienc).
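The plotting code is not shown in this report. As a rough sketch of how such a plot could be drawn, the block below uses base R rather than whatever plotting package was actually used. It copies the 2016 to 2020 rows for two of the bigrams from the table above; the object names `bigram_freq` and `freq_wide` are placeholders of mine.

```r
# Rows copied from the grouped frequency table above (2016-2020 only).
bigram_freq <- data.frame(
  feature   = rep(c("climat_chang", "crude_oil"), each = 5),
  group     = rep(2016:2020, times = 2),
  frequency = c(8, 9, 18, 28, 43,        # climat_chang
                111, 116, 110, 115, 115) # crude_oil
)

# Pivot to a year x bigram matrix. A year in which a bigram does not
# occur becomes NA, which matplot() draws as a gap in the line.
freq_wide <- with(bigram_freq,
                  tapply(frequency, list(group, feature), sum))

# One line per bigram, years on the x-axis.
years <- as.integer(rownames(freq_wide))
matplot(years, freq_wide, type = "b", pch = 1:2, lty = 1, col = 1:2,
        xlab = "Year", ylab = "Frequency")
legend("topleft", legend = colnames(freq_wide),
       pch = 1:2, lty = 1, col = 1:2)
```

Because crude_oil is far more frequent than climat_chang, plotting each bigram in its own panel (or on a log scale) may make the trend in the rarer bigram easier to see.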

Base keyATM for bigrams

As an example, I created a keyword dictionary for science skepticism.

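# Keyword dictionary: one entry per keyword topic.
# Keywords are stemmed bigrams joined by "_", as they appear in the corpus.
Bigram_Dict_KeyATM <- list(
  climate = c("climat_chang", "global_climat", "life_health"),
  science = c("research_develop", "exact_scienc"),
  energy  = c("crude_oil", "pipelin_system", "feeder_pipelin", "natur_gas"))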
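Because the keywords have to match the stemmed bigram features exactly, it can help to check the dictionary against the corpus vocabulary before fitting. The block below is a minimal base R sketch of such a check. The helper `missing_keywords()` and the vector `example_vocab` are hypothetical names of mine; `example_vocab` holds only the bigrams that happen to be visible in the output of this section, so a keyword reported as missing here is not necessarily missing from the real corpus.

```r
# Copied from the dictionary above so that this block runs on its own.
Bigram_Dict_KeyATM <- list(
  climate = c("climat_chang", "global_climat", "life_health"),
  science = c("research_develop", "exact_scienc"),
  energy  = c("crude_oil", "pipelin_system", "feeder_pipelin", "natur_gas"))

# Hypothetical helper: for each topic, return the keywords that do not
# occur in `vocab`; topics with no missing keywords are dropped.
missing_keywords <- function(dict, vocab) {
  missing <- lapply(dict, function(words) setdiff(words, vocab))
  missing[lengths(missing) > 0]
}

# Illustrative vocabulary only: bigrams visible in this section's output.
# With the real data, the vocabulary would come from the bigram dfm,
# for example via quanteda::featnames().
example_vocab <- c("climat_chang", "crude_oil", "research_develop",
                   "exact_scienc", "pipelin_system", "natur_gas")

missing_keywords(Bigram_Dict_KeyATM, example_vocab)
```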

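The output shows a base keyATM model fitted with the three keyword topics from the dictionary plus three additional no-keyword topics (Other_1 to Other_3). The first table lists the top ten bigrams per topic, with [✓] marking dictionary keywords. The second table lists, for each topic, the ten documents (by index) in which that topic is most prominent.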

## Initializing the model...
## Fitting the model. 1500 iterations...
## Creating an output object. It may take time...
##            1_climate      2_science           3_energy             Other_1
## 1       prefer_share   prefer_share      inter_pipelin        common_share
## 2       first_prefer   class_prefer    general_partner         burger_king
## 3       execut_offic audit_committe pipelin_system [✓]        exchang_unit
## 4     audit_committe   common_share      crude_oil [✓] partnership_exchang
## 5         share_seri     seri_class      natur_gas [✓]        prefer_share
## 6  financi_statement    medium_term         trust_unit           fair_valu
## 7        nation_bank      term_note         class_unit           incom_tax
## 8        vice_presid extern_auditor           oil_sand        herein_refer
## 9          per_share     share_seri          cold_lake   financi_statement
## 10    extern_auditor      seri_seri          long_term     incorpor_herein
##             Other_2              Other_3
## 1       class_share         prefer_share
## 2        life_insur           vote_share
## 3   manulif_financi        subordin_vote
## 4     insur_compani automobil_manufactur
## 5       share_class             per_cent
## 6       mutual_fund        power_financi
## 7   manufactur_life        advers_effect
## 8      john_hancock        materi_advers
## 9        unit_state         first_prefer
## 10 independ_auditor       class_subordin
##    1_climate 2_science 3_energy Other_1 Other_2 Other_3
## 1         72        95       10     113      46      36
## 2         70        96       11     114      45      35
## 3         71        94        8     112      49      32
## 4         73        93       13     111      50      37
## 5         75        92       12     115      48      33
## 6         74        91       16     110      54      34
## 7         69        90       15       7      56      27
## 8         68        89       17       1      55      29
## 9         67        88        9       6      53      28
## 10       127        87       18       2      47      38
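The model-fitting code is not shown in this report. Output like the two tables above could come from a call along the following lines. This is a sketch under assumptions, not the code that was actually run: `fit_base_keyatm()` is a hypothetical wrapper, `bigram_dfm` stands for a quanteda dfm of stemmed bigrams, and the seed is arbitrary. The 1500 iterations and the three no-keyword topics are taken from the output above.

```r
# Hypothetical wrapper around keyATM. `bigram_dfm` is assumed to be a
# quanteda dfm of stemmed bigrams; `keywords` is a named list such as
# Bigram_Dict_KeyATM. Requires the keyATM package to be installed.
fit_base_keyatm <- function(bigram_dfm, keywords, n_other = 3, seed = 1) {
  # Convert the dfm into the document format keyATM expects.
  docs <- keyATM::keyATM_read(texts = bigram_dfm)

  keyATM::keyATM(
    docs              = docs,
    model             = "base",
    no_keyword_topics = n_other,   # the "Other_*" topics
    keywords          = keywords,
    options           = list(seed = seed, iterations = 1500)
  )
}

# Usage (not run here, since it needs the corpus):
# out <- fit_base_keyatm(bigram_dfm, Bigram_Dict_KeyATM)
# keyATM::top_words(out, n = 10)  # top bigrams per topic
# keyATM::top_docs(out, n = 10)   # top document indices per topic
```

Fixing the seed makes a run reproducible, which matters here because topic models are fitted by sampling and the top words can shift between runs.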