This document presents some basic statistical analyses of a corpus of Annual Information Forms (AIFs; n = 132, randomly selected by me).
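# Read the files matched by the glob pattern and split each file name on "_"
# into four document variables (num, type, company, date)
library(readtext)
text_df <- readtext(paste0("~/Google Drive File Stream/My Drive/R/Projects/Work package 2/Frames/AIF", "*"),
                    encoding = "UTF-8",
                    docvarsfrom = "filenames",
                    docvarnames = c("num", "type", "company", "date"),
                    dvsep = "_")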
This step is where we actually start reducing the corpus. It excludes stopwords (a list ships with the quanteda package) along with other words we choose to drop, so that the corpus is reduced to words that are meaningful to us. I am also removing punctuation, white space, numbers, URLs, and other symbols. This step also converts all letters to lowercase.
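The cleaning code itself is not shown in this document. As a minimal sketch of what such a step can look like with quanteda, assuming a toy two-document corpus and a hypothetical custom drop list (`txt` and `custom_drop` are illustrative names, not the ones used for the AIF corpus):

```r
library(quanteda)

# Toy stand-in for the AIF texts (hypothetical example sentences)
txt <- c(doc1 = "The company operates 3 pipelines; see https://example.com for details!",
         doc2 = "Risk factors include changes in Climate policy and energy prices.")

# Tokenize and drop punctuation, symbols, numbers and URLs
toks <- tokens(txt,
               remove_punct   = TRUE,
               remove_symbols = TRUE,
               remove_numbers = TRUE,
               remove_url     = TRUE)

# Lowercase everything, then remove English stopwords plus a custom list
toks <- tokens_tolower(toks)
custom_drop <- c("see", "details")  # hypothetical corpus-specific words
toks <- tokens_remove(toks, pattern = c(stopwords("en"), custom_drop))

print(toks)
```

Lowercasing before stopword removal is a design choice here; `tokens_remove()` matches case-insensitively by default, so the order mostly matters for readability.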
Descriptive statistical analysis of words-of-interest
This step also stems words (reducing them to their root forms) using Martin Porter’s stemming algorithm (included in the quanteda package).
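To make the effect of stemming concrete, here is a small sketch using quanteda's `char_wordstem()` on a handful of illustrative words (the word list is an assumption chosen for this example; on a tokens object the equivalent call would be `tokens_wordstem()`):

```r
library(quanteda)

# A few illustrative words; the first four stems match features
# that appear in the frequency tables of this document
words <- c("pipelines", "insurance", "company", "climate", "emissions")
stems <- char_wordstem(words, language = "english")

print(data.frame(word = words, stem = stems))
```

Stems are not always intuitive, so it is worth printing the stem of any word before using it as a keyword or search term.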
Frequencies grouped by year
## feature frequency rank docfreq group
## 1 system 581 1 4 2001
## 2 insur 554 2 4 2001
## 3 pipelin 521 3 2 2001
## 4 product 520 4 4 2001
## 5 oper 492 5 4 2001
## 6 manag 431 6 4 2001
## 7 fund 421 7 4 2001
## 8 unit 398 8 4 2001
## 9 magna 388 9 1 2001
## 10 financi 381 10 4 2001
## 11 compani 353 11 4 2001
## 12 general 336 12 4 2001
## 13 busi 321 13 4 2001
## 14 life 308 14 3 2001
## 15 includ 306 15 4 2001
## 16 market 301 16 4 2001
## 17 share 294 17 4 2001
## 18 ppc 280 18 1 2001
## 19 manulif 274 19 1 2001
## 20 invest 272 20 4 2001
## 21 partner 270 21 4 2001
## 22 provid 269 22 4 2001
## 23 will 260 23 4 2001
## 24 partnership 247 24 4 2001
## 25 asset 223 25 4 2001
## 26 year 221 26 4 2001
## 27 develop 219 27 4 2001
## 28 new 205 28 4 2001
## 29 manufactur 202 29 2 2001
## 30 oil 202 29 4 2001
## 31 class 198 31 4 2001
## 32 trust 197 32 3 2001
## 33 requir 195 33 4 2001
## 34 group 194 34 4 2001
## 35 distribut 192 35 4 2001
## 36 agreement 186 36 4 2001
## 37 custom 180 37 4 2001
## 38 automot 178 38 2 2001
## 39 limit 174 39 4 2001
## 40 director 172 40 4 2001
## 41 one 171 41 4 2001
## 42 servic 168 42 4 2001
## 43 also 167 43 4 2001
## 44 capit 166 44 4 2001
## 45 addit 166 44 4 2001
## 46 interest 150 46 4 2001
## 47 incom 150 46 4 2001
## 48 canadian 149 48 4 2001
## 49 crude 146 49 3 2001
## 50 approxim 142 50 4 2001
## 51 state 141 51 4 2001
## 52 vote 141 51 4 2001
## 53 facil 140 53 3 2001
## 54 sale 138 54 4 2001
## 55 cash 137 55 4 2001
## 56 vehicl 136 56 3 2001
## 57 oem 135 57 1 2001
## 58 secur 134 58 4 2001
## 59 offic 132 59 4 2001
## 60 cost 127 60 4 2001
## 61 term 124 61 4 2001
## 62 time 124 61 4 2001
## 63 north 124 61 4 2001
## 64 inform 123 64 4 2001
## 65 individu 123 64 3 2001
## 66 engin 121 66 2 2001
## 67 subsidiari 119 67 4 2001
## 68 increas 118 68 4 2001
## 69 end 116 69 4 2001
## 70 respect 116 69 4 2001
## 71 certain 114 71 4 2001
## 72 prior 113 72 4 2001
## 73 regul 112 73 4 2001
## 74 employe 111 74 4 2001
## 75 relat 111 74 4 2001
## 76 design 111 74 4 2001
## 77 two 110 77 4 2001
## 78 produc 108 78 4 2001
## 79 complet 107 79 4 2001
## 80 own 106 80 4 2001
## 81 u. 106 80 4 2001
## 82 supplier 105 82 1 2001
## 83 plan 104 83 4 2001
## 84 program 103 84 4 2001
## 85 ontario 102 85 3 2001
## 86 reinsur 102 85 1 2001
## 87 polici 101 87 4 2001
## 88 base 101 87 4 2001
## 89 rate 99 89 4 2001
## 90 divis 99 89 3 2001
## 91 mean 98 91 4 2001
## 92 meet 98 91 4 2001
## 93 offer 98 91 4 2001
## 94 presid 98 91 4 2001
## 95 unithold 98 91 2 2001
## 96 person 96 96 4 2001
## 97 pension 96 96 3 2001
## 98 law 95 98 4 2001
## 99 industri 94 99 4 2001
## 100 execut 94 99 4 2001
## 101 amount 94 99 4 2001
## 102 corpor 93 102 4 2001
## 103 price 93 102 4 2001
## 104 non 92 104 4 2001
## 105 direct 92 104 4 2001
## 106 consolid 92 104 4 2001
## 107 sharehold 92 104 4 2001
## 108 bbls 92 104 2 2001
## 109 result 91 109 4 2001
## 110 follow 90 110 4 2001
## 111 account 90 110 4 2001
## 112 intern 90 110 4 2001
## 113 assembl 90 110 1 2001
## 114 number 89 114 4 2001
## 115 administr 89 114 4 2001
## 116 continu 88 116 4 2001
## 117 princip 88 116 4 2001
## 118 annuiti 88 116 1 2001
## 119 control 87 119 4 2001
## 120 valu 87 119 4 2001
## 121 part 86 121 4 2001
## 122 subject 86 121 4 2001
## 123 holder 86 121 4 2001
## 124 approv 85 124 4 2001
## 125 mutual 85 124 2 2001
## 126 risk 84 126 4 2001
## 127 trade 84 126 4 2001
## 128 establish 84 126 4 2001
## 129 dividend 83 129 3 2001
## 130 issu 83 129 4 2001
## 131 averag 83 129 4 2001
## 132 net 83 129 4 2001
## 133 author 83 129 4 2001
## 134 america 82 134 3 2001
## 135 statement 81 135 4 2001
## 136 seat 81 135 1 2001
## 137 decoma 81 135 1 2001
## 138 well 80 138 4 2001
## 139 signific 80 138 4 2001
## 140 major 79 140 4 2001
## 141 tier 79 140 2 2001
## 142 period 78 142 4 2001
## 143 see 78 142 4 2001
## 144 note 77 144 4 2001
## 145 repres 77 144 4 2001
## 146 board 77 144 4 2001
## 147 compon 77 144 3 2001
## 148 koch 77 144 1 2001
## 149 mortgag 77 144 2 2001
## 150 hold 75 150 4 2001
## 151 throughput 74 151 2 2001
## 152 liabil 73 152 4 2001
## 153 transport 72 153 3 2001
## 154 tax 72 153 4 2001
## 155 applic 72 153 4 2001
## 156 right 71 156 4 2001
## 157 area 71 156 4 2001
## 158 portfolio 70 158 2 2001
## 159 competit 69 159 4 2001
## 160 regulatori 69 159 4 2001
## 161 use 69 159 4 2001
## 162 vice 69 159 4 2001
## 163 acquir 68 163 4 2001
## 164 current 67 164 4 2001
## 165 per 67 164 4 2001
## 166 american 67 164 4 2001
## 167 item 66 167 3 2001
## 168 acquisit 66 167 4 2001
## 169 technolog 65 169 4 2001
## 170 equiti 65 169 4 2001
## 171 calendar 64 171 3 2001
## 172 effect 64 171 4 2001
## 173 govern 64 171 4 2001
## 174 benefit 64 171 4 2001
## 175 report 63 175 4 2001
## 176 parti 63 175 4 2001
## 177 fiscal 62 177 3 2001
## 178 associ 62 177 4 2001
## 179 locat 62 177 4 2001
## 180 materi 61 180 4 2001
## 181 activ 61 180 4 2001
## 182 believ 61 180 4 2001
## 183 suppli 61 180 4 2001
## 184 properti 60 184 4 2001
## 185 volum 60 184 4 2001
## 186 futur 60 184 4 2001
## 187 chang 60 184 4 2001
## 188 quarter 60 184 4 2001
## 189 stock 60 184 4 2001
## 190 outstand 60 184 4 2001
## 191 sinc 60 184 4 2001
## 192 profit 60 184 4 2001
## 193 payment 60 184 4 2001
## 194 process 60 184 4 2001
## 195 subordin 60 184 3 2001
## 196 capac 59 196 4 2001
## 197 amend 59 196 4 2001
## 198 exchang 58 198 4 2001
## 199 date 58 198 4 2001
## 200 act 58 198 4 2001
## 201 day 58 198 4 2001
## 202 particip 58 198 4 2001
## 203 member 56 203 4 2001
## 204 avail 56 203 4 2001
## 205 chief 56 203 4 2001
## 206 purchas 56 203 4 2001
## 207 health 56 203 3 2001
## 208 recent 55 208 4 2001
## 209 refer 55 208 4 2001
## 210 revenu 55 208 4 2001
## 211 declar 55 208 3 2001
## 212 expens 55 208 4 2001
## 213 premium 55 208 1 2001
## 214 integr 55 208 4 2001
## 215 tesma 55 208 1 2001
## 216 contract 54 216 3 2001
## 217 third 54 216 4 2001
## 218 independ 54 216 4 2001
## 219 modul 54 216 1 2001
## 220 europ 54 216 2 2001
## 221 fee 53 221 4 2001
## 222 initi 53 221 4 2001
## 223 line 53 221 4 2001
## 224 held 53 221 4 2001
## 225 common 52 225 3 2001
## 226 transfer 52 225 4 2001
## 227 expect 52 225 4 2001
## 228 condit 52 225 4 2001
## 229 pursuant 52 225 4 2001
## 230 three 52 225 4 2001
## 231 reserv 52 225 3 2001
## 232 restrict 52 225 4 2001
## 233 steyr 52 225 1 2001
## 234 policyhold 52 225 1 2001
## 235 perform 51 235 4 2001
## 236 public 51 235 4 2001
## 237 deliv 51 235 4 2001
## 238 capabl 51 235 4 2001
## 239 gas 51 235 4 2001
## 240 accord 50 240 4 2001
## 241 exist 50 240 4 2001
## 242 enhanc 50 240 4 2001
## 243 support 50 240 4 2001
## 244 michigan 50 240 2 2001
## 245 strategi 49 245 3 2001
## 246 loss 49 245 4 2001
## 247 less 49 245 4 2001
## 248 truste 49 245 3 2001
## 249 level 48 249 4 2001
## 250 primarili 48 249 4 2001
## 251 paid 48 249 4 2001
## 252 month 48 249 4 2001
## 253 posit 47 253 4 2001
## 254 sell 47 253 4 2001
## 255 case 47 253 4 2001
## 256 five 46 256 4 2001
## 257 environment 46 256 4 2001
## 258 set 46 256 4 2001
## 259 aggreg 46 256 4 2001
## 260 power 46 256 4 2001
## 261 coverag 46 256 4 2001
## 262 western 46 256 3 2001
## 263 ltd 46 256 4 2001
## 264 mec 46 256 1 2001
## 265 hong 46 256 1 2001
## 266 kong 46 256 1 2001
## 267 transact 45 267 4 2001
## 268 made 45 267 4 2001
## 269 gather 45 267 2 2001
## 270 centr 45 267 3 2001
## 271 river 45 267 2 2001
## 272 segreg 45 267 2 2001
## 273 interior 45 267 1 2001
## 274 structur 44 274 4 2001
## 275 descript 44 274 4 2001
## 276 variabl 44 274 2 2001
## 277 consist 44 274 4 2001
## 278 feder 44 274 4 2001
## 279 affili 44 274 4 2001
## 280 propos 44 274 4 2001
## 281 growth 43 281 4 2001
## 282 purpos 43 281 4 2001
## 283 entitl 43 281 4 2001
## 284 condens 43 281 2 2001
## 285 data 42 285 4 2001
## 286 incorpor 42 285 4 2001
## 287 determin 42 285 4 2001
## 288 test 42 285 3 2001
## 289 centuri 42 285 1 2001
## 290 dollar 41 290 4 2001
## 291 constitut 41 290 4 2001
## 292 except 41 290 4 2001
## 293 various 41 290 4 2001
## 294 connect 41 290 4 2001
## 295 rang 41 290 4 2001
## 296 focus 41 290 3 2001
## 297 british 41 290 4 2001
## 298 european 41 290 1 2001
## 299 prefer 40 299 3 2001
## 300 order 40 299 4 2001
## 301 reason 40 299 4 2001
## 302 select 40 299 4 2001
## 303 return 40 299 4 2001
## 304 bank 40 299 4 2001
## 305 feeder 40 299 2 2001
## 306 iii 39 306 4 2001
## 307 indirect 39 306 4 2001
## 308 light 39 306 3 2001
## 309 maintain 39 306 4 2001
## 310 make 39 306 4 2001
## 311 sourc 39 306 4 2001
## 312 respons 39 306 4 2001
## 313 receiv 39 306 4 2001
## 314 columbia 39 306 3 2001
## 315 saskatchewan 39 306 3 2001
## 316 termin 39 306 4 2001
## 317 redempt 39 306 2 2001
## 318 plant 39 306 4 2001
## 319 ngls 39 306 1 2001
## 320 within 38 320 4 2001
## 321 basi 38 320 4 2001
## 322 reduc 38 320 4 2001
## 323 particular 38 320 4 2001
## 324 septemb 37 324 4 2001
## 325 ventur 37 324 3 2001
## 326 substanti 37 324 4 2001
## 327 real 37 324 4 2001
## 328 carri 37 324 4 2001
## 329 conduct 37 324 4 2001
## 330 exterior 37 324 1 2001
## 331 austria 37 324 1 2001
## 332 compris 36 332 4 2001
## 333 name 36 332 4 2001
## 334 global 36 332 2 2001
## 335 ownership 36 332 4 2001
## 336 bond 36 332 2 2001
## 337 claim 35 337 4 2001
## 338 contain 35 337 4 2001
## 339 oblig 35 337 4 2001
## 340 tool 35 337 3 2001
## 341 estat 35 337 2 2001
## 342 depend 35 337 4 2001
## 343 union 35 337 3 2001
## 344 util 35 337 4 2001
## 345 permit 35 337 4 2001
## 346 least 35 337 4 2001
## 347 chairman 35 337 4 2001
## 348 serv 35 337 4 2001
## 349 innov 35 337 2 2001
## 350 pembina 35 337 1 2001
## 351 factor 34 351 4 2001
## 352 natur 34 351 3 2001
## 353 abil 34 351 4 2001
## 354 combin 34 351 4 2001
## 355 togeth 34 351 4 2001
## 356 lead 34 351 3 2001
## 357 storag 34 351 3 2001
## 358 equal 34 351 3 2001
## 359 provis 34 351 4 2001
## 360 joint 34 351 4 2001
## 361 committe 33 361 4 2001
## 362 qualiti 33 361 4 2001
## 363 toronto 33 361 4 2001
## 364 expand 33 361 4 2001
## 365 bodi 33 361 2 2001
## 366 loan 33 361 4 2001
## 367 shipper 33 361 2 2001
## 368 japan 33 361 2 2001
## 369 event 32 369 4 2001
## 370 among 32 369 4 2001
## 371 regist 32 369 4 2001
## 372 appoint 32 369 3 2001
## 373 june 32 369 4 2001
## 374 option 32 369 4 2001
## 375 sold 32 369 4 2001
## 376 specif 32 369 4 2001
## 377 employ 32 369 4 2001
## 378 take 32 369 4 2001
## 379 sever 32 369 4 2001
## 380 calgari 32 369 3 2001
## 381 toll 32 369 2 2001
## 382 germani 32 369 1 2001
## 383 resourc 31 383 4 2001
## 384 consid 31 383 4 2001
## 385 standard 31 383 4 2001
## 386 mid 31 383 3 2001
## 387 success 31 383 3 2001
## 388 truck 31 383 3 2001
## 389 tariff 31 383 2 2001
## 390 pay 31 383 3 2001
## 391 payabl 31 383 4 2001
## 392 deposit 31 383 1 2001
## 393 door 31 383 1 2001
## 394 daimlerchrysl 31 383 1 2001
## 395 senior 30 395 4 2001
## 396 action 30 395 4 2001
## 397 indic 30 395 4 2001
## 398 protect 30 395 4 2001
## 399 debt 30 395 4 2001
## 400 similar 30 395 4 2001
## 401 investor 30 395 4 2001
## 402 work 30 395 4 2001
## 403 involv 30 395 4 2001
## 404 need 30 395 3 2001
## 405 allow 30 395 4 2001
## 406 long 30 395 4 2001
## 407 adjust 30 395 4 2001
## 408 agenc 30 395 1 2001
## 409 incur 30 395 3 2001
## 410 review 30 395 3 2001
## 411 asia 30 395 2 2001
## 412 broker 30 395 1 2001
## 413 york 30 395 2 2001
## 414 edmonton 30 395 2 2001
## 415 inch 30 395 1 2001
## 416 surplus 30 395 2 2001
## 417 financ 29 417 4 2001
## 418 remain 29 417 4 2001
## 419 larg 29 417 4 2001
## 420 earn 29 417 4 2001
## 421 improv 29 417 3 2001
## 422 construct 29 417 4 2001
## 423 function 29 417 4 2001
## 424 aris 29 417 4 2001
## 425 excess 29 417 4 2001
## 426 delawar 29 417 2 2001
## 427 relationship 28 427 3 2001
## 428 matter 28 427 4 2001
## 429 list 28 427 4 2001
## 430 small 28 427 4 2001
## 431 resid 28 427 4 2001
## 432 largest 28 427 4 2001
## 433 domest 28 427 3 2001
## 434 four 28 427 4 2001
## 435 special 28 427 4 2001
## 436 compens 28 427 4 2001
## 437 copi 28 427 4 2001
## 438 exceed 28 427 4 2001
## 439 channel 28 427 1 2001
## 440 trim 28 427 1 2001
## 441 ford 28 427 1 2001
## 442 agent 27 442 3 2001
## 443 tradit 27 442 3 2001
## 444 advers 27 442 4 2001
## 445 present 27 442 4 2001
## 446 past 27 442 4 2001
## 447 record 27 442 4 2001
## 448 enter 27 442 4 2001
## 449 becom 27 442 4 2001
## 450 segment 27 442 4 2001
## 451 high 27 442 4 2001
## 452 retir 27 442 3 2001
## 453 written 27 442 3 2001
## 454 document 27 442 4 2001
## 455 panel 27 442 1 2001
## 456 abandon 27 442 2 2001
## 457 audit 26 457 4 2001
## 458 discuss 26 457 4 2001
## 459 generat 26 457 4 2001
## 460 due 26 457 4 2001
## 461 type 26 457 3 2001
## 462 opportun 26 457 4 2001
## 463 either 26 457 4 2001
## 464 full 26 457 4 2001
## 465 balanc 26 457 4 2001
## 466 minimum 26 457 3 2001
## 467 deliveri 26 457 3 2001
## 468 save 26 457 3 2001
## 469 file 26 457 4 2001
## 470 separ 26 457 4 2001
## 471 univers 26 457 2 2001
## 472 ia 26 457 1 2001
## 473 peac 26 457 1 2001
## 474 qualifi 25 474 4 2001
## 475 defin 25 474 4 2001
## 476 request 25 474 4 2001
## 477 institut 25 474 2 2001
## 478 annual 25 474 4 2001
## 479 commerci 25 474 1 2001
## 480 forc 25 474 4 2001
## 481 safeti 25 474 3 2001
## 482 elect 25 474 4 2001
## 483 dealer 25 474 4 2001
## 484 duti 25 474 4 2001
## 485 bbl 25 474 2 2001
## 486 complianc 24 486 4 2001
## 487 assess 24 486 3 2001
## 488 expenditur 24 486 3 2001
## 489 form 24 486 4 2001
## 490 competitor 24 486 3 2001
## 491 platform 24 486 2 2001
## 492 guarante 24 486 2 2001
## 493 collect 24 486 4 2001
## 494 portion 24 486 4 2001
## 495 region 24 486 3 2001
## 496 forth 24 486 4 2001
## 497 bow 24 486 1 2001
## 498 mexico 24 486 1 2001
## 499 cosma 24 486 1 2001
## 500 mfc 24 486 1 2001
## feature frequency rank docfreq group
## 126 risk 84 126 4 2001
## 3479 risk 98 115 5 2002
## 6969 risk 86 105 4 2003
## 10272 risk 129 95 5 2004
## 13898 risk 186 75 5 2005
## 17810 risk 266 43 5 2006
## 21747 risk 302 35 5 2007
## 25387 risk 379 32 7 2008
## 29894 risk 541 18 8 2009
## 34389 risk 514 21 7 2010
## 38939 risk 508 19 7 2011
## 43399 risk 484 16 6 2012
## 47576 risk 467 15 6 2013
## 51750 risk 605 16 8 2014
## 56622 risk 664 36 8 2015
## 62031 risk 536 30 8 2016
## 66974 risk 524 24 9 2017
## 72139 risk 636 19 9 2018
## 77507 risk 666 19 9 2019
## 82982 risk 649 15 7 2020
## 12336 climat 2 2034 1 2004
## 15645 climat 4 1739 2 2005
## 19797 climat 3 1949 2 2006
## 23958 climat 2 2152 2 2007
## 28685 climat 1 3200 1 2008
## 36508 climat 4 2077 2 2010
## 41005 climat 4 2031 2 2011
## 45113 climat 5 1672 3 2012
## 49260 climat 5 1643 3 2013
## 53615 climat 6 1829 3 2014
## 58441 climat 9 1809 3 2015
## 63542 climat 10 1508 3 2016
## 68282 climat 13 1299 4 2017
## 73063 climat 29 924 6 2018
## 78288 climat 37 792 6 2019
## 83517 climat 65 547 6 2020
## 12446 carbon 2 2034 1 2004
## 15497 carbon 5 1565 2 2005
## 19451 carbon 5 1584 2 2006
## 22995 carbon 8 1241 2 2007
## 27436 carbon 4 1986 2 2008
## 32689 carbon 2 2646 1 2009
## 36066 carbon 7 1626 3 2010
## 40867 carbon 5 1838 3 2011
## 44997 carbon 6 1525 3 2012
## 48930 carbon 8 1311 3 2013
## 53125 carbon 11 1345 4 2014
## 58818 carbon 6 2161 3 2015
## 63435 carbon 12 1385 3 2016
## 68143 carbon 16 1165 4 2017
## 73218 carbon 22 1078 5 2018
## 78636 carbon 21 1133 5 2019
## 83880 carbon 33 896 4 2020
## 2484 scienc 1 2349 1 2001
## 16164 scienc 2 2264 1 2005
## 19569 scienc 4 1751 2 2006
## 22915 scienc 9 1179 3 2007
## 26562 scienc 12 1187 5 2008
## 31527 scienc 7 1622 4 2009
## 35656 scienc 12 1267 5 2010
## 40133 scienc 13 1195 6 2011
## 44744 scienc 8 1328 4 2012
## 48770 scienc 10 1187 4 2013
## 52849 scienc 17 1101 7 2014
## 57712 scienc 24 1117 6 2015
## 63084 scienc 21 1071 6 2016
## 68049 scienc 19 1083 6 2017
## 73381 scienc 16 1253 7 2018
## 78657 scienc 20 1152 7 2019
## 84319 scienc 16 1332 5 2020
## 1635 pari 3 1576 1 2001
## 5104 pari 3 1668 1 2002
## 8038 pari 6 1129 1 2003
## 11962 pari 3 1735 2 2004
## 16693 pari 1 2775 1 2005
## 20649 pari 1 2773 1 2006
## 27382 pari 4 1986 1 2008
## 31790 pari 5 1878 2 2009
## 36486 pari 4 2077 2 2010
## 41602 pari 2 2620 1 2011
## 45855 pari 2 2424 1 2012
## 50036 pari 2 2411 1 2013
## 54217 pari 3 2411 2 2014
## 59173 pari 4 2544 3 2015
## 64601 pari 3 2544 3 2016
## 69521 pari 3 2513 3 2017
## 74840 pari 3 2666 3 2018
## 80260 pari 3 2721 3 2019
## 85496 pari 4 2491 3 2020
## 615 environ 18 611 3 2001
## 3998 environ 20 633 4 2002
## 7490 environ 17 624 3 2003
## 10823 environ 21 643 4 2004
## 14464 environ 27 639 5 2005
## 18329 environ 33 560 5 2006
## 22209 environ 36 495 5 2007
## 25815 environ 50 460 7 2008
## 30268 environ 69 389 7 2009
## 34750 environ 71 379 6 2010
## 39272 environ 73 352 6 2011
## 43862 environ 41 477 5 2012
## 48011 environ 44 450 5 2013
## 52165 environ 64 431 7 2014
## 57230 environ 61 644 7 2015
## 62583 environ 53 581 7 2016
## 67507 environ 51 557 7 2017
## 72601 environ 71 479 8 2018
## 78007 environ 66 519 7 2019
## 83571 environ 57 604 5 2020
## 12768 green 1 2537 1 2004
## 24360 green 1 2593 1 2007
## 44824 green 7 1424 2 2012
## 50590 green 1 2984 1 2013
## 55268 green 1 3447 1 2014
## 58644 green 7 2038 4 2015
## 63939 green 6 1913 3 2016
## 68695 green 7 1729 3 2017
## 73716 green 10 1579 5 2018
## 79309 green 8 1798 4 2019
## 85099 green 6 2116 3 2020
## 26686 ghg 10 1300 1 2008
## 46495 ghg 1 2974 1 2012
## 54697 ghg 2 2808 1 2014
## 70006 ghg 2 2900 1 2017
## 74160 ghg 6 1993 1 2018
## 79147 ghg 10 1619 3 2019
## 85137 ghg 6 2116 3 2020
## 532 energi 22 520 3 2001
## 3840 energi 29 467 4 2002
## 7665 energi 12 783 3 2003
## 10717 energi 27 527 4 2004
## 14191 energi 53 365 4 2005
## 17997 energi 83 228 4 2006
## 21873 energi 110 161 3 2007
## 25532 energi 131 174 6 2008
## 30138 energi 100 261 7 2009
## 34543 energi 150 174 6 2010
## 39117 energi 127 196 6 2011
## 43719 energi 60 332 6 2012
## 47889 energi 62 324 6 2013
## 51982 energi 113 246 8 2014
## 56979 energi 111 388 8 2015
## 62324 energi 110 323 7 2016
## 67238 energi 115 287 7 2017
## 72335 energi 166 215 7 2018
## 77681 energi 184 191 8 2019
## 83138 energi 217 170 6 2020
## 606 research 19 583 2 2001
## 3993 research 21 612 3 2002
## 7508 research 17 624 2 2003
## 10847 research 21 643 3 2004
## 14478 research 27 639 5 2005
## 18504 research 24 717 5 2006
## 22828 research 11 1068 4 2007
## 26113 research 27 738 6 2008
## 30743 research 25 847 4 2009
## 35242 research 25 852 4 2010
## 39815 research 23 872 4 2011
## 44216 research 20 810 4 2012
## 48500 research 16 905 5 2013
## 52636 research 25 888 5 2014
## 58111 research 14 1498 5 2015
## 63489 research 11 1453 5 2016
## 68469 research 10 1468 5 2017
## 73499 research 14 1340 5 2018
## 78754 research 17 1245 5 2019
## 84382 research 15 1380 4 2020
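The tables above have the column layout (feature, frequency, rank, docfreq, group) that `textstat_frequency()` from quanteda.textstats returns. The call that produced them is not shown, so the following is only a sketch on a toy corpus, assuming quanteda version 3 or later and a hypothetical `year` document variable:

```r
library(quanteda)
library(quanteda.textstats)

# Toy corpus of already-stemmed text with a hypothetical "year" docvar
txt  <- c("risk energi risk", "energi climat", "risk climat climat risk")
corp <- corpus(txt, docvars = data.frame(year = c(2019, 2019, 2020)))
dfmat <- dfm(tokens(corp))

# Feature frequencies computed separately within each year
freq_by_year <- textstat_frequency(dfmat, groups = year)

# Keep only the words of interest
words_of_interest <- c("risk", "climat", "energi")
print(subset(freq_by_year, feature %in% words_of_interest))
```

Note that `rank` is computed within each group, which is why the same word can hold very different ranks from year to year.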
This plot shows the frequencies of the words-of-interest, including “Risk”.
This plot shows the same data but excludes “Risk”, because its sheer volume dwarfs the other words.
This step transforms the corpus into a matrix of word counts per document. quanteda calls this a document-feature matrix (DFM); other packages, such as tm, call it a document-term matrix (DTM). When creating the matrix, we can choose whether to keep working with one word per column (unigrams) or with two-word phrases (bigrams) or three-word phrases (trigrams).
For demonstration purposes, I chose unigrams and bigrams. At the same time, this step trims the matrix to a minimum term frequency of 2 and a minimum document frequency of 2. This means that each term has to occur at least twice overall and be present in at least 2 documents. This reduces the corpus and makes it more manageable.
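As a rough illustration of the trimming rule, the sketch below builds a unigram DFM from three toy documents and applies `dfm_trim()` with both thresholds set to 2. The documents are invented for the example, and unigrams are used only for brevity:

```r
library(quanteda)

# Toy, already-stemmed documents (hypothetical)
txt <- c(d1 = "risk risk energi",
         d2 = "risk climat",
         d3 = "energi pari")

dfmat <- dfm(tokens(txt))

# Keep features that occur at least twice in total
# AND appear in at least two documents
dfmat_trim <- dfm_trim(dfmat, min_termfreq = 2, min_docfreq = 2)

print(featnames(dfmat))       # all features before trimming
print(featnames(dfmat_trim))  # features that survive trimming
```

With the default count-based thresholds, `min_termfreq` is a total across the corpus, so a term that occurs once in each of two documents passes both tests.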
## Document-feature matrix of: 5 documents, 8,393 features (77.8% sparse) and 4 docvars.
## features
## docs inform fiscal year end content
## 8001070_AIF_DollaramaInc_25042014.txt 75 59 99 49 1
## 8001071_AIF_DollaramaInc_24042015.txt 78 63 101 46 1
## 8001072_AIF_DollaramaInc_22042016.txt 127 62 107 49 1
## 8001073_AIF_DollaramaInc_17042017.txt 115 66 89 28 1
## 8001074_AIF_DollaramaInc_20042018.txt 111 70 71 15 1
## features
## docs explanatori note forward look statement
## 8001070_AIF_DollaramaInc_25042014.txt 2 48 13 13 39
## 8001071_AIF_DollaramaInc_24042015.txt 2 101 14 14 41
## 8001072_AIF_DollaramaInc_22042016.txt 2 87 13 13 39
## 8001073_AIF_DollaramaInc_17042017.txt 2 87 14 13 36
## 8001074_AIF_DollaramaInc_20042018.txt 2 110 15 12 35
## [ reached max_nfeat ... 8,383 more features ]
Because the method is probabilistic, we need to set a seed so that the results are reproducible.
set.seed(250)
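What the seed buys can be shown with a tiny base R sketch, independent of any topic model: resetting the same seed before a random draw reproduces that draw exactly. The variable names are illustrative only.

```r
# Two random draws, each preceded by the same seed
set.seed(250)
draw_a <- sample(1:100, 5)

set.seed(250)
draw_b <- sample(1:100, 5)

# A third draw without resetting the seed continues the random stream
draw_c <- sample(1:100, 5)

print(identical(draw_a, draw_b))  # TRUE: same seed, same draw
print(draw_c)
```

The seed has to be set again immediately before each model fit if separate runs are meant to be comparable.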
TOPIC MODELLING
KeyATM
I created an example keyword dictionary for science skepticism.
Scien_Dict_KeyATM <- list(
climate = c("climat", "green", "ghg", "environ", "pari", "polici", "chang", "interior", "natur", "emis"),
science = c("scienc", "research", "certain", "pursuant", "fact"),
energy = c("oil", "energi", "pipelin", "gas", "vehicl", "crude", "reserv"))
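Because the corpus is stemmed, every keyword has to match a stem exactly, and keywords that never occur in the documents are dropped from the model. A minimal base R sketch of a pre-fit check, assuming a shortened dictionary and a hypothetical vocabulary vector (with a real DFM the vocabulary would come from `featnames()`):

```r
# Shortened keyword dictionary (illustrative subset)
keywords <- list(
  climate = c("climat", "green", "ghg", "emis"),
  science = c("scienc", "research"))

# Hypothetical vocabulary of stemmed features
vocab <- c("climat", "green", "ghg", "scienc", "research", "emiss", "risk")

# For each topic, list the keywords that are absent from the vocabulary
missing <- lapply(keywords, setdiff, y = vocab)
missing <- missing[lengths(missing) > 0]

print(missing)
```

In this toy vocabulary the stem is "emiss", which is what quanteda's English stemmer should produce for "emissions", so a keyword spelled "emis" finds no match.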
The output shows a base KeyATM model with the 3 keyword topics and 3 additional no-keyword topics. It also shows which documents are most strongly associated with each topic.
## Initializing the model...
## Warning in check_keywords(info$wd_names, keywords, options$prune): A keyword
## will be pruned because it does not appear in documents: emis
## Fitting the model. 1500 iterations...
## Creating an output object. It may take time...
## 1_climate 2_science 3_energy Other_1 Other_2 Other_3
## 1 rate insur pipelin [✓] share bank product
## 2 share compani pembina restaur share manufactur
## 3 chang [✓] life system tax committe system
## 4 risk financi oil [✓] agreement financi magna
## 5 busi product unit net audit vehicl [3]
## 6 seri manag oper exchang prefer automot
## 7 corpor invest partnership incom rate oper
## 8 financi market general oper power share
## 9 credit fund gas [✓] partnership seri profit
## 10 product busi fund includ servic includ
## 1_climate 2_science 3_energy Other_1 Other_2 Other_3
## 1 5 46 10 114 105 27
## 2 6 45 11 113 106 28
## 3 4 44 8 112 104 29
## 4 3 43 9 111 103 26
## 5 2 47 12 110 102 32
## 6 7 48 13 115 107 33
## 7 1 49 14 7 101 34
## 8 93 50 15 6 75 30
## 9 92 51 77 5 70 35
## 10 91 52 16 4 74 36
This plot visualizes the ranking of the dictionary words in the corpus.
## Warning in check_keywords(unique(unlisted), keywords, prune): A keyword will be
## pruned because it does not appear in documents: emis
## # A tibble: 21 x 5
## # Groups: Topic [3]
## Word WordCount `Proportion(%)` Ranking Topic
## <chr> <int> <dbl> <int> <fct>
## 1 chang 4821 0.229 1 1_climate
## 2 polici 4267 0.203 2 1_climate
## 3 natur 3031 0.144 3 1_climate
## 4 environ 943 0.045 4 1_climate
## 5 interior 325 0.015 5 1_climate
## 6 climat 199 0.009 6 1_climate
## 7 pari 59 0.003 7 1_climate
## 8 green 55 0.003 8 1_climate
## 9 ghg 37 0.002 9 1_climate
## 10 certain 5088 0.242 1 2_science
## # … with 11 more rows
Now, the same exercise but with bigrams (two-word phrases).
## Document-feature matrix of: 5 documents, 10 features (12.0% sparse) and 4 docvars.
## features
## docs inform_fiscal fiscal_year year_end
## 8001070_AIF_DollaramaInc_25042014.txt 1 53 42
## 8001071_AIF_DollaramaInc_24042015.txt 1 51 40
## 8001072_AIF_DollaramaInc_22042016.txt 1 53 35
## 8001073_AIF_DollaramaInc_17042017.txt 1 32 19
## 8001074_AIF_DollaramaInc_20042018.txt 1 20 7
## features
## docs end_content content_explanatori
## 8001070_AIF_DollaramaInc_25042014.txt 1 1
## 8001071_AIF_DollaramaInc_24042015.txt 1 1
## 8001072_AIF_DollaramaInc_22042016.txt 1 1
## 8001073_AIF_DollaramaInc_17042017.txt 1 1
## 8001074_AIF_DollaramaInc_20042018.txt 1 1
## features
## docs explanatori_note note_forward
## 8001070_AIF_DollaramaInc_25042014.txt 2 1
## 8001071_AIF_DollaramaInc_24042015.txt 2 1
## 8001072_AIF_DollaramaInc_22042016.txt 2 0
## 8001073_AIF_DollaramaInc_17042017.txt 2 0
## 8001074_AIF_DollaramaInc_20042018.txt 2 0
## features
## docs forward_look look_statement
## 8001070_AIF_DollaramaInc_25042014.txt 12 12
## 8001071_AIF_DollaramaInc_24042015.txt 13 13
## 8001072_AIF_DollaramaInc_22042016.txt 12 12
## 8001073_AIF_DollaramaInc_17042017.txt 12 11
## 8001074_AIF_DollaramaInc_20042018.txt 12 11
## features
## docs statement_gaap
## 8001070_AIF_DollaramaInc_25042014.txt 2
## 8001071_AIF_DollaramaInc_24042015.txt 2
## 8001072_AIF_DollaramaInc_22042016.txt 0
## 8001073_AIF_DollaramaInc_17042017.txt 0
## 8001074_AIF_DollaramaInc_20042018.txt 0
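Bigram features such as `fiscal_year` in the matrix above are adjacent token pairs joined by an underscore. A short sketch of how such features can be built with quanteda's `tokens_ngrams()`, on two toy token sequences chosen to echo the feature names above (not the actual AIF text):

```r
library(quanteda)

# Toy, already-stemmed token sequences (hypothetical)
toks <- tokens(c(d1 = "inform fiscal year end",
                 d2 = "forward look statement"))

# n = 2 forms every pair of adjacent tokens, joined by "_" by default
bigrams <- tokens_ngrams(toks, n = 2)
dfm_bi  <- dfm(bigrams)

print(featnames(dfm_bi))
```

Because bigrams are formed after stopword removal, two words that were separated by a stopword in the original text can end up as one bigram.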
Bigram Frequencies
## feature frequency rank docfreq group
## 1 general_partner 179 1 2 2001
## 2 life_insur 155 2 1 2001
## 3 manulif_financi 147 3 1 2001
## 4 crude_oil 142 4 3 2001
## 5 trust_unit 127 5 1 2001
## 6 pipelin_system 118 6 2 2001
## 7 unit_state 76 7 4 2001
## 8 insur_compani 71 8 2 2001
## 9 class_unit 67 9 1 2001
## 10 north_american 58 10 3 2001
## 11 vice_presid 58 10 4 2001
## 12 north_america 55 12 3 2001
## 13 year_end 54 13 4 2001
## 14 board_director 54 13 4 2001
## 15 partnership_agreement 51 15 1 2001
## 16 mutual_fund 50 16 2 2001
## 17 financi_statement 49 17 4 2001
## 18 tier_one 47 18 1 2001
## 19 insur_product 47 18 1 2001
## 20 hong_kong 46 20 1 2001
## 21 fiscal_year 44 21 3 2001
## 22 class_share 44 21 3 2001
## 23 distribut_cash 44 21 3 2001
## 24 third_parti 43 24 4 2001
## 25 design_engin 43 24 1 2001
## 26 manulif_centuri 42 26 1 2001
## 27 execut_offic 40 27 4 2001
## 28 class_subordin 40 27 1 2001
## 29 subordin_vote 40 27 1 2001
## 30 centuri_life 40 27 1 2001
## 31 manag_believ 39 31 3 2001
## 32 declar_trust 39 31 1 2001
## 33 segreg_fund 39 31 1 2001
## 34 manufactur_life 39 31 1 2001
## 35 british_columbia 38 35 3 2001
## 36 magna_steyr 38 35 1 2001
## 37 direct_indirect 37 37 4 2001
## 38 vote_share 37 37 3 2001
## 39 feeder_pipelin 37 37 2 2001
## 40 see_item 37 37 1 2001
## 41 pipelin_asset 36 41 2 2001
## 42 net_incom 35 42 4 2001
## 43 individu_life 35 42 1 2001
## 44 limit_partner 33 44 1 2001
## 45 common_share 32 45 3 2001
## 46 product_develop 32 45 2 2001
## 47 ppc_share 32 45 1 2001
## 48 automot_manufactur 32 45 1 2001
## 49 chief_execut 31 49 4 2001
## 50 general_fund 31 49 1 2001
## 51 real_estat 30 51 2 2001
## 52 busi_unit 30 51 2 2001
## 53 group_pension 30 51 1 2001
## 54 presid_chief 29 54 4 2001
## 55 joint_ventur 29 54 3 2001
## 56 one_two 29 54 1 2001
## 57 descript_busi 28 57 2 2001
## 58 fund_manag 28 57 2 2001
## 59 corpor_constitut 28 57 1 2001
## 60 stock_exchang 27 60 4 2001
## 61 new_york 27 60 2 2001
## 62 life_health 27 60 1 2001
## 63 execut_vice 26 63 2 2001
## 64 manag_agreement 26 63 1 2001
## 65 item_descript 26 63 1 2001
## 66 two_automot 26 63 1 2001
## 67 limit_partnership 25 67 3 2001
## 68 koch_pipelin 25 67 1 2001
## 69 health_insur 24 69 1 2001
## 70 set_forth 24 69 4 2001
## 71 peac_system 24 69 1 2001
## 72 averag_toll 24 69 1 2001
## 73 bow_river 24 69 1 2001
## 74 insur_subsidiari 24 69 1 2001
## 75 financi_inform 23 75 4 2001
## 76 long_term 23 75 4 2001
## 77 insur_polici 23 75 3 2001
## 78 manufactur_facil 23 75 1 2001
## 79 oem_custom 23 75 1 2001
## 80 daihyaku_mutual 23 75 1 2001
## 81 five_year 22 81 4 2001
## 82 one_copi 22 81 4 2001
## 83 pension_plan 22 81 3 2001
## 84 natur_gas 22 81 3 2001
## 85 variabl_annuiti 22 81 1 2001
## 86 time_time 21 86 4 2001
## 87 wholli_own 21 86 3 2001
## 88 wealth_manag 21 86 1 2001
## 89 capit_expenditur 21 86 3 2001
## 90 consolid_financi 21 86 3 2001
## 91 partner_affili 21 86 1 2001
## 92 holder_least 21 86 2 2001
## 93 insur_busi 21 86 1 2001
## 94 oper_structur 21 86 1 2001
## 95 seat_system 21 86 1 2001
## 96 pension_product 21 86 1 2001
## 97 u._divis 21 86 1 2001
## 98 general_develop 20 98 4 2001
## 99 product_includ 20 98 2 2001
## 100 product_offer 20 98 2 2001
## 101 invest_manag 20 98 3 2001
## 102 regulatori_author 20 98 2 2001
## 103 outstand_unit 20 98 2 2001
## 104 consolid_automot 20 98 1 2001
## 105 individu_insur 20 98 1 2001
## 106 law_regul 19 106 4 2001
## 107 per_share 19 106 2 2001
## 108 western_system 19 106 1 2001
## 109 capac_bbls 19 106 2 2001
## 110 profit_share 19 106 3 2001
## 111 engin_manufactur 19 106 1 2001
## 112 modul_system 19 106 1 2001
## 113 group_life 19 106 1 2001
## 114 canadian_divis 19 106 1 2001
## 115 univers_life 19 106 1 2001
## 116 product_servic 18 116 2 2001
## 117 incom_tax 18 116 3 2001
## 118 one_supplier 18 116 1 2001
## 119 secur_law 18 116 3 2001
## 120 fund_invest 18 116 2 2001
## 121 america_europ 18 116 2 2001
## 122 distribut_channel 18 116 1 2001
## 123 develop_busi 17 123 3 2001
## 124 public_trade 17 123 2 2001
## 125 insur_market 17 123 1 2001
## 126 holder_class 17 123 2 2001
## 127 short_prospectus 17 123 4 2001
## 128 tax_act 17 123 3 2001
## 129 cash_distribut 17 123 2 2001
## 130 mid_saskatchewan 17 123 1 2001
## 131 system_integr 17 123 2 2001
## 132 oper_group 17 123 2 2001
## 133 financi_reinsur 17 123 1 2001
## 134 properti_casualti 17 123 1 2001
## 135 manag_discuss 16 135 4 2001
## 136 discuss_analysi 16 135 4 2001
## 137 financi_servic 16 135 2 2001
## 138 new_product 16 135 3 2001
## 139 sharehold_equiti 16 135 2 2001
## 140 share_class 16 135 2 2001
## 141 vote_secur 16 135 2 2001
## 142 will_continu 16 135 4 2001
## 143 compani_oper 16 135 2 2001
## 144 oper_pipelin 16 135 2 2001
## 145 director_ppc 16 135 1 2001
## 146 prior_thereto 16 135 2 2001
## 147 saskatchewan_pipelin 16 135 1 2001
## 148 partner_partnership 16 135 1 2001
## 149 partner_will 16 135 1 2001
## 150 compon_assembl 16 135 1 2001
## 151 complet_seat 16 135 1 2001
## 152 non_automot 16 135 1 2001
## 153 broker_dealer 16 135 1 2001
## 154 premium_deposit 16 135 1 2001
## 155 state_insur 16 135 1 2001
## 156 u._dollar 15 156 2 2001
## 157 financi_condit 15 156 3 2001
## 158 manag_servic 15 156 3 2001
## 159 oper_incom 15 156 2 2001
## 160 market_share 15 156 2 2001
## 161 fund_fund 15 156 2 2001
## 162 nebc_system 15 156 1 2001
## 163 oil_condens 15 156 2 2001
## 164 meet_unithold 15 156 1 2001
## 165 system_mean 15 156 1 2001
## 166 invest_asset 15 156 1 2001
## 167 automot_industri 15 156 1 2001
## 168 prefer_share 14 168 3 2001
## 169 addit_inform 14 168 4 2001
## 170 materi_advers 14 168 3 2001
## 171 three_year 14 168 3 2001
## 172 toronto_stock 14 168 4 2001
## 173 own_subsidiari 14 168 3 2001
## 174 equiti_interest 14 168 2 2001
## 175 balanc_sheet 14 168 4 2001
## 176 insur_coverag 14 168 4 2001
## 177 princip_occup 14 168 4 2001
## 178 free_trade 14 168 1 2001
## 179 vote_meet 14 168 3 2001
## 180 pipelin_oper 14 168 2 2001
## 181 feder_system 14 168 1 2001
## 182 produc_shipper 14 168 2 2001
## 183 unit_issu 14 168 3 2001
## 184 person_proxi 14 168 2 2001
## 185 product_volum 14 168 2 2001
## 186 fund_asset 14 168 1 2001
## 187 financi_strength 14 168 1 2001
## 188 sharehold_dividend 14 168 1 2001
## 189 assembl_modul 14 168 1 2001
## 190 engin_assembl 14 168 1 2001
## 191 automot_oper 14 168 1 2001
## 192 magna_oper 14 168 1 2001
## 193 life_financi 14 168 1 2001
## 194 insur_annuiti 14 168 1 2001
## 195 annuiti_pension 14 168 1 2001
## 196 director_offic 13 196 4 2001
## 197 advers_effect 13 196 2 2001
## 198 item_general 13 196 1 2001
## 199 financi_year 13 196 4 2001
## 200 toronto_ontario 13 196 3 2001
## 201 price_trust 13 196 1 2001
## 202 sinc_prior 13 196 3 2001
## 203 carri_valu 13 196 1 2001
## 204 share_program 13 196 1 2001
## 205 pembina_system 13 196 1 2001
## 206 per_unit 13 196 2 2001
## 207 lake_pipelin 13 196 2 2001
## 208 system_throughput 13 196 2 2001
## 209 ppc_pipelin 13 196 1 2001
## 210 export_pipelin 13 196 2 2001
## 211 per_trust 13 196 1 2001
## 212 system_bbls 13 196 2 2001
## 213 river_pipelin 13 196 2 2001
## 214 non_resid 13 196 2 2001
## 215 mean_approxim 13 196 1 2001
## 216 approxim_pipelin 13 196 1 2001
## 217 partnership_will 13 196 1 2001
## 218 kilometr_mile 13 196 1 2001
## 219 least_outstand 13 196 1 2001
## 220 approxim_employe 13 196 2 2001
## 221 trade_agreement 13 196 1 2001
## 222 complet_vehicl 13 196 1 2001
## 223 american_european 13 196 1 2001
## 224 manufactur_oper 13 196 1 2001
## 225 non_tradit 13 196 1 2001
## 226 elliott_page 13 196 1 2001
## 227 reinsur_busi 13 196 1 2001
## 228 individu_annuiti 13 196 1 2001
## 229 commerci_mortgag 13 196 1 2001
## 230 bond_portfolio 13 196 1 2001
## 231 mortgag_portfolio 13 196 1 2001
## 232 insur_author 13 196 1 2001
## 233 among_thing 12 233 4 2001
## 234 incorpor_refer 12 233 4 2001
## 235 asset_manag 12 233 2 2001
## 236 weight_averag 12 233 2 2001
## 237 health_safeti 12 233 2 2001
## 238 facil_locat 12 233 2 2001
## 239 market_valu 12 233 3 2001
## 240 custom_servic 12 233 1 2001
## 241 provid_fund 12 233 2 2001
## 242 agreement_general 12 233 2 2001
## 243 approv_holder 12 233 2 2001
## 244 pembina_pipelin 12 233 1 2001
## 245 administr_agreement 12 233 1 2001
## 246 partnership_partnership 12 233 2 2001
## 247 approxim_bbls 12 233 2 2001
## 248 system_group 12 233 2 2001
## 249 revenu_pipelin 12 233 1 2001
## 250 pipelin_averag 12 233 2 2001
## 251 daili_throughput 12 233 2 2001
## 252 cold_lake 12 233 1 2001
## 253 transfer_class 12 233 1 2001
## 254 epl_pipelin 12 233 1 2001
## 255 own_direct 12 233 2 2001
## 256 signific_develop 12 233 1 2001
## 257 part_compon 12 233 1 2001
## 258 automot_sale 12 233 1 2001
## 259 exterior_system 12 233 1 2001
## 260 affin_market 12 233 1 2001
## 261 term_insur 12 233 1 2001
## 262 particip_policyhold 12 233 1 2001
## 263 insur_hold 12 233 1 2001
## 264 annuiti_product 12 233 1 2001
## 265 busi_oper 11 265 3 2001
## 266 canadian_dollar 11 265 4 2001
## 267 tier_capit 11 265 1 2001
## 268 capit_requir 11 265 3 2001
## 269 busi_corpor 11 265 2 2001
## 270 public_offer 11 265 3 2001
## 271 product_suppli 11 265 1 2001
## 272 oil_gas 11 265 3 2001
## 273 incom_fund 11 265 2 2001
## 274 averag_daili 11 265 2 2001
## 275 consent_holder 11 265 2 2001
## 276 manag_oper 11 265 3 2001
## 277 safeti_environment 11 265 2 2001
## 278 oper_system 11 265 2 2001
## 279 system_consist 11 265 2 2001
## 280 northern_system 11 265 1 2001
## 281 bbls_compris 11 265 1 2001
## 282 light_sweet 11 265 2 2001
## 283 abandon_cost 11 265 2 2001
## 284 trade_toronto 11 265 3 2001
## 285 automot_product 11 265 2 2001
## 286 system_deliv 11 265 2 2001
## 287 koch_valley 11 265 1 2001
## 288 valley_pipelin 11 265 1 2001
## 289 pipelin_limit 11 265 1 2001
## 290 heavi_blend 11 265 1 2001
## 291 asset_new 11 265 1 2001
## 292 throughput_capac 11 265 1 2001
## 293 asset_partnership 11 265 1 2001
## 294 meet_person 11 265 2 2001
## 295 written_consent 11 265 1 2001
## 296 program_manag 11 265 1 2001
## 297 product_line 11 265 2 2001
## 298 mirror_system 11 265 1 2001
## 299 sport_util 11 265 1 2001
## 300 parti_administr 11 265 1 2001
## 301 group_benefit 11 265 1 2001
## 302 casualti_reinsur 11 265 1 2001
## 303 polici_loan 11 265 1 2001
## 304 term_life 11 265 1 2001
## 305 consolid_general 11 265 1 2001
## 306 human_resourc 10 306 3 2001
## 307 canadian_u. 10 306 2 2001
## 308 audit_committe 10 306 4 2001
## 309 share_note 10 306 1 2001
## 310 recent_complet 10 306 3 2001
## 311 share_entitl 10 306 3 2001
## 312 princip_amount 10 306 2 2001
## 313 independ_director 10 306 2 2001
## 314 end_year 10 306 4 2001
## 315 oper_prior 10 306 2 2001
## 316 repres_approxim 10 306 1 2001
## 317 senior_vice 10 306 2 2001
## 318 regulatori_requir 10 306 4 2001
## 319 will_provid 10 306 4 2001
## 320 advers_affect 10 306 3 2001
## 321 rate_agenc 10 306 1 2001
## 322 system_oper 10 306 3 2001
## 323 busi_develop 10 306 3 2001
## 324 chairman_chief 10 306 4 2001
## 325 unit_held 10 306 2 2001
## 326 asset_liabil 10 306 2 2001
## 327 fund_busi 10 306 2 2001
## 328 relat_pipelin 10 306 2 2001
## 329 non_oper 10 306 2 2001
## 330 wabasca_system 10 306 1 2001
## 331 fort_saskatchewan 10 306 1 2001
## 332 tran_mountain 10 306 2 2001
## 333 gas_plant 10 306 2 2001
## 334 deliv_crude 10 306 2 2001
## 335 sweet_crude 10 306 2 2001
## 336 fund_will 10 306 2 2001
## 337 retir_incom 10 306 2 2001
## 338 addit_trust 10 306 1 2001
## 339 incom_loss 10 306 2 2001
## 340 name_municip 10 306 4 2001
## 341 municip_resid 10 306 4 2001
## 342 govern_agreement 10 306 1 2001
## 343 busi_partnership 10 306 1 2001
## 344 light_sour 10 306 1 2001
## 345 sour_crude 10 306 1 2001
## 346 inch_diamet 10 306 1 2001
## 347 storag_capac 10 306 2 2001
## 348 unit_annual 10 306 2 2001
## 349 new_asset 10 306 1 2001
## 350 unit_vote 10 306 1 2001
## 351 proxi_written 10 306 1 2001
## 352 insur_regul 10 306 1 2001
## 353 magna_intern 10 306 1 2001
## 354 wheel_drive 10 306 1 2001
## 355 develop_engin 10 306 1 2001
## 356 compani_consolid 10 306 2 2001
## 357 magna_entertain 10 306 1 2001
## 358 entertain_corp 10 306 1 2001
## 359 vehicl_platform 10 306 1 2001
## 360 research_develop 10 306 1 2001
## 361 chang_compani 10 306 1 2001
## 362 util_vehicl 10 306 1 2001
## 363 steyr_powertrain 10 306 1 2001
## 364 prefer_secur 10 306 2 2001
## 365 share_mec 10 306 1 2001
## 366 austria_magna 10 306 1 2001
## 367 gmbh_germani 10 306 1 2001
## 368 busi_line 10 306 1 2001
## 369 accid_health 10 306 1 2001
## 370 health_reinsur 10 306 1 2001
## 371 cash_valu 10 306 1 2001
## 372 manulif_wood 10 306 1 2001
## 373 wood_logan 10 306 1 2001
## 374 agenc_forc 10 306 1 2001
## 375 insur_regulatori 10 306 1 2001
## 376 control_level 10 306 1 2001
## 377 factor_includ 9 377 4 2001
## 378 financi_institut 9 377 2 2001
## 379 earn_per 9 377 2 2001
## 380 account_principl 9 377 3 2001
## 381 result_oper 9 377 4 2001
## 382 corpor_act 9 377 2 2001
## 383 aggreg_princip 9 377 2 2001
## 384 one_vote 9 377 3 2001
## 385 benefici_own 9 377 3 2001
## 386 agreement_provid 9 377 2 2001
## 387 cash_redempt 9 377 1 2001
## 388 period_end 9 377 2 2001
## 389 also_provid 9 377 3 2001
## 390 financi_oper 9 377 2 2001
## 391 applic_law 9 377 4 2001
## 392 follow_set 9 377 4 2001
## 393 payment_dividend 9 377 2 2001
## 394 return_capit 9 377 3 2001
## 395 full_servic 9 377 1 2001
## 396 compens_committe 9 377 4 2001
## 397 product_sale 9 377 2 2001
## 398 preliminari_short 9 377 4 2001
## 399 general_manag 9 377 3 2001
## 400 system_includ 9 377 3 2001
## 401 recent_year 9 377 3 2001
## 402 manufactur_process 9 377 1 2001
## 403 agreement_manag 9 377 1 2001
## 404 partnership_general 9 377 2 2001
## 405 bonni_glen 9 377 1 2001
## 406 glen_system 9 377 1 2001
## 407 ppc_manag 9 377 1 2001
## 408 manag_ppc 9 377 1 2001
## 409 administr_expens 9 377 2 2001
## 410 gather_system 9 377 2 2001
## 411 bbls_crude 9 377 2 2001
## 412 oil_system 9 377 2 2001
## 413 miscibl_flood 9 377 1 2001
## 414 throughput_volum 9 377 2 2001
## 415 ppc_will 9 377 1 2001
## 416 storag_tank 9 377 2 2001
## 417 distribut_distribut 9 377 2 2001
## 418 general_administr 9 377 2 2001
## 419 partner_general 9 377 1 2001
## 420 oil_light 9 377 1 2001
## 421 inch_inch 9 377 1 2001
## 422 boost_station 9 377 1 2001
## 423 cash_reserv 9 377 1 2001
## 424 oper_consist 9 377 1 2001
## 425 subject_regul 9 377 1 2001
## 426 unit_kingdom 9 377 2 2001
## 427 manag_limit 9 377 1 2001
## 428 letter_intent 9 377 1 2001
## 429 item_corpor 9 377 1 2001
## 430 vehicl_product 9 377 1 2001
## 431 magna_automot 9 377 1 2001
## 432 approxim_magna 9 377 1 2001
## 433 servic_supplier 9 377 1 2001
## 434 european_oem 9 377 1 2001
## 435 compani_board 9 377 2 2001
## 436 automot_system 9 377 1 2001
## 437 vehicl_system 9 377 1 2001
## 438 innov_product 9 377 2 2001
## 439 engin_centr 9 377 1 2001
## 440 interior_system 9 377 1 2001
## 441 tesma_class 9 377 1 2001
## 442 mec_class 9 377 1 2001
## 443 tax_profit 9 377 1 2001
## 444 america_delawar 9 377 2 2001
## 445 control_interest 9 377 1 2001
## 446 insur_group 9 377 1 2001
## 447 non_control 9 377 1 2001
## 448 insur_law 9 377 1 2001
## 449 invest_platform 9 377 1 2001
## 450 individu_wealth 9 377 1 2001
## 451 appoint_actuari 9 377 1 2001
## 452 annuiti_contract 9 377 1 2001
## 453 common_invest 9 377 1 2001
## 454 whole_life 9 377 1 2001
## 455 manulif_north 9 377 1 2001
## 456 manulif_intern 9 377 1 2001
## 457 regul_insur 9 377 1 2001
## 458 fund_compani 9 377 1 2001
## 459 compani_ordin 9 377 1 2001
## 460 copi_document 8 460 3 2001
## 461 oper_financi 8 460 2 2001
## 462 market_secur 8 460 4 2001
## 463 offic_director 8 460 3 2001
## 464 includ_limit 8 460 4 2001
## 465 share_capit 8 460 3 2001
## 466 senior_manag 8 460 4 2001
## 467 initi_public 8 460 3 2001
## 468 control_direct 8 460 2 2001
## 469 oper_offic 8 460 2 2001
## 470 term_condit 8 460 3 2001
## 471 privat_placement 8 460 2 2001
## 472 develop_chang 8 460 1 2001
## 473 own_oper 8 460 3 2001
## 474 director_elect 8 460 2 2001
## 475 product_design 8 460 2 2001
## 476 presid_financ 8 460 3 2001
## 477 fourth_quarter 8 460 2 2001
## 478 matter_relat 8 460 3 2001
## 479 complet_financi 8 460 3 2001
## 480 collect_bargain 8 460 1 2001
## 481 oper_cash 8 460 2 2001
## 482 new_busi 8 460 2 2001
## 483 interest_subsidiari 8 460 1 2001
## 484 insur_industri 8 460 1 2001
## 485 presid_general 8 460 2 2001
## 486 ontario_ontario 8 460 1 2001
## 487 manag_product 8 460 1 2001
## 488 redempt_right 8 460 1 2001
## 489 product_provid 8 460 2 2001
## 490 will_also 8 460 4 2001
## 491 descript_fund 8 460 1 2001
## 492 system_pembina 8 460 1 2001
## 493 fund_ppc 8 460 1 2001
## 494 system_western 8 460 1 2001
## 495 bbls_averag 8 460 1 2001
## 496 oil_pipelin 8 460 2 2001
## 497 capit_ppc 8 460 1 2001
## 498 unit_fund 8 460 1 2001
## 499 distribut_unithold 8 460 2 2001
## 500 system_gather 8 460 2 2001
## feature frequency rank docfreq group
## 92104 climat_chang 1 9075 1 2004
## 120560 climat_chang 3 3128 2 2005
## 164726 climat_chang 2 5466 2 2006
## 207194 climat_chang 2 5503 2 2007
## 256729 climat_chang 1 14348 1 2008
## 350221 climat_chang 4 3203 2 2010
## 406859 climat_chang 4 3089 2 2011
## 459630 climat_chang 5 1737 3 2012
## 504012 climat_chang 5 1709 3 2013
## 549172 climat_chang 6 1968 3 2014
## 608501 climat_chang 7 2862 3 2015
## 677036 climat_chang 8 1728 3 2016
## 741285 climat_chang 9 1326 4 2017
## 802962 climat_chang 18 544 6 2018
## 870873 climat_chang 28 263 6 2019
## 941053 climat_chang 43 139 6 2020
## 4 crude_oil 142 4 3 2001
## 24654 crude_oil 151 3 3 2002
## 55434 crude_oil 167 1 3 2003
## 82829 crude_oil 175 2 3 2004
## 117390 crude_oil 153 7 3 2005
## 159156 crude_oil 151 5 2 2006
## 201542 crude_oil 139 7 2 2007
## 242084 crude_oil 97 25 2 2008
## 292853 crude_oil 79 36 2 2009
## 346980 crude_oil 79 33 2 2010
## 403720 crude_oil 87 26 2 2011
## 457890 crude_oil 52 45 1 2012
## 502291 crude_oil 64 30 1 2013
## 547167 crude_oil 105 24 2 2014
## 605641 crude_oil 115 41 2 2015
## 675314 crude_oil 111 34 2 2016
## 739965 crude_oil 116 25 2 2017
## 802437 crude_oil 110 32 2 2018
## 870638 crude_oil 115 31 2 2019
## 940945 crude_oil 115 30 2 2020
## 360 research_develop 10 306 1 2001
## 25064 research_develop 10 346 1 2002
## 55741 research_develop 10 262 1 2003
## 83168 research_develop 12 300 1 2004
## 117776 research_develop 13 353 1 2005
## 159686 research_develop 11 485 1 2006
## 242768 research_develop 11 620 1 2008
## 293701 research_develop 11 768 1 2009
## 347683 research_develop 12 649 1 2010
## 404293 research_develop 13 537 1 2011
## 458551 research_develop 10 608 1 2012
## 503481 research_develop 7 1012 1 2013
## 547827 research_develop 13 602 1 2014
## 653585 research_develop 1 24060 1 2015
## 719876 research_develop 1 20389 1 2016
## 783542 research_develop 1 18977 1 2017
## 805568 research_develop 6 2592 1 2018
## 872353 research_develop 9 1535 1 2019
## 943006 research_develop 8 1817 1 2020
## 389930 exact_scienc 1 16015 1 2010
## 444684 exact_scienc 1 15465 1 2011
## 488482 exact_scienc 1 12091 1 2012
## 532671 exact_scienc 1 12092 1 2013
## 590612 exact_scienc 1 16953 1 2014
## 655317 exact_scienc 1 24060 1 2015
## 721498 exact_scienc 1 20389 1 2016
## 785453 exact_scienc 1 18977 1 2017
## 850936 exact_scienc 1 20852 1 2018
## 920197 exact_scienc 1 21455 1 2019
## 987402 exact_scienc 1 20911 1 2020
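The filtered table above can be reproduced by subsetting the grouped frequency table to the bigrams of interest. The following is a minimal base R sketch, not the code used for this report: the object names `freq_by_year`, `bigrams_of_interest`, and `freq_interest` are hypothetical, and the toy data frame holds only a few rows copied from the output above.

```r
# Toy stand-in for the grouped frequency table (hypothetical object name);
# the rows are copied from the output shown in this document.
freq_by_year <- data.frame(
  feature   = c("crude_oil", "research_develop", "market_share",
                "climat_chang", "crude_oil", "climat_chang"),
  frequency = c(142, 10, 15, 1, 175, 43),
  group     = c("2001", "2001", "2001", "2004", "2004", "2020"),
  stringsAsFactors = FALSE
)

# Stemmed bigrams of interest, in the order used by the filtered table.
bigrams_of_interest <- c("climat_chang", "crude_oil",
                         "research_develop", "exact_scienc")

# Keep only the rows whose feature is one of the bigrams of interest.
freq_interest <- freq_by_year[freq_by_year$feature %in% bigrams_of_interest, ]

# Sort by bigram (in list order) and then by year.
ord <- order(match(freq_interest$feature, bigrams_of_interest),
             freq_interest$group)
freq_interest <- freq_interest[ord, ]

print(freq_interest, row.names = FALSE)
```

A bigram that never occurs in a given year simply has no row for that year, which is why some years are absent from the table for `research_develop` and `climat_chang`.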
This plot shows the frequencies of the bigrams of interest by year.
Base keyATM for bigrams
As an example, I created a keyword dictionary for science skepticism.
# Keyword dictionary: each topic name maps to a vector of stemmed bigrams
Bigram_Dict_KeyATM <- list(
  climate = c("climat_chang", "global_climat", "life_health"),
  science = c("research_develop", "exact_scienc"),
  energy  = c("crude_oil", "pipelin_system", "feeder_pipelin", "natur_gas"))
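Before fitting, it can help to confirm that every keyword actually occurs in the corpus vocabulary, since a keyword that is absent cannot guide its topic. The following is a minimal base R sketch under stated assumptions: `vocab` is a hypothetical, hand-picked vector, whereas in practice it would be the feature names of the bigram document-feature matrix (for example via quanteda's `featnames()`). The names `vocab` and `missing_keywords` are illustrative and not part of the original analysis.

```r
# The dictionary defined above.
Bigram_Dict_KeyATM <- list(
  climate = c("climat_chang", "global_climat", "life_health"),
  science = c("research_develop", "exact_scienc"),
  energy  = c("crude_oil", "pipelin_system", "feeder_pipelin", "natur_gas")
)

# Hypothetical vocabulary: a handful of bigrams standing in for the
# feature names of the real bigram document-feature matrix.
vocab <- c("climat_chang", "crude_oil", "research_develop", "exact_scienc",
           "pipelin_system", "natur_gas", "market_share")

# For each topic, list the keywords that do not occur in the vocabulary.
missing_keywords <- lapply(Bigram_Dict_KeyATM,
                           function(kw) setdiff(kw, vocab))
missing_keywords
```

With this toy vocabulary the check flags `global_climat`, `life_health`, and `feeder_pipelin`; against the real corpus the result may of course differ.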
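The output below shows a base keyATM model with the three keyword topics plus three extra topics without keywords. It lists the top ten bigrams for each topic (a [✓] marks a bigram that is one of that topic's keywords) and, for each topic, the ten documents in which that topic is most prominent.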
## Initializing the model...
## Fitting the model. 1500 iterations...
## Creating an output object. It may take time...
## 1_climate 2_science 3_energy
## 1 prefer_share automobil_manufactur inter_pipelin
## 2 audit_committe advers_effect general_partner
## 3 execut_offic vote_share pipelin_system [✓]
## 4 common_share materi_advers crude_oil [✓]
## 5 first_prefer class_subordin natur_gas [✓]
## 6 financi_statement subordin_vote class_prefer
## 7 extern_auditor corpor_constitut trust_unit
## 8 share_seri effect_profit class_unit
## 9 nation_bank north_america audit_committe
## 10 board_director advers_affect oil_sand
## Other_1 Other_2 Other_3
## 1 common_share prefer_share class_share
## 2 prefer_share first_prefer life_insur
## 3 manulif_financi share_seri share_class
## 4 insur_compani per_cent unit_state
## 5 burger_king audit_committe long_term
## 6 financi_statement power_financi share_seri
## 7 exchang_unit vice_presid independ_auditor
## 8 mutual_fund extern_auditor seri_class
## 9 partnership_exchang unit_state hong_kong
## 10 interest_rate per_share board_director
## 1_climate 2_science 3_energy Other_1 Other_2 Other_3
## 1 127 35 80 113 105 61
## 2 130 36 10 114 103 62
## 3 128 32 11 112 104 60
## 4 126 37 79 111 106 58
## 5 125 33 8 115 101 59
## 6 129 34 77 110 107 57
## 7 124 27 81 46 102 56
## 8 123 28 85 45 100 55
## 9 131 38 83 44 99 54
## 10 122 29 82 43 108 53
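The last table appears to rank documents within each topic, which is not quite the same as asking which topic dominates each document. Both views can be derived from the document-topic proportion matrix (stored as `theta` in a keyATM output object). The following is a minimal base R sketch with a made-up four-document `theta`: the numbers are hypothetical and are not taken from the fitted model.

```r
# Hypothetical document-topic proportions: 4 documents (rows) by 6 topics
# (columns). Each row sums to 1. These values are illustrative only.
topics <- c("1_climate", "2_science", "3_energy",
            "Other_1", "Other_2", "Other_3")
theta <- matrix(
  c(0.50, 0.10, 0.10, 0.10, 0.10, 0.10,
    0.05, 0.05, 0.60, 0.10, 0.10, 0.10,
    0.10, 0.10, 0.05, 0.45, 0.20, 0.10,
    0.30, 0.05, 0.45, 0.05, 0.10, 0.05),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, topics)
)

# View 1: the most prominent topic in each document (row-wise maximum).
dominant_topic <- topics[apply(theta, 1, which.max)]

# View 2: documents ranked by their share of one topic, highest first.
top_docs_energy <- order(theta[, "3_energy"], decreasing = TRUE)

print(dominant_topic)
print(top_docs_energy)
```

In this toy example document 4 ranks second for `3_energy` and that is also its dominant topic, but a document can rank highly for a topic without that topic being its largest one, so the two views should be read separately.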