WEF Risk Register Text Analysis

Introduction

This report presents a text analysis of the World Economic Forum (WEF) risk register. The charts and graphs highlight key themes and interactions that may not otherwise be apparent in a voluminous register containing hundreds or even thousands of risks.

Word Cloud

The following image presents a word cloud of the most frequently occurring words in the risk-related text across the entire register. Stop words and certain frequently occurring words (provided by the user) are excluded.
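The counting step behind a word cloud can be sketched as follows. This is a minimal illustration, not the report's actual pipeline: the stop-word list, the `word_frequencies` helper, and the sample risk texts are all assumptions made up for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real analysis would use a fuller one.
STOP_WORDS = {"the", "of", "and", "to", "in", "a", "is", "for"}

def word_frequencies(texts, extra_exclusions=()):
    """Count words across all texts, skipping stop words and user-supplied exclusions."""
    excluded = STOP_WORDS | {word.lower() for word in extra_exclusions}
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep alphabetic tokens only.
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in excluded:
                counts[token] += 1
    return counts

# Hypothetical risk descriptions, invented for illustration.
sample_risks = [
    "Failure of climate change adaptation",
    "Risk of water crises and food crises",
    "Risk of failure in critical infrastructure",
]

# "risk" plays the role of a user-provided frequently occurring word.
frequencies = word_frequencies(sample_risks, extra_exclusions=["risk"])
print(frequencies.most_common(3))
```

A word cloud then simply scales each word's font size by its count.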

Sentiment Analysis

The following chart shows the sentiment score per risk for each Theme. Since the number of risks can vary significantly from Theme to Theme, the average score per risk is shown rather than the total. The AFINN lexicon is used to assign sentiment scores.

A Theme with a large negative sentiment score may indicate a high intensity of concern in that Theme, but not necessarily a higher quantum of total concern.
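The difference between average and total sentiment can be illustrated with a small sketch. The lexicon below is a toy with made-up scores on an AFINN-like integer scale (it does not reproduce actual AFINN values), and the themes, risk texts, and function names are hypothetical.

```python
import re
from collections import defaultdict

# Toy lexicon with made-up scores (assumption); NOT the actual AFINN values.
TOY_LEXICON = {"failure": -2, "crisis": -3, "loss": -3, "growth": 2}

def risk_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of the words in one risk description."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)

def scores_by_theme(risks):
    """Group per-risk sentiment scores by theme."""
    grouped = defaultdict(list)
    for theme, text in risks:
        grouped[theme].append(risk_score(text))
    return grouped

# Hypothetical (theme, risk text) pairs, invented for illustration.
sample = [
    ("Economic", "Fiscal crisis and loss of growth"),
    ("Economic", "Failure of a major financial institution"),
    ("Economic", "Asset price failure"),
    ("Societal", "Water crisis and biodiversity loss"),
]

grouped = scores_by_theme(sample)
averages = {theme: sum(s) / len(s) for theme, s in grouped.items()}
totals = {theme: sum(s) for theme, s in grouped.items()}
print(averages)
print(totals)
```

In this toy data the Societal theme has the more negative average per risk, while the Economic theme, with three risks, has the more negative total.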

Key Words

Whereas the word cloud highlights the top words based on simple frequency, the following chart presents the top 20 words based on TF-IDF (Term Frequency - Inverse Document Frequency) scores. For the purpose of this analysis a is considered a document.

TF-IDF = TF * IDF

TF = Number of occurrences of a word in a document / Total number of words in that document

IDF = ln (Total number of documents / Number of documents containing the word)

A word's TF-IDF score (importance) is high if it appears in few documents and yet has a high frequency within the documents in which it does occur. Words that appear in most or all of the documents, being common, have lower TF-IDF scores. Similarly, words that appear in just one or two documents, and occur just once or a few times, have lower TF-IDF scores.
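The formulas above can be worked through in a few lines. This is a minimal sketch using whitespace tokenization; the `tf_idf` function and the three sample documents are hypothetical and only illustrate the arithmetic.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return {doc_name: {word: score}} with score = TF * ln(N / document frequency)."""
    tokenized = {name: text.lower().split() for name, text in documents.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores

# Hypothetical documents, invented for illustration.
docs = {
    "doc_a": "water crisis water shortage risk",
    "doc_b": "cyber attack risk",
    "doc_c": "fiscal crisis risk",
}
scores = tf_idf(docs)
print(scores["doc_a"])
```

Here "risk" appears in every document, so its IDF is ln(3/3) = 0 and its score is zero. "water" appears twice in a five-word document and in no other document, giving (2/5) * ln(3), the highest score in doc_a.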

Bigrams Analysis

This chart highlights relationships between key words. The key words are those with the highest importance based on TF-IDF scores. Stop words and frequently occurring words, however, have been retained so as not to lose context.
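One way to derive the edges of such a chart is to count adjacent word pairs that involve a key word, keeping stop words in place. The sketch below assumes that a bigram qualifies when at least one of its two words is a key word; that rule, the `bigram_edges` helper, and the sample data are illustrative assumptions rather than the report's confirmed method.

```python
from collections import Counter

def bigram_edges(texts, keywords):
    """Count adjacent word pairs that contain at least one keyword (stop words retained)."""
    edges = Counter()
    for text in texts:
        tokens = text.lower().split()
        # Pair each token with its successor to form bigrams.
        for first, second in zip(tokens, tokens[1:]):
            if first in keywords or second in keywords:
                edges[(first, second)] += 1
    return edges

# Hypothetical risk texts and key words, invented for illustration.
texts = [
    "failure of climate adaptation",
    "failure of critical infrastructure",
]
keywords = {"failure", "climate"}

edges = bigram_edges(texts, keywords)
for (first, second), weight in edges.items():
    print(f"{first} -> {second} (weight {weight})")
```

Each pair becomes a directed edge in the network, with the count as the edge weight. Because "of" is retained, the link from "failure" to "climate" stays readable as a phrase.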

Trigrams Analysis

A trigram network chart is presented because it may highlight stories not captured by the bigram chart. Stop words have not been removed for this analysis. The top trigrams have been selected based on TF-IDF values.

Topic Discovery

A Latent Dirichlet Allocation (LDA) model is used to divide the risk text into 9 topics. The following chart presents the top words in each topic based on beta values (per-topic-per-word probabilities).

Risk Interdependencies

The following chart shows a risk network based on text content. Each risk is treated as a separate document, which allows Risk-Risk linkages to be displayed. Risks are prefixed with their ID numbers, and risk titles have been truncated to avoid clutter. The chart has been generated after stemming words and removing stop words and numbers.

See the Textnets package on GitHub for more information.
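The idea of linking risks through shared vocabulary can be sketched as below. This is a simplified stand-in and not the Textnets algorithm: it links two risks whenever they share at least one term, omits stemming, and uses a hypothetical stop-word list, helper names, and sample risks.

```python
import re
from itertools import combinations

# Tiny illustrative stop-word list (assumption).
STOP_WORDS = {"the", "of", "and", "to", "in", "a", "on"}

def risk_terms(text):
    """Lowercased alphabetic tokens with stop words removed; numbers are dropped."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {token for token in tokens if token not in STOP_WORDS}

def risk_links(risks):
    """Return {(id_a, id_b): shared_terms} for every pair of risks sharing a term."""
    terms = {risk_id: risk_terms(text) for risk_id, text in risks.items()}
    links = {}
    for id_a, id_b in combinations(sorted(terms), 2):
        shared = terms[id_a] & terms[id_b]
        if shared:
            links[(id_a, id_b)] = shared
    return links

# Hypothetical risks keyed by ID, invented for illustration.
risks = {
    "R1": "Failure of water supply in 2030",
    "R2": "Water crisis and food shortage",
    "R3": "Cyber attack on critical infrastructure",
}

links = risk_links(risks)
print(links)
```

In this toy data only R1 and R2 are linked, through the shared term "water". R3 shares no terms with the others and would appear as an isolated node.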

Interactive Linkages Chart

An interactive chart displaying linkages between risks and actions is available in HTML format.