Overview Web Appendix

This Web Appendix is organized as follows:

  1. Sections 1.1 to 1.4 describe the data cleaning process. Several cleaning approaches were used because the feature sets have different requirements. For the feature sets that rely on the Stanford Grammatical Dependency Parser (F2, F3, F4), only the minimal cleaning steps needed for the parser to function were applied. In addition, punctuation was reinserted for these feature sets.

  2. Sections 2.1 to 2.5 describe the feature extraction process. For the more complicated feature sets (F3, F4), an overview sheet is provided (2.3.0 for F3 and 2.4.0 for F4).

  3. Sections 3.1 to 3.15 describe the feature experimentation process, covering different types of feature sets. Sections 3.1 to 3.5 consider Term Frequency with cut-offs at the 10th, 30th, 50th, 70th and 90th percentiles. This process was repeated for TFIDF and Term Presence. The classifier used was Naive Bayes.

  4. Sections 4.1 to 7.2 describe the experiments with feature sets F2 to F5. In detail, the following feature sets are considered:

    • 4.1 Valence Feature Set (F2A)
    • 4.2 Valence Feature Set combined with F1 (F2B)
    • 5.1 Negation Feature Set (F3A)
    • 5.2 Negation Feature Set combined with F1 (F3B)
    • 6.1 Activation Feature Set (F4A)
    • 6.2 Activation Feature Set combined with F1 (F4B)
    • 7.1 Directives Feature Set (F5A)
    • 7.2 Directives Feature Set combined with F1 (F5B)

The Naive Bayes classifier was used for all of these feature sets.

  5. Sections 8.1 to 12.2 repeat steps 1-4, but with the SVM classifier instead of Naive Bayes.
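
The three term weightings and the percentile cut-offs named in item 3 can be illustrated with a minimal sketch. It is not the code used in the appendix. All function names are hypothetical, tokenization is reduced to lowercased whitespace splitting, the cut-off is assumed to keep terms whose corpus-wide count is at or above the given percentile of all term counts (nearest-rank method), and TFIDF is assumed to use a natural-log inverse document frequency. The actual sections may define any of these differently.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercased whitespace splitting stands in for the real cleaning steps.
    return text.lower().split()


def percentile(values, p):
    # Nearest-rank percentile of a non-empty list, with p between 0 and 100.
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def select_vocabulary(docs, p):
    # Assumed cut-off rule: keep terms whose corpus-wide count is at or above
    # the p-th percentile of all term counts.
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    threshold = percentile(list(counts.values()), p)
    return sorted(term for term, count in counts.items() if count >= threshold)


def inverse_document_frequency(docs, vocab):
    # Assumed IDF variant: log(number of documents / document frequency).
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log(len(docs) / doc_freq[term]) for term in vocab}


def vectorize(doc, vocab, scheme, idf=None):
    # Turn one document into a feature vector over the selected vocabulary.
    counts = Counter(tokenize(doc))
    if scheme == "tf":          # Term Frequency: raw counts
        return [counts[t] for t in vocab]
    if scheme == "presence":    # Term Presence: 1 if the term occurs, else 0
        return [1 if counts[t] > 0 else 0 for t in vocab]
    if scheme == "tfidf":       # TFIDF: count weighted by inverse document frequency
        if idf is None:
            raise ValueError("the tfidf scheme needs an idf mapping")
        return [counts[t] * idf[t] for t in vocab]
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    docs = ["good good service", "bad service", "good food bad service"]
    for p in (10, 30, 50, 70, 90):
        vocab = select_vocabulary(docs, p)
        print(p, vocab, vectorize(docs[0], vocab, "tf"))
```

Under these assumptions a higher percentile yields a smaller vocabulary, so each cut-off produces a different feature matrix. Each matrix could then be passed to a Naive Bayes or SVM classifier.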