Final text summary scaling and validation

This file contains an overview of the scaling completed on 31 May 2021, and the validation results produced by EA and AK in July 2021.

1. Distribution of probabilities

The histograms below show the distribution of LR probabilities for the final text summaries and for the proposal summaries.

Note: it is clear that the scaling of the final acts is more heavily skewed towards 1. Both EA and AK agree that there are instances of overestimation of authority expansion in the scaling of the Final act summaries. Possible sources of this ‘positive’ bias are discussed in the sections below.
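For reference, a minimal sketch of how such a comparison could be drawn in R with ggplot2, assuming a hypothetical data frame `scaling` with an `lr_prob` column and a `doc_type` column distinguishing final text from proposal summaries (these names are placeholders, not the actual analysis code):

```r
library(ggplot2)

# Hypothetical layout: one row per summary, with the LR probability
# (lr_prob) and the summary type (doc_type: proposal vs. final act).
ggplot(scaling, aes(x = lr_prob, fill = doc_type)) +
  geom_histogram(binwidth = 0.05, position = "identity", alpha = 0.5) +
  labs(x = "LR probability", y = "Count")
```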

2. Validation

1. EA and AK have carried out validation of the scaling for the final act summaries. In total, 125 summaries were reviewed, which constitutes approximately 14.45% of the sample.

2. The validation set was drawn from the full sample using the following rules (a sketch of the selection logic follows this list):

  • The subset contains summaries with scaled probabilities <= 0.05, in [0.48, 0.52], or >= 0.95;

  • The subset also includes procedures for which the scaling of the proposal summaries changed by more than 0.2 points in either direction (comparing the scaling completed on 12 May 2021 with that completed on 31 May 2021).

3. In the next step, we unpack how many disagreements there are between the coders and whether these disagreements concern the ‘borderline’ cases (a cross-tabulation sketch also follows the list).

4. The results below show that, upon completion of the manual validation of the final text summaries, the disagreements between the coders concerned both ‘borderline’ cases and cases closer to the positive extreme of the distribution. In total, EA and AK disagreed on 36 procedures. Figure 2 below summarizes the results.
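A minimal sketch of the selection logic described in point 2, assuming placeholder columns `prob_final` (the 31 May scaling of the final text summary) and `prob_prop_may12`/`prob_prop_may31` (the two rounds of proposal-summary scaling); this is illustrative, not the actual selection code:

```r
library(dplyr)

# All column names are hypothetical placeholders.
validation_set <- data %>%
  filter(
    # Extreme or borderline LR probabilities for the final text summary
    prob_final <= 0.05 |
      (prob_final >= 0.48 & prob_final <= 0.52) |
      prob_final >= 0.95 |
      # Proposal scaling shifted by more than 0.2 between the two runs
      abs(prob_prop_may31 - prob_prop_may12) > 0.2
  )
```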
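And a sketch of the cross-tabulation behind points 3 and 4, with `EA_label` and `AK_label` as the two coders' binary classifications (again, hypothetical names):

```r
library(dplyr)

# Count disagreements by probability band to see whether they
# cluster in the borderline region (placeholder column names).
data %>%
  filter(!is.na(EA_label), !is.na(AK_label)) %>%
  mutate(
    disagree = EA_label != AK_label,
    band = case_when(
      prob_final >= 0.48 & prob_final <= 0.52 ~ "borderline",
      prob_final >= 0.95 ~ "high (pro-expansion)",
      prob_final <= 0.05 ~ "low (SQ)",
      TRUE ~ "other"
    )
  ) %>%
  count(band, disagree)
```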

Overall, a first look at the pattern of disagreements suggests that, even for human annotators, it is much more difficult to agree on the classification of the final texts than of the proposals.

Possible explanations:

1. The summaries of the final texts are much less explicit about their key purposes and goals. They are less likely to state explicitly that a novel policy measure/mechanism is created within the legislation. Similarly, the final text summaries tend to avoid strong language stressing the establishment of common standards or common enforcement mechanisms.

2. The ‘CONTENT’ section of the final text summary may contain background information about the policy at hand. As a result, it is more challenging to disentangle whether the proposed measure is an extension/update of an existing one or a new policy instrument.

2.1. Cases when both coders disagree with the predictions
  • There are some cases where the coders agree with each other, but not with the predictions of the classifier.
  • The encouraging first result here is that such incongruences are mostly limited to borderline cases.

3. Coder reconciliation and final results

As the next step, EA and AK identified the procedures on which they had disagreed. After a thorough review of the coding guidelines in Wratil (2019) and of the codebook compiled for coding the proposal summaries, they reconciled 28 of the 36 cases.

Key points of disagreement

  1. Introduction and enforcement of common minimum standards across the Member States (following Wratil and the previous codebook, this indicates expansion of EU authority);

    1.a. Teasing this out from the context of the text is sometimes not straightforward due to the vagueness of the summary’s prose.

  2. Transfer of a program previously undertaken by the Member States to an EU-level instrument signifies expansion;

    2.a. The challenge is not to confuse this with cases where the EU follows an international agreement or joins an existing program, which would indicate no expansion.

  3. Disentangling technical renewals from substantive expansion of the mandate.

##  data$Coder_reconciliation   n     percent valid_percent
##                          0   8 0.009248555     0.2222222
##                          1  28 0.032369942     0.7777778
##                         NA 829 0.958381503            NA
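The frequency table above has the shape produced by janitor::tabyl(); a one-line sketch that would reproduce it, assuming Coder_reconciliation is coded 1 = reconciled, 0 = still contentious, and NA = not among the 36 disagreements:

```r
library(janitor)

# Frequency table of the reconciliation outcome (assumed coding:
# 1 = reconciled, 0 = still contentious, NA = not a disagreement).
tabyl(data$Coder_reconciliation)
```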

As Figure 4 shows, after re-assessing the contentious cases, the coders managed to reconcile their evaluations of the Final summaries. The remaining 8 procedures stay contentious because the information in these summaries can be interpreted both as expanding EU authority and as maintaining the SQ.

  • 3 out of the 8 remaining contentious cases can be considered ‘borderline’ procedures, as shown by the LR probabilities.

  • The 5 remaining procedures were scaled as leaning towards authority expansion. However, manual validation suggests that these procedures cannot be clearly classified.

3.a. How does the scaling of the summaries hold up vis-à-vis the ‘reconciled’ manual validation results?

##  data$model_coders_agreement   n     percent valid_percent
##           Correct Prediction  96 0.110982659         0.768
##        Incorrect Predictions  21 0.024277457         0.168
##                 No_agreement   8 0.009248555         0.064
##                         <NA> 740 0.855491329            NA

The table above compares the reconciled manual classification of the summaries with the scaling predictions for the Final text summaries. The category titled ‘No_agreement’ captures cases where AK and EA could not definitively determine whether the proposal expands EU authority or maintains the SQ. As manual classification is not possible for these cases, it cannot be compared to the model predictions.
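Among the 117 cases where a comparison is possible, the model matches the reconciled label in 96, i.e. roughly 82% agreement. A minimal sketch of how the comparison variable could be constructed (reconciled_label, model_label and in_validation_set are placeholder names):

```r
library(dplyr)

data <- data %>%
  mutate(model_coders_agreement = case_when(
    # Coders could not settle on a label: no comparison possible
    in_validation_set & is.na(reconciled_label) ~ "No_agreement",
    reconciled_label == model_label ~ "Correct Prediction",
    reconciled_label != model_label ~ "Incorrect Predictions",
    TRUE ~ NA_character_  # procedure not in the validation set
  ))
```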

Overall, the results of the scaling appear quite encouraging. However, we need to keep in mind that any comparison of the manual annotation and the scaling is essentially a comparison of binary classifications and does not necessarily permit us to identify whether the scaling consistently over- or under-estimates the ambitions of the final acts.

Yet, after having read the proposals, both AK and EA agree that the probabilities for the Final acts’ summaries do occasionally overestimate the degree of ambition within the text.
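One way to probe the direction of the error, rather than binary accuracy alone, would be to compare the mean predicted probability against the reconciled manual labels; a sketch with the same placeholder columns as above:

```r
library(dplyr)

# A high mean predicted probability among cases manually labelled
# as SQ (reconciled_label == 0) would indicate that the scaling
# systematically overestimates ambition in the final act summaries.
data %>%
  filter(!is.na(reconciled_label)) %>%
  group_by(reconciled_label) %>%
  summarise(mean_prob = mean(prob_final), n = n())
```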

Possible reasons for overestimation

  1. Background information is embedded in the “CONTENT” section of the Final Text summary;

  2. The language and style in which information is presented in the Final summaries differ from those of the Proposal summaries: the former tend to be less precise and less explicit.

  3. The training set consists only of proposal summaries. Coupled with the first two points, this can have a systematic effect on the scaling of the Final text summaries.

  4. Package deals: some legislative procedures are parts of package deals, so the content of the summary may reflect not only the content and ambition of the specific proposal at hand, but also the overall goals and ambitions of the entire package. Such examples were present in the validation set (e.g. 2013/0014(COD) and 2013/0015(COD) are parts of the ‘Railway package’; 2011/0298(COD) is part of the MiFID/MiFIR package deal).