Data summary

To extract data from GtR, I used 1656 unique GtR grant IDs. These grants corresponded to 1502 unique REF case studies, giving 2194 unique grant/case study pairs. Only 188 unique grants contained potential impact statements, which were extracted from GtR. These 188 grants corresponded to 204 REF case studies, so we have 219 grant/case study pairs with both GtR and REF impact texts. Apart from the potential impact texts, I extracted some additional information from GtR, such as start and end dates (from which I calculated project duration) and funding value. I also added some metadata from REF, such as panel, unit of assessment, impact type and funders. REF contains further potentially relevant fields, so it may be useful to look at other variables as well. The graphs below summarise some of the parameters mentioned above.
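The distinction between unique grants, unique case studies and unique grant/case study pairs is easy to blur, so here is a minimal Python sketch of the counting and of a duration calculation. It is not the code used for this analysis: the field names (`grant_id`, `case_study_id`, `start`, `end`) and the toy rows are hypothetical, and measuring duration in calendar months is an assumption.

```python
from datetime import date

# Hypothetical toy rows: one row per grant/case study link (values are made up).
links = [
    {"grant_id": "G1", "case_study_id": 101, "start": date(2010, 1, 1), "end": date(2012, 12, 31)},
    {"grant_id": "G1", "case_study_id": 102, "start": date(2010, 1, 1), "end": date(2012, 12, 31)},
    {"grant_id": "G2", "case_study_id": 101, "start": date(2014, 4, 1), "end": date(2015, 3, 31)},
]

def summarise_links(rows):
    """Count unique grants, unique case studies and unique grant/case study pairs."""
    grants = {r["grant_id"] for r in rows}
    cases = {r["case_study_id"] for r in rows}
    pairs = {(r["grant_id"], r["case_study_id"]) for r in rows}
    return len(grants), len(cases), len(pairs)

def duration_months(start, end):
    """Duration in calendar months, counting both the start and the end month."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1

print(summarise_links(links))                                # (2, 2, 3)
print(duration_months(date(2010, 1, 1), date(2012, 12, 31)))  # 36
```

Because one grant can support several case studies (and vice versa), the number of pairs can exceed both the number of grants and the number of case studies, as it does in the real data.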

GtR project duration

The graph compares project duration in our initial GtR data set with duration for the GtR grants where impact statements were available. It counts unique grants, not grant/case study pairs. It shows that projects with longer durations are missing from our sample with GtR impact statements. This is unfortunate, because the disparity between GtR and REF impacts could be more interesting for longer projects. On the other hand, there were not many long projects in the entire data set to begin with.
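A minimal Python sketch of the comparison behind this graph, assuming one record per grant/case study link with a precomputed duration in months and a hypothetical `has_impact` flag. The numbers are illustrative only. Grants are deduplicated first so that a grant linked to several case studies counts once, as in the graph.

```python
from statistics import median

# Hypothetical records (values are made up). G2 appears twice because it is
# linked to two case studies.
grants = [
    {"grant_id": "G1", "months": 36, "has_impact": False},
    {"grant_id": "G2", "months": 12, "has_impact": True},
    {"grant_id": "G3", "months": 60, "has_impact": False},
    {"grant_id": "G4", "months": 24, "has_impact": True},
    {"grant_id": "G2", "months": 12, "has_impact": True},
]

def dedupe_by_grant(rows):
    """Keep the first record for each grant_id."""
    seen = {}
    for r in rows:
        seen.setdefault(r["grant_id"], r)
    return list(seen.values())

def compare_durations(rows):
    """(median, max) duration for all unique grants vs grants with impact statements."""
    unique = dedupe_by_grant(rows)
    all_months = [r["months"] for r in unique]
    sub_months = [r["months"] for r in unique if r["has_impact"]]
    return {
        "all": (median(all_months), max(all_months)),
        "with_impact": (median(sub_months), max(sub_months)),
    }

print(compare_durations(grants))  # {'all': (30.0, 60), 'with_impact': (18.0, 24)}
```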

GtR start and end dates

Importantly, both start and end dates for GtR projects with potential impact statements tend to be more recent than those of most GtR projects in the data set. Most likely, older projects were not required to provide potential impact statements (or there is some similar reason). This is an important observation even at this basic descriptive level, because it shows that our sample is biased towards more recent projects. As for funding value, the distribution seems relatively even across all funding amounts, so there is nothing notable there (that graph is not included here).

REF panels

As for REF panels, as I mentioned previously, panel B (physics and engineering) seems to be better represented than the others in our sample with GtR impact statements. More generally, panels are not evenly represented even in the initial GtR data set, whereas in the original REF data the number of case studies per panel is roughly equal.

Below is how case studies are distributed in the original REF data.

REF impact type

REF impact types may also be interesting to look at. I’m not sure who populated this field, but it might be useful. Below is a comparison of impact types for case studies in the entire GtR set and in the sample with impact statements.

Linking GtR councils to REF panels

We can link GtR funders with REF panels to see whether grant funding in one broad research field can be associated with impacts in a significantly different field. This can be visualised as a clustered dot plot or an alluvial plot (see both below). The plots do show some cross-field relationships between funders and panels in terms of research orientation: funders do not map neatly onto the panels closest to their own research area. In fact, we obtained similar results when we compared Researchfish funders with REF panels last summer.
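Both plots are built from the same funder × panel count table. A minimal Python sketch of that table follows; the placeholder funder names, the toy pairs and the `home_panel` mapping (which panel each funder would "naturally" feed into) are all hypothetical. The second function turns the idea of cross-field links into a number: the share of each funder's pairs that land outside its assumed home panel.

```python
from collections import Counter

# Hypothetical (grant_id, case_study_id, funder, REF main panel) pairs; values are made up.
pairs = [
    ("G1", 101, "Council X", "B"),
    ("G2", 102, "Council X", "A"),
    ("G3", 103, "Council Y", "D"),
    ("G4", 104, "Council Y", "C"),
    ("G5", 105, "Council Y", "D"),
]

def funder_panel_table(rows):
    """Count grant/case study pairs for each (funder, panel) combination.
    These counts would set dot size in a dot plot or flow width in an alluvial plot."""
    return Counter((funder, panel) for _grant, _case, funder, panel in rows)

def off_diagonal_share(table, home_panel):
    """Share of each funder's pairs that fall outside its assumed home panel."""
    totals, outside = Counter(), Counter()
    for (funder, panel), n in table.items():
        totals[funder] += n
        if panel != home_panel.get(funder):
            outside[funder] += n
    return {f: outside[f] / totals[f] for f in totals}

table = funder_panel_table(pairs)
print(off_diagonal_share(table, {"Council X": "B", "Council Y": "D"}))
```

The same two functions apply unchanged if the panel column is swapped for REF impact type.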

Linking GtR councils to REF impact type

We don’t know at this point how REF Impact Type is populated, but it is an interesting variable that could help us understand which GtR funders are more or less associated with which types of impact. Below are two similar visualisations linking these two variables.

GtR Research Topics

For most GtR grants we can obtain one or more research topics. One use for these is to search REF impact descriptions to see whether any of the topics also appear there. This approach is not without issues, however. Below I describe the results and limitations of this analysis.

First, not all GtR grants have research topics available. The plot below shows that 1234 grants have topics, while 630 don’t. For a further 330 grants the topic is Unclassified, which is not very informative, but we will keep these grants for now and only remove those where no topic is available at all.
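A minimal Python sketch of this filtering step, assuming topics arrive as a per-grant list (or `None` when GtR returns nothing). The grant IDs are hypothetical. Grants without topics are dropped, "Unclassified" is kept, and the result is reshaped to one row per topic, which is the long format of the tibbles shown in this section.

```python
# Hypothetical per-grant topic lists; None or [] means no topics were returned.
grant_topics = {
    "G1": ["Archaeological Theory", "Science-Based Archaeology"],
    "G2": ["Unclassified"],
    "G3": None,
    "G4": [],
}

def topics_long(topics_by_grant):
    """Drop grants with no topics and return one (grant_id, topic) row per topic.
    'Unclassified' is deliberately kept at this stage."""
    return [
        (grant, topic)
        for grant, topics in topics_by_grant.items()
        if topics
        for topic in topics
    ]

for row in topics_long(grant_topics):
    print(row)
```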

GtR topics are usually a single term or phrase (see below). Sometimes these phrases contain abbreviations (such as HTP in Design HTP), which is a problem if we want to match terms directly to REF impact statements. REF impact statements may also use synonyms to refer to similar topics, which is another issue.

## # A tibble: 30 x 3
##    case_study_id grant_id topics_unlisted               
##            <dbl> <chr>    <chr>                         
##  1         34495 111956/1 Archaeological Theory         
##  2         34495 111956/1 Language Variation & Change   
##  3         34495 111956/1 Science-Based Archaeology     
##  4         44278 112864/1 Aesthetics                    
##  5         44278 112864/1 Ethics                        
##  6         44278 112864/1 Jurisprudence/Legal Philosophy
##  7         44278 112864/1 Political Philosophy          
##  8         44067 119215/1 Applied Arts HTP              
##  9         44067 119215/1 Cultural History              
## 10         44067 119215/1 Design HTP                    
## # … with 20 more rows

Nevertheless, we can try to match GtR topics as they are to REF impact statements, ignoring differences between lower and upper case but otherwise matching the phrases exactly. The result is rather uninspiring: only 103 grant/case study combinations have a match and 1461 do not.
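A minimal Python sketch of case-insensitive phrase matching at the grant/case study level. It assumes a plain substring test after case folding; whether the original matching used substrings or whole words is not stated. A pair counts as matched if any of its topics appears in the impact text. The example texts are invented.

```python
def phrase_match(topics, impact_text):
    """True if any full topic phrase occurs in the impact text, ignoring case only."""
    text = impact_text.casefold()
    return any(topic.casefold() in text for topic in topics)

def tally(pairs):
    """Return (matched, unmatched) counts over (topics, impact_text) pairs."""
    matched = sum(1 for topics, text in pairs if phrase_match(topics, text))
    return matched, len(pairs) - matched

# Invented examples: the second fails because "aesthetics" never appears verbatim.
examples = [
    (["Cultural History", "Design HTP"], "The project reshaped cultural history teaching."),
    (["Aesthetics"], "Work on aesthetic judgement informed gallery practice."),
]
print(tally(examples))  # (1, 1)
```

The second example shows why exact phrase matching is strict: a closely related word form is enough to miss a match. A substring test can also over-match (for example, "Ethics" inside "bioethics").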

We can attempt to improve on this slightly by splitting all topic phrases into individual terms, removing common words (such as to, the and of) and attempting the match again. Below is how the data looks after this cleaning step.

## # A tibble: 30 x 3
##    case_study_id grant_id topics_unlisted    
##            <dbl> <chr>    <chr>              
##  1         34495 111956/1 Archaeological     
##  2         34495 111956/1 Theory             
##  3         34495 111956/1 Language           
##  4         34495 111956/1 Variation          
##  5         34495 111956/1 Change             
##  6         34495 111956/1 Science-Based      
##  7         34495 111956/1 Archaeology        
##  8         44278 112864/1 Aesthetics         
##  9         44278 112864/1 Ethics             
## 10         44278 112864/1 Jurisprudence/Legal
## # … with 20 more rows
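The cleaning step above can be sketched in Python as follows. The output above suggests that phrases were split on whitespace only (Science-Based and Jurisprudence/Legal stay intact) and that the ampersand was dropped; this sketch reproduces that behaviour under those assumptions. The stop-word list is a small hypothetical one, since the list actually used is not given.

```python
# Small illustrative stop-word list (hypothetical; the real list is not specified).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on"}

def split_topics(topics):
    """Split topic phrases on whitespace, dropping stop words and tokens
    with no letters or digits (such as '&')."""
    terms = []
    for phrase in topics:
        for token in phrase.split():
            if token.casefold() in STOP_WORDS:
                continue
            if not any(ch.isalnum() for ch in token):
                continue
            terms.append(token)
    return terms

print(split_topics(["Language Variation & Change", "Jurisprudence/Legal Philosophy"]))
# ['Language', 'Variation', 'Change', 'Jurisprudence/Legal', 'Philosophy']
```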

When we do the matching again, the result is slightly better, with 521 matched grant/case study pairs and 1043 unmatched. However, the limitations of this method are still too serious at this point, and it needs further refinement to account for synonyms, abbreviations and similar variation.
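One possible refinement, sketched in Python under stated assumptions: match terms as whole words (so "Ethics" no longer matches inside "bioethics") and optionally look up each term in an expansion table for abbreviations and synonyms. The expansion table is hypothetical and would have to be built by hand or from a thesaurus; in particular, reading HTP as "history, theory and practice" is an assumption used only for illustration.

```python
import re

def term_match(terms, impact_text, expansions=None):
    """True if any term, or any of its listed expansions, appears in the text
    as a whole word or phrase, ignoring case."""
    expansions = expansions or {}
    candidates = []
    for term in terms:
        candidates.append(term)
        candidates.extend(expansions.get(term, []))
    for cand in candidates:
        # Lookarounds instead of \b so terms that start or end with punctuation still work.
        pattern = r"(?<!\w)" + re.escape(cand) + r"(?!\w)"
        if re.search(pattern, impact_text, flags=re.IGNORECASE):
            return True
    return False

# Hypothetical expansion table (assumed meaning of HTP).
EXPANSIONS = {"HTP": ["history, theory and practice"]}

print(term_match(["Ethics"], "Advances in bioethics."))                          # False
print(term_match(["HTP"], "the history, theory and practice of design", EXPANSIONS))  # True
```

Whole-word matching trades some recall for precision; stemming or lemmatisation would be the usual next step for word-form variants such as aesthetic/aesthetics.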