Measuring Scientific Impact – Data from Study

This is the material accompanying our paper on measuring comprehensive scientific impact (not just how many citations a work has received but also its effect on society, industry and the economy).

This page provides links to the data sets that we used for our study.

Reuse of this work

External data that we link to here is licensed under the original terms of the respective authors. New data that we have created or generated is provided under the terms of the Creative Commons Attribution-ShareAlike 4.0 International license.

Citation Network Data

Data for citation networks that we reused is available here for arXiv (via Paperscape) and here for CiteSeerX (via RefSeer).

The PubMed Central citation network that we created can be downloaded from here. We provide a CSV file with 2 columns: <Cited Paper>,<Citing Paper> (the paper in the right column has cited the paper in the left column). The file describes 7800257 papers in the PubMed Open Access collection. The IDs are PMIDs, which you can use to query PubMed directly for the corresponding metadata.
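As an illustration of how the two-column layout can be used, the following minimal sketch counts how often each paper is cited. It assumes the file has no header row and that IDs are treated as plain strings; the function name count_citations and the tiny in-memory sample are hypothetical and stand in for the real download.

```python
import csv
import io
from collections import Counter


def count_citations(rows):
    """Count how often each paper appears in the left (cited) column.

    Each row is expected to be (cited_id, citing_id), following the
    column order described above. Rows with fewer than two fields
    are skipped.
    """
    counts = Counter()
    for row in rows:
        if len(row) < 2:
            continue
        counts[row[0].strip()] += 1
    return counts


# Hypothetical three-edge sample standing in for the real CSV file.
sample = io.StringIO("100,200\n100,300\n200,300\n")
citation_counts = count_citations(csv.reader(sample))
print(citation_counts.most_common())
```

To run this on the real data, replace the StringIO object with an open file handle. A Counter returns 0 for papers that never appear in the cited column, so lookups for uncited papers do not fail.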

REF Study:Citation Network Link Data

We also provide tables containing link information between papers in the citation network and REF case studies.

In each of the CSV files the first column is the REF study ID and the second column is the paper ID. The paper ID corresponds to the respective indexing system (e.g. ref_arxiv_links.csv uses arXiv IDs while ref_pmc_links.csv uses PubMed IDs).

These files are available here.

REF Case Study Data

We have calculated the total citation counts for each REF study based on our citation network links above. This information is available here. The first column is the case study ID, the second is the total number of citations linked to that study.
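One plausible way to arrive at a per-study total is to sum the citation counts of every paper linked to a study. The sketch below shows that idea under stated assumptions: links are (study ID, paper ID) pairs as in the link files, citation counts are supplied as a dictionary, and duplicate links are counted once. The function name and sample values are hypothetical, and the paper remains the authoritative description of the actual calculation.

```python
from collections import defaultdict


def total_citations_per_study(links, citation_counts):
    """Sum citation counts over the papers linked to each study.

    links: iterable of (study_id, paper_id) pairs.
    citation_counts: mapping of paper_id to number of citations.
    Papers missing from citation_counts contribute 0.
    """
    totals = defaultdict(int)
    # set() drops duplicate (study, paper) pairs so no paper is
    # counted twice for the same study.
    for study_id, paper_id in set(links):
        totals[study_id] += citation_counts.get(paper_id, 0)
    return dict(totals)


# Hypothetical sample data, not taken from the real files.
links = [("S1", "100"), ("S1", "200"), ("S2", "300"), ("S1", "100")]
paper_citations = {"100": 2, "200": 1}
totals = total_citations_per_study(links, paper_citations)
print(totals)
```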

In our study we describe a method for calculating an h-index per REF study (as opposed to per author). The per-study h-index values for all of the REF case studies in our experiment are available here. The first column is the case study ID, the second is the h-index for that study.
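For readers unfamiliar with the metric, this is a minimal sketch of the standard h-index definition applied to a list of citation counts, for example the counts of the papers linked to one case study. Treating a study's linked papers like an author's publication list is an assumption made here for illustration; the exact per-study method is the one described in the paper.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            # Counts are sorted in descending order, so no later
            # paper can satisfy the condition either.
            break
    return h


# Hypothetical citation counts for the papers linked to one study.
study_h = h_index([10, 8, 5, 4, 3])
print(study_h)
```

With the sample counts above, four papers have at least four citations each but there are not five papers with at least five, so the result is 4.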

Aggregated Data

Finally we make available the aggregated results from our study. This data is available here. Columns are labelled as described below.

Each row in the CSV represents one UoA/Institution package (e.g. Computer Science at University of Warwick). The UoA and Institution ID make up the first two columns. Institution IDs can be reconciled by finding the respective institution in the results, which can be downloaded from the REF website.

Staff shows the total number of staff (which can be fractional because of part-time or split-time workers) that contributed to a UoA/Institution package. The columns 4*, 3*, 2*, 1* and unclassified give the percentage of case studies submitted in the respective UoA/Institution package that were given each rating. We also present average_ref_score, which is the mean of these values.

Mean Normalised Citation Score gives the average MNCS over each case study contributing to a UoA/Institution package. The calculation is explained in the “Mean Normalised Citation Score and REF Impact Score” section of our paper.

Average Per-Author h-index gives the average h-index across all authors associated with case studies in a UoA/Institution package. These h-index values are calculated in the standard way using the Experimental Citation Networks (ECNs) generated in our paper.

Average Per-Study h-index gives the average h-index over all case studies in a UoA/Institution package (see the section “Case Study h-index and REF Impact Score” of our paper for details on this calculation).

Average Altmetric Score gives the mean of altmetric scores for all papers contributing to a UoA/Institution package’s altmetric impact.

Contributing case studies gives the number of case studies that contribute to these UoA/Institution packages. In some cases institutions may submit 1 or 2 case studies, in others as many as 20.