The CEPS EurLex dataset was awarded the “Best Poster Award for Open Science” at the PolMeth Europe Conference 2021. It contains 142,036 EU laws, almost the entire corpus of the EU’s digitally available legal acts passed between 1952 and 2019. The dataset is designed to be a free, public resource for researchers analysing the legal acquis of the European Union. The award demonstrates the research community’s demand for open datasets, and we invite researchers and practitioners to make creative use of our multi-purpose dataset.
Brief description of the dataset:
– The dataset is organised in tabular format: each row represents one law and each of the 23 columns represents one variable.
– It includes the full legal text and 22 metadata variables for the three types of legally binding acts passed by the EU institutions: 102,304 regulations, 4,070 directives and 35,798 decisions, all in English.
– The full text of 134,633 laws is included (column “act_raw_text”). For newer laws, the text was scraped from the HTML pages on Eur-lex.eu, while for older laws, the text was extracted from (scanned) PDF documents (if available in English).
– 22 additional variables are included, such as ‘Act_name’, ‘Act_type’, ‘Subject_matter’, ‘Authors’, ‘Date_document’, ‘ELI_link’, ‘CELEX’ (a unique identifier for every law). Please see the “CEPS_EurLex_codebook.pdf” file for an explanation of all variables.
– The dataset was scraped from the official EU legal database (Eur-lex.eu) and converted into machine-readable CSV format using the programming languages R and Python.
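As a rough illustration of how the tabular layout described above could be used, the following Python sketch counts acts per type and checks how many rows carry a full text. It is only a sketch under stated assumptions: the column names “CELEX”, “Act_name”, “Act_type”, “Date_document” and “act_raw_text” are taken from the description above, but the sample rows, the values in “Act_type”, the date format and the helper name `summarise` are invented for illustration. Please check the codebook for the actual values and formats.

```python
import csv
import io
from collections import Counter

# Full legal texts can be very long; raise the csv module's default
# per-field limit so long "act_raw_text" cells do not raise an error.
csv.field_size_limit(10**9)

# Tiny in-memory stand-in for the real CSV file. All rows, identifiers
# and values below are placeholders, not records from the dataset.
SAMPLE_CSV = """CELEX,Act_name,Act_type,Date_document,act_raw_text
CELEX-0001,Example regulation,Regulation,2016-04-27,Full text of the act ...
CELEX-0002,Example directive,Directive,1995-10-24,
CELEX-0003,Example decision,Decision,2010-02-05,Full text of the act ...
"""


def summarise(handle):
    """Return (row count, acts per type, rows with a non-empty full text)."""
    by_type = Counter()
    with_text = 0
    total = 0
    for row in csv.DictReader(handle):
        total += 1
        by_type[row["Act_type"]] += 1
        # Treat missing or whitespace-only text as "no full text available".
        if (row.get("act_raw_text") or "").strip():
            with_text += 1
    return total, by_type, with_text


total, by_type, with_text = summarise(io.StringIO(SAMPLE_CSV))
print(total, dict(by_type), with_text)
```

To run the same summary on the downloaded file, the in-memory sample can be replaced by an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the local copy of the CSV and the encoding is an assumption.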
The dataset was collected as part of the TRIGGER project and is freely available on the TRIGGER website (https://trigger.eui.eu/ceps-