ToTA: Texts of Trade Agreements

The Texts of Trade Agreements (ToTA) project makes a machine-readable, annotated full-text corpus of preferential trade agreements (PTAs) publicly available to scholars and policy-makers, and uses state-of-the-art text-as-data techniques to analyze it.


The number of trade agreements has increased dramatically since the early 1990s. Trade agreements cover ever more issues, and the average agreement text is now around ten times longer than it was 25 years ago. This makes it increasingly difficult to analyze the content of trade agreements and to assess their impact on international trade and welfare. Big data and text-as-data methods can help researchers, policy-makers and other stakeholders better manage the growing complexity of trade agreements.

Need for digitized texts

Modern computational methods, however, require machine-readable texts. While several databases make PTA texts available, they are generally optimized for reading rather than for computational analysis. As part of a year-long effort, this project used the WTO RTA Database to locate the texts and metadata of close to 450 preferential trade agreements and transformed them into a machine-readable format that allows analysis of PTA texts at the article, chapter or treaty level.
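To illustrate what analysis at the article, chapter or treaty level can mean in practice, the following is a minimal Python sketch of a nested treaty structure. It is not the project's actual file format: the class names (Treaty, Chapter, Article), their fields and the sample text are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:
    number: str
    title: str
    text: str


@dataclass
class Chapter:
    title: str
    articles: List[Article] = field(default_factory=list)

    def text(self) -> str:
        # Chapter-level text is the concatenation of its articles.
        return "\n".join(a.text for a in self.articles)


@dataclass
class Treaty:
    name: str
    chapters: List[Chapter] = field(default_factory=list)

    def text(self) -> str:
        # Treaty-level text is the concatenation of its chapters.
        return "\n".join(c.text() for c in self.chapters)

    def articles(self) -> List[Article]:
        # Flatten the hierarchy for article-level analysis.
        return [a for c in self.chapters for a in c.articles]


# Invented placeholder content, not taken from any real agreement.
treaty = Treaty(
    name="Example PTA",
    chapters=[
        Chapter("General Provisions", [
            Article("1", "Objectives", "The Parties establish a free trade area."),
            Article("2", "Definitions", "For the purposes of this Agreement, terms are defined."),
        ]),
        Chapter("Trade in Goods", [
            Article("3", "Customs Duties", "Each Party shall eliminate customs duties."),
        ]),
    ],
)

print(len(treaty.articles()))      # number of articles across all chapters
print(treaty.chapters[1].text())   # text of a single chapter
```

With a structure of this kind, the same analysis function can be applied to a single article, to a whole chapter, or to the full treaty text simply by choosing which level's text to pass in.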

Approach and findings

Building on the Texts of Trade Agreements (ToTA) infrastructure, we then employed text-as-data methods to automatically map the content of PTAs, gaining new insights into trade agreements. Textual similarity measures, for example, are able to capture fine-grained differences in treaty design. So-called dimensionality reduction techniques, which compress the textual information contained in a text into a set of abstract variables, help predict trade flows more accurately than previously available measures.
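As a rough illustration of how a textual similarity measure can pick up small drafting differences, here is a minimal Python sketch. It assumes Jaccard similarity over character 5-grams as the measure. That is one common choice, not necessarily the one used in the project, and the function names and sample clauses are invented for this example.

```python
from typing import Set


def char_ngrams(text: str, n: int = 5) -> Set[str]:
    # Lowercase and collapse whitespace so layout differences do not matter.
    normalized = " ".join(text.lower().split())
    # Texts shorter than n characters yield an empty set.
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 5) -> float:
    # Share of character n-grams common to both texts, between 0 and 1.
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


# Invented sample clauses, not quoted from any real agreement.
clause_a = "Each Party shall accord national treatment to the goods of the other Party."
clause_b = "Each Party shall accord national treatment to goods of the other Parties."
clause_c = "The Parties shall cooperate on sustainable fisheries management."

print(jaccard_similarity(clause_a, clause_b))  # near-identical wording
print(jaccard_similarity(clause_a, clause_c))  # unrelated subject matter
```

Character n-grams are used here instead of whole words because they remain sensitive to small wording changes, such as "Party" versus "Parties", while still giving high scores to clauses that share most of their language.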


All data behind this research is available at

 This project is a collaboration between


Text-as-data analysis of preferential trade agreements: Mapping the PTA landscape

In this article, we introduce a new structured corpus of digitized PTA full texts drawn from the WTO RTA database, and apply text-as-data tools to map the design of PTAs. We argue that textual similarity measures are particularly suitable for capturing fine-grained differences in treaty design, and find that the term PTA groups together a set of highly heterogeneous agreements, which vary systematically in scope, content and language.

The Impact of the TPP on Trade Between Member Countries: A Text-As-Data Approach

We propose a new method to predict the impact of preferential trade agreements (PTAs) on trade and welfare, taking the Trans-Pacific Partnership (TPP) agreement as a case study. Relying on a novel dataset of treaty texts covering all trade agreements notified to the World Trade Organization, we first construct an indicator comparing existing PTAs to the TPP in terms of textual similarity.