The Texts of Trade Agreements (ToTA) project makes a machine-readable and annotated full text corpus of preferential trade agreements (PTAs) publicly available to scholars and policy-makers and uses state-of-the-art text-as-data techniques to analyze it.
The number of trade agreements has increased dramatically since the early 1990s. Trade agreements cover ever more issues, and the average agreement text is now around ten times longer than it was 25 years ago. This makes it increasingly difficult to analyze the content of trade agreements and assess their impact on international trade and welfare. Big data and text-as-data methods can help researchers, policy-makers and other stakeholders better manage the growing complexity of trade agreements.
Need for digitized texts
Modern computational methods, however, require machine-readable texts. While several databases make PTA texts available, they are generally optimized for reading, not for computational analysis. As part of a year-long effort, this project used the WTO RTA Database to locate the texts and metadata of close to 450 preferential trade agreements and transformed them into a machine-readable format that allows analysis at the article, chapter or treaty level.
Approach and findings
Building on the Texts of Trade Agreements (ToTA) infrastructure, we could then employ text-as-data methods to map the content of PTAs automatically, gaining new insights into trade agreements. Textual similarity measures, for example, can capture fine-grained differences in treaty design. So-called dimensionality reduction techniques, which compress the textual information in a treaty into a set of abstract variables, help predict trade flows more accurately than previously available measures.
Text of Trade Agreements (ToTA)—A Structured Corpus for the Text‐as‐Data Analysis of Preferential Trade Agreements
With multilateral negotiations at the World Trade Organization (WTO) in deadlock, rulemaking on international economic governance has shifted to preferential trade agreements (PTAs). To facilitate the scholarly investigation of the fast‐growing universe of PTAs, this article introduces a machine‐readable and structured full text corpus of 448 WTO‐notified trade agreements stored in a GitHub repository: the Text of Trade Agreements (ToTA) corpus.
Text-as-data analysis of preferential trade agreements: Mapping the PTA landscape
In this article, we introduce a new structured corpus of digitized PTA full texts drawn from the WTO RTA database and apply text-as-data tools to map the design of PTAs. We argue that textual similarity measures are particularly suitable for capturing fine-grained differences in treaty design, and we find that the term PTA covers a very heterogeneous set of agreements that vary systematically in scope, content and language.
The Impact of the TPP on Trade Between Member Countries: A Text-As-Data Approach
We propose a new method to predict the impact of preferential trade agreements (PTAs) on trade and welfare, taking the Trans-Pacific Partnership (TPP) agreement as a case study. Relying on a novel dataset of treaty texts covering all trade agreements notified to the World Trade Organization, we first construct an indicator comparing existing PTAs to the TPP in terms of textual similarity.
Mapping BITs – Discover patterns of consistency and innovation in the BIT universe
“With thousands of treaties, many ongoing negotiations and multiple dispute-settlement mechanisms, today’s IIA regime has come close to a point where it is too big and complex to handle for governments and investors alike.”
Using text-as-data analysis, we reduce this complexity and allow policy-makers, arbitrators, and scholars to:
Assess the consistency of a country’s BIT network (cf. the USA before and after its 2004 Model BIT)
Trace the evolution of BITs over time (cf. early German treaty practice)
Compare similarities between individual treaties, revealing patterns of convergence and divergence (cf. countries adopting the USA 2004 Model BIT)
Understand how the Trans-Pacific Partnership is different from earlier treaty practice
Overview: the intuition and approach behind our methodology and the main capabilities of this website
Close to 3000 bilateral investment treaties (BITs) have been concluded since 1959
Most BITs provide investors with the right to sue their host country for a BIT violation before international arbitration
Over 550 such investment claims have been filed to date
Scholars and arbitrators have recognized that common principles underlie investment treaties. Amongst others, BITs typically provide for:
Compensation in case of expropriation
Fair and equitable treatment of investment
Full protection and security
Non-discrimination (national and most-favoured nation treatment)
At the same time, scholars and arbitrators have noted that treaties diverge in their individual wording. The lack of adequate empirical tools, however, has long made it difficult to quantify just how different or how similar these treaties are.
Mapping BITs remedies this shortcoming, allowing users to discover uniformity and diversity among BITs for themselves.
The large heat map compares 1628 English-language BITs concluded between 1959 and 2014. Every treaty appears once on the horizontal axis and once on the vertical axis. We've developed a continuous metric to gauge the distance between treaties (see the Methodology section for details). This metric ranges from 0 (dark red, full similarity) to 1 (bright yellow, no similarity). Each field of the heat map represents two BITs being compared.
Each treaty is compared with itself along the diagonal line. The two halves of the heat map above and below that line are mirror images of each other. Black quadrangles mark the borders of individual countries' treaty networks or, when the treaties are sorted by cluster, the boundaries of clusters of similar treaties.
Zoom in: each field is labeled with the treaty dyad being compared
Search for a treaty: look up a specific BIT to locate it along the diagonal line and explore the space around it
Find similar treaties: click on a treaty square to find the BITs most similar to each treaty in the pair
Understand differences between treaties: click on a treaty square and go to the “Term usage” tab to see which words are used differently in the two treaties.
3 sorting options
For each bilateral treaty, we identified the wealthier treaty party based on its GDP per capita at the time of signature. Where a treaty involves an OECD member or, alternatively, a BRIC country, that country is named first as the wealthier party by default.
You can therefore sort the treaties on the heat map axes in two ways: by the wealthier party or by the less wealthy counterpart. Within each party, the treaties are sorted by date of signature. In addition, we have identified clusters of similar treaties; the third option sorts the treaties by cluster.
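The default party ordering and the two party-based sorting options can be sketched in a few lines of Python. This is a minimal illustration, not the site's code: the `Treaty` record, the abbreviated membership sets and the precedence rule (a lone OECD member first, otherwise a lone BRIC country, otherwise the party with the higher GDP per capita) are our assumptions.

```python
from datetime import date
from typing import NamedTuple


class Treaty(NamedTuple):
    party_a: str
    party_b: str
    signed: date
    gdp_pc_a: float  # GDP per capita of party_a at signature
    gdp_pc_b: float  # GDP per capita of party_b at signature


# Hypothetical, abbreviated membership sets (illustration only).
OECD = {"Germany", "Japan", "USA"}
BRIC = {"Brazil", "China", "India", "Russia"}


def wealthier_first(t: Treaty) -> tuple[str, str]:
    """Return (wealthier party, less wealthy counterpart).

    Assumed precedence: a lone OECD member first, else a lone BRIC
    country, else the party with the higher GDP per capita."""
    for group in (OECD, BRIC):
        a_in, b_in = t.party_a in group, t.party_b in group
        if a_in != b_in:  # exactly one party belongs to this group
            return (t.party_a, t.party_b) if a_in else (t.party_b, t.party_a)
    if t.gdp_pc_a >= t.gdp_pc_b:
        return t.party_a, t.party_b
    return t.party_b, t.party_a


def sort_treaties(treaties: list[Treaty], by: str = "wealthier") -> list[Treaty]:
    """Sort by wealthier party (by="wealthier") or by less wealthy
    counterpart (any other value), then by date of signature."""
    pos = 0 if by == "wealthier" else 1
    return sorted(treaties, key=lambda t: (wealthier_first(t)[pos], t.signed))
```

Sorting on a `(party, date)` tuple gives the "within parties, by date of signature" ordering for free. The cluster-based third option would instead sort on a cluster label.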
The world map colors countries according to their engagement in the BIT network, based on UNCTAD data on the universe of signed BITs. The more treaties a country has signed, the darker its shade of blue. However, we have English-language treaty texts for only 51% of all treaties ever signed; the potential undersampling is displayed with a red color palette.
Finally, we rank countries by the coherence of their treaty networks. The lower the mean distance between the treaties a country has concluded, the darker its shade of green on the map.
Click on a country to view a heat map of its BIT network. Each field of the heat map represents two BITs being compared. You can reorder the heat map to match your research interest:
Chronological order: by default, each country network is ordered by year of signature, in ascending order
Pick your baseline: click on a treaty along the vertical or horizontal axis to reorder the heat map in relation to that treaty
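A minimal sketch of what the baseline reordering amounts to follows. It assumes the heat map is backed by a full, symmetric pairwise distance matrix. Rows and columns are sorted by their distance to the chosen treaty, with the baseline kept first, and the same ordering lists that treaty's nearest neighbours. The function name and the toy matrix are ours.

```python
def reorder_by_baseline(dist: list[list[float]], baseline: int):
    """Order treaties by their distance to `baseline` (baseline first).

    Returns the new order and the distance matrix permuted so that rows
    and columns follow that order, as a re-sorted heat map would."""
    order = sorted(range(len(dist)),
                   key=lambda j: (j != baseline, dist[baseline][j], j))
    permuted = [[dist[i][j] for j in order] for i in order]
    return order, permuted


# Toy 4-treaty distance matrix (symmetric, zero diagonal).
dist = [
    [0.0, 0.5, 0.2, 0.9],
    [0.5, 0.0, 0.4, 0.3],
    [0.2, 0.4, 0.0, 0.7],
    [0.9, 0.3, 0.7, 0.0],
]
order, heat = reorder_by_baseline(dist, baseline=0)
print(order)       # [0, 2, 1, 3]
print(order[1:3])  # the two nearest neighbours of treaty 0: [2, 1]
```

On the real 1628-treaty matrix, slicing `order[1:21]` would give a treaty's 20 closest neighbours.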
We also provide links to UNCTAD country profiles to give users background information on countries' economies, international trade and FDI.
Our study reveals the following insights:
Rule-makers vs rule-takers: developed countries are the rule-makers in the BIT universe. Their treaty networks are considerably more consistent than those of developing countries, which are the rule-takers of the system.
Idiosyncratic treaty networks: the investment treaty universe is less homogeneous than one might expect. Developed countries' treaty networks display strong idiosyncratic features.
Revealing innovation: our heat maps trace innovation in national BIT programs as countries switch from one treaty model to the next. In the heat map, such changes become visible as multiple red quadrangles within the same country’s network.
Alschner, W. and D. Skougarevskiy (2015). Consistency and Legal Innovation in the BIT Universe. Stanford Public Law and Legal Theory Working Paper Series No. 2595288. http://ssrn.com/abstract=2595288
This research has benefitted from the funding and support of the following grants and projects:
SNF Doc Mobility Grant
SNF Project “Convergence versus Divergence? Text-as-data and Network Analysis of International Economic Law Treaties and Tribunals”
SNIS Project “Diffusion of International Law: A Textual Analysis of International Investment Agreements”
NCCR Trade Regulation
From text to data
This website breaks new ground by mapping 1628 English-language investment treaties, 51% of the BIT universe. For our TPP special, we augment the data with investment chapter texts from 51 free trade agreements and 7 multilateral investment treaties.
Second, we manually edit these texts: we remove side letters and schedules of reservations, and we correct typos, optical character recognition errors and other mistakes in the underlying data sources. We also unify treaty spelling by converting all British English words to their American English counterparts (e.g. “favour” to “favor”).
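The spelling unification can be approximated by a whole-word dictionary lookup. The sketch below shows this under stated assumptions: the tiny `UK_TO_US` table is hypothetical and stands in for a full British/American variant list such as VarCon. Only whole words are replaced, so unlisted words such as "flavour" are left alone.

```python
import re

# Hypothetical excerpt of a British-to-American spelling table; a real run
# would load a complete variant list instead of these few entries.
UK_TO_US = {
    "favour": "favor",
    "favoured": "favored",
    "favourable": "favorable",
    "labour": "labor",
    "programme": "program",
}

_WORD = re.compile(r"[A-Za-z]+")


def americanize(text: str, table: dict[str, str] = UK_TO_US) -> str:
    """Replace whole words found in `table`, keeping a leading capital."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        target = table.get(word.lower())
        if target is None:
            return word  # not a listed British spelling
        return target.capitalize() if word[0].isupper() else target
    return _WORD.sub(swap, text)


print(americanize("Most-favoured-nation treatment"))
# Most-favored-nation treatment
```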
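Third, we transform text into data by computing treaty differences. Every treaty is dissected into its constituent 5-character substrings (5-grams), preserving word order:
Suppose there are two treaties: one containing the phrase “shall not be permitted” and the other containing the phrase “shall be permitted”. Counting spaces as characters (shown here as “_”), their 5-character substrings are:
“shall not be permitted”: shall, hall_, all_n, ll_no, l_not, _not_, not_b, ot_be, t_be_, _be_p, be_pe, e_per, _perm, permi, ermit, rmitt, mitte, itted
“shall be permitted”: shall, hall_, all_b, ll_be, l_be_, _be_p, be_pe, e_per, _perm, permi, ermit, rmitt, mitte, itted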
We then compare these treaties by the extent to which they share 5-character substrings. Together, the two treaties above contain 21 unique 5-grams, of which 11, or 52%, feature in both. Subtracting this figure from 100% yields 48%, a measure of dissimilarity between the two treaties. Formally, this is known as the Jaccard distance, which ranges between 0 and 1:
Identical BITs: all 5-character substrings appear in both BITs, yielding a Jaccard distance of 0, i.e. perfect similarity
Completely dissimilar BITs: no 5-character substring is shared between the two BITs; the Jaccard distance is 1, the maximum distance between two treaties
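The worked example can be reproduced with a short Python sketch. We assume that spaces count as characters and that no other normalization (lowercasing, punctuation stripping) is applied. Under that assumption the output matches the figures given above: 21 unique 5-grams, 11 shared, and a distance of about 0.48. The function names are ours.

```python
def char_ngrams(text: str, n: int = 5) -> set[str]:
    """Return the set of unique character n-grams (spaces included)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A & B| / |A | B|: 0 for identical sets, 1 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0  # two empty texts are treated as identical
    return 1.0 - len(a & b) / len(union)


a = char_ngrams("shall not be permitted")
b = char_ngrams("shall be permitted")
print(len(a | b), len(a & b))            # 21 11
print(round(jaccard_distance(a, b), 2))  # 0.48
```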
The resulting 1628×1628 distance matrix stores all pairwise treaty distances.
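Because the Jaccard distance is symmetric and every treaty has distance 0 to itself, only the upper triangle of the matrix has to be computed. The sketch below assumes the cleaned treaty texts are held as plain strings. For 1628 treaties this means about 1.3 million pairs.

```python
from itertools import combinations


def char_ngrams(text: str, n: int = 5) -> set[str]:
    """Unique character n-grams of a text (spaces included)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def distance_matrix(texts: list[str], n: int = 5) -> list[list[float]]:
    """Symmetric matrix of pairwise Jaccard distances between n-gram sets."""
    grams = [char_ngrams(t, n) for t in texts]
    size = len(texts)
    dist = [[0.0] * size for _ in range(size)]  # diagonal stays 0
    for i, j in combinations(range(size), 2):   # upper triangle only
        union = grams[i] | grams[j]
        d = 1.0 - len(grams[i] & grams[j]) / len(union) if union else 0.0
        dist[i][j] = dist[j][i] = d             # mirror into lower triangle
    return dist
```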
Fifth, we apply affinity propagation clustering to the dissimilarity matrix to uncover structure in the BIT universe. We set the preference q (the a priori suitability of a point to serve as an exemplar) to the lowest quantile of the similarities, i.e. the minimum similarity. This yields 94 clusters.
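To make the clustering step concrete, here is a compact pure-Python sketch of affinity propagation: the standard Frey and Dueck responsibility/availability updates with damping. Two choices are our assumptions rather than the site's documented setup. Similarities are taken as negative distances, and the preference is set to the minimum off-diagonal similarity, which is our reading of "lowest quantile". For real data, a vetted implementation such as scikit-learn's `AffinityPropagation` (with `affinity="precomputed"` and an explicit `preference`) would be the safer choice.

```python
def affinity_propagation(sim, preference, damping=0.5,
                         max_iter=200, convergence_iter=15):
    """Affinity propagation (Frey and Dueck) on a dense similarity matrix.

    sim[i][k] is the similarity of point i to candidate exemplar k; the
    diagonal is overwritten with `preference`. Returns the exemplar indices
    and a cluster label (index into the exemplar list) for every point."""
    n = len(sim)
    if n == 1:
        return [0], [0]
    s = [row[:] for row in sim]
    for k in range(n):
        s[k][k] = preference  # a priori suitability of k as an exemplar
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    last, stable, exemplars = None, 0, []
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                competitor = second if k == best else vals[best]
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - competitor)
        # a(k,k) = sum of positive r(i',k) over i' != k
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) over i' not in {i,k})
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]
            for i in range(n):
                new = support if i == k else min(0.0, r[k][k] + support - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
        exemplars = [k for k in range(n) if a[k][k] + r[k][k] > 0]
        if exemplars and exemplars == last:
            stable += 1
            if stable >= convergence_iter:
                break
        else:
            last, stable = exemplars, 0
    if not exemplars:
        return [], [-1] * n
    labels = [exemplars.index(i) if i in exemplars
              else exemplars.index(max(exemplars, key=lambda k: s[i][k]))
              for i in range(n)]
    return exemplars, labels


# Toy example: six "treaties" placed on a line, similarity = -distance.
points = [0.0, 0.1, 0.3, 5.0, 5.2, 5.3]
sim = [[-abs(p - q) for q in points] for p in points]
pref = min(sim[i][j] for i in range(6) for j in range(6) if i != j)
exemplars, labels = affinity_propagation(sim, pref)
print(exemplars, labels)
```

A low preference makes every point a reluctant exemplar, which favors fewer, larger clusters.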
Sixth, we display the dissimilarities interactively on a heat map that shows similar BIT pairs in dark red and dissimilar pairs in bright yellow.
Seventh, for each treaty we locate its 20 closest neighbours in terms of Jaccard distance. We also uncover the differences between the two treaties in a pair by comparing their word usage. To this end, we construct a document-term matrix from the treaties (stop words are removed, and British spelling differences are mitigated with VarCon). From the document-term matrix, we compute the proportion of use of each word in each treaty. Applying the two-proportion z-test, we identify the words whose usage differs significantly between the two selected treaties at the 10% level and display them in an interactive chart.
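The word-usage comparison can be sketched as a two-proportion z-test run word by word. The following assumptions are ours: tokens are lowercase letter runs, the `STOP_WORDS` set is a tiny hypothetical placeholder for a full list, each proportion is a word's count divided by the treaty's total non-stop-word tokens, and the test is two-sided at alpha = 0.10.

```python
import math
import re
from collections import Counter
from statistics import NormalDist

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "or", "any", "be"}


def word_counts(text: str) -> Counter:
    """Count lowercase word tokens, dropping stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def differently_used_words(text1: str, text2: str, alpha: float = 0.10) -> dict:
    """Map each word whose share of tokens differs significantly between the
    two texts (two-sided two-proportion z-test) to its z statistic.
    Positive z means the word is relatively more frequent in text1."""
    c1, c2 = word_counts(text1), word_counts(text2)
    n1, n2 = sum(c1.values()), sum(c2.values())
    if n1 == 0 or n2 == 0:
        return {}
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.645 for 0.10
    result = {}
    for word in c1.keys() | c2.keys():
        x1, x2 = c1[word], c2[word]
        pooled = (x1 + x2) / (n1 + n2)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        if se == 0:
            continue
        z = (x1 / n1 - x2 / n2) / se
        if abs(z) > critical:
            result[word] = z
    return dict(sorted(result.items(), key=lambda kv: -abs(kv[1])))
```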
Eighth, we examine the subsets of treaties by country and compute the internal coherence of each country's signed treaties. This enables us to rank countries by the coherence of their treaty networks. We exclude countries that concluded fewer than 4 treaties from this ranking to ensure that the results are not driven by treaty network size.
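The coherence ranking can be sketched as follows. We assume that coherence is the mean pairwise Jaccard distance among a country's treaties, following the map description (a lower mean distance means a more coherent network). We also assume that `parties` lists the two signatories of each treaty in the same order as the rows of the distance matrix. Both the function and its input layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def rank_by_coherence(parties: list[tuple[str, str]],
                      dist: list[list[float]],
                      min_treaties: int = 4) -> list[tuple[str, float]]:
    """Rank countries by the mean pairwise distance among their treaties.

    parties[t] holds the two signatories of treaty t (aligned with dist).
    Countries with fewer than `min_treaties` treaties are left out; the
    most coherent network (lowest mean distance) comes first."""
    by_country = defaultdict(list)
    for idx, pair in enumerate(parties):
        for country in set(pair):
            by_country[country].append(idx)
    scores = {
        country: mean(dist[i][j] for i, j in combinations(idxs, 2))
        for country, idxs in by_country.items()
        if len(idxs) >= min_treaties
    }
    return sorted(scores.items(), key=lambda item: item[1])
```

With `min_treaties=4`, every ranked country contributes at least six treaty pairs to its mean.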
Wolfgang is a post-doctoral researcher at the World Trade Institute and at the Graduate Institute of International and Development Studies.
Before obtaining his Ph.D. at the Graduate Institute and his JSM at Stanford Law School, Wolfgang worked for the Institute's International Law Department as Professor Joost Pauwelyn's research and teaching assistant. Among other things, he was responsible for the courses ‘International Trade Law’, ‘International Investment Law’ and the ‘International Trade and Investment Law Clinic’. Wolfgang also repeatedly acted as academic advisor to the Graduate Institute WTO Summer Programme.
Dmitriy is a Ph.D. candidate at The Graduate Institute of International and Development Studies and a researcher at the Institute for the Rule of Law at the European University at St. Petersburg.
He works at the intersection of Law and Economics, applying economic analysis to various fields of law and seeking to answer legal questions with hard data. Dmitriy is active in sentencing research and economics of crime studies.
Country boundaries are from the U.S. Department of State's Office of the Geographer’s Large Scale International Boundary Lines and World Vector Shorelines Simplified World Polygons data set (as of March 2013).