Data availability
The datasets are available on Harvard Dataverse, within the Atlas of Economic Complexity dataverse at https://dataverse.harvard.edu/dataverse/atlas. They include the Weighted Classification Conversion Tables [3] (https://doi.org/10.7910/DVN/6AADMR) and the Bilateral Trade Data Aggregated by Year [4] (https://doi.org/10.7910/DVN/5NGVOB).
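As an illustration of how a weighted conversion table can be applied, the sketch below reallocates trade values reported in a source classification to a target classification. It assumes that each row of the table is a (source code, target code, weight) triple and that the weights for each source code sum to one, so total trade value is preserved. The codes, weights, and the function name `convert_values` are hypothetical and do not reflect the schema of the published files.

```python
from collections import defaultdict


def convert_values(values, weights):
    """Reallocate trade values from source codes to target codes.

    values:  dict mapping source product code -> trade value
    weights: iterable of (source_code, target_code, weight) rows (hypothetical
             layout); weights for each source code are assumed to sum to 1
    Source codes that never appear in `weights` are dropped in this sketch.
    """
    converted = defaultdict(float)
    for source, target, weight in weights:
        if source in values:
            # Each source value is split across its target codes by weight.
            converted[target] += values[source] * weight
    return dict(converted)


# Hypothetical codes: S1 splits across T1 and T2, S2 maps one-to-one to T3.
weights = [("S1", "T1", 0.75), ("S1", "T2", 0.25), ("S2", "T3", 1.0)]
values = {"S1": 200.0, "S2": 50.0}
print(convert_values(values, weights))  # {'T1': 150.0, 'T2': 50.0, 'T3': 50.0}
```

The converted values still sum to 250, the original total, which is the property the sum-to-one assumption on the weights is meant to guarantee.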
Code availability
The code used to acquire data from Comtrade, generate conversion weights, and mirror the bilateral trade data is available for public use via GitHub in the following repositories, respectively:
- https://github.com/harvard-growth-lab/comtrade-downloader [15]
- https://github.com/harvard-growth-lab/comtrade-conversion-weights [48]
- https://github.com/harvard-growth-lab/comtrade-mirroring [49]
While UN Comtrade provides open access to all the data used, these packages rely on bulk-download features that require a premium subscription to the UN Comtrade API, accessed through the UN Comtrade API package [17]. With such a subscription, the packages allow anyone to reproduce these datasets with the most recent Comtrade data in SITC and HS classification vintages. Our method is designed to accommodate new classification vintages: when the World Customs Organization (WCO) releases a new classification, as scheduled for 2027, our code will handle the new concordance with minimal modifications. Each of these packages is written primarily in Python, with some functionality implemented in R and MATLAB.
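Mirror statistics compare the value of a flow as reported by the exporter with the value of the same flow as reported by the importer. As a generic illustration only, and not the reconciliation method implemented in the comtrade-mirroring repository, the sketch below keeps the exporter's report when one exists and otherwise falls back on the importer's report, deflated by a flat CIF/FOB margin because importers typically value goods including cost, insurance, and freight while exporters report FOB values. The 5% margin, the dictionary layout, and the function name `mirror_flows` are assumptions made for the example.

```python
# Hypothetical flat CIF/FOB margin; actual margins vary by route and product.
CIF_FOB_MARGIN = 1.05


def mirror_flows(exports, imports, cif_fob=CIF_FOB_MARGIN):
    """Combine exporter- and importer-reported values for each flow.

    exports: {(exporter, importer, product): value reported by the exporter, FOB}
    imports: {(exporter, importer, product): value reported by the importer, CIF}
    Returns {key: (value, source)}, where source names the report that was used.
    """
    combined = {}
    for key in exports.keys() | imports.keys():
        if key in exports:
            # Prefer the exporter's own report when it exists.
            combined[key] = (exports[key], "exporter")
        else:
            # Only the importer reported the flow: remove the assumed CIF
            # margin to approximate an FOB value.
            combined[key] = (imports[key] / cif_fob, "importer")
    return combined


exports = {("A", "B", "P1"): 100.0}
imports = {("A", "B", "P1"): 110.0, ("C", "B", "P1"): 210.0}
for key, (value, source) in sorted(mirror_flows(exports, imports).items()):
    print(key, round(value, 2), source)
```

Here the flow from A to B keeps the exporter's value of 100, while the flow from C to B, reported only by the importer, is estimated at 210 / 1.05 = 200. Recording which report was used keeps the provenance of each estimate visible, which matters when exporter and importer reports disagree.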
References
1. UN Department of Economic and Social Affairs; Statistics Division. Classifications on economic statistics. https://unstats.un.org/unsd/classifications/Econ/ (2022).
2. Lukaszuk, P. & Torun, D. Harmonizing the harmonized system. SEPS Discussion Paper 2022-12, SEPS (2022).
3. Harvard Growth Lab. Weighted Classification Conversion Tables. https://doi.org/10.7910/DVN/6AADMR (2025).
4. Harvard Growth Lab. Bilateral Trade Data Aggregated by Year. https://doi.org/10.7910/DVN/5NGVOB (2025).
5. Bustos, S. & Yildirim, M. A. Uncovering trade flows. Unpublished Mimeo (2024).
6. Hausmann, R., Stock, D. P. & Yildirim, M. A. Implied comparative advantage. Research Policy 51, 104143 (2022).
7. O’Clery, N., Yildirim, M. A. & Hausmann, R. Productive ecosystems and the arrow of development. Nature Communications 12, 1479 (2021).
8. Bustos, S. & Yildirim, M. A. Production ability and economic growth. Research Policy 51, 104153 (2022).
9. Bustos, S. & Morales-Arilla, J. Gains from globalization and economic nationalism: AMLO versus NAFTA in the 2006 Mexican elections. Economics & Politics 36, 202–244 (2024).
10. Hausmann, R., Schetter, U. & Yildirim, M. A. On the design of effective sanctions: The case of bans on exports to Russia. Economic Policy 39, 109–153 (2024).
11. Egger, P., Foellmi, R., Schetter, U. & Torun, D. Gravity with History: On Incumbency Effects in International Trade. Journal of the European Economic Association jvae052 (2024).
12. Head, K. & Mayer, T. Gravity Equations: Workhorse, Toolkit, and Cookbook. In Handbook of International Economics, vol. 4, 131–195 (Elsevier, 2014).
13. Anderson, J. E. & Van Wincoop, E. Gravity with gravitas: A solution to the border puzzle. American Economic Review 93, 170–192 (2003).
14. Yotov, Y. V. Gravity at sixty: the workhorse model of trade (2022).
15. Harvard Growth Lab. Comtrade downloader. https://github.com/harvard-growth-lab/comtrade-downloader (2025).
16. United Nations Statistics Division. UN Comtrade database. https://comtradeplus.un.org/TradeFlow. Annual commodity data requested in originally reported classification for SITC and HS classifications and all countries (1962–2023). Accessed: March 26, 2025.
17. untradestats. comtradeapicall [software]. https://pypi.org/project/comtradeapicall/ (2024).
18. Mayer, T. & Zignago, S. Notes on CEPII’s distances measures: The GeoDist database. Tech. Rep. https://www.cepii.fr/CEPII/en/bdd_modele/bdd_modele_item.asp?id=6 (2011).
19. International Monetary Fund. World economic outlook database, April 2025 edition. https://www.imf.org/en/Publications/WEO/weo-database/2025/april (2025).
20. World Bank. World development indicators: Population. https://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL (2025).
21. U.S. Bureau of Labor Statistics. Producer price index by commodity: Industrial commodities [PPIIDC]. https://fred.stlouisfed.org/series/PPIIDC (2025).
22. United Nations Statistics Division. Complete correlations among HS, SITC and BEC. https://unstats.un.org/unsd/classifications/Econ (2022).
23. Allen, R. L. & Berliner, J. S. Soviet economic warfare (1961).
24. Bhagwati, J. On the underinvoicing of imports. Oxford Bulletin of Economics and Statistics 27, 389–397 (1964).
25. Naya, S. & Morgan, T. The accuracy of international trade data: the case of Southeast Asian countries. Journal of the American Statistical Association 64, 452–467 (1969).
26. Sheikh, M. A. Smuggling, production and welfare. Journal of International Economics 4, 355–364 (1974).
27. Yeats, A. J. On the accuracy of partner country trade statistics. Oxford Bulletin of Economics and Statistics 40, 341–361 (1978).
28. McDonald, D. C. Trade data discrepancies and the incentive to smuggle: An empirical analysis. Staff Papers 32, 668–692 (1985).
29. Yeats, A. J. On the accuracy of economic observations: Do sub-Saharan trade statistics mean anything? The World Bank Economic Review 4, 135–156 (1990).
30. Rozanski, J. & Yeats, A. On the (in)accuracy of economic observations: An assessment of trends in the reliability of international trade statistics. Journal of Development Economics 44, 103–130 (1994).
31. Gehlhar, M. Reconciling bilateral trade data for use in GTAP. Tech. Rep., GTAP Technical Papers (1996).
32. Makhoul, B. & Otterstrom, S. M. Exploring the accuracy of international trade statistics. Applied Economics 30, 1603–1616 (1998).
33. Pohit, S. & Taneja, N. India’s informal trade with Bangladesh: A qualitative assessment. The World Economy 26, 1187–1214 (2003).
34. Beja, E. L. Estimating trade mis-invoicing from China: 2000–2005. China & World Economy 16, 82–92 (2008).
35. Ferrantino, M. J. & Zhi, W. Accounting for discrepancies in bilateral trade: The case of China, Hong Kong, and the United States. China Economic Review 19, 502–520 (2008).
36. Barbieri, K., Keshk, O. M. & Pollins, B. M. Trading data: Evaluating our assumptions and coding rules. Conflict Management and Peace Science 26, 471–491 (2009).
37. Gaulier, G. & Zignago, S. BACI: International trade database at the product-level. The 1994–2007 version (2010).
38. Dong, G. Mirror statistics of international trade in manufacturing goods: The case of China (United Nations Industrial Development Organization, 2010).
39. Hamanaka, S. et al. Usable data for economic policymaking and research? The case of Lao PDR’s trade statistics. Asia-Pacific Research and Training Network on Trade (ARTNeT) (2010).
40. Hamanaka, S. Whose trade statistics are correct? Multiple mirror comparison techniques: A test case of Cambodia. Journal of Economic Policy Reform 15, 33–56 (2012).
41. Ferrantino, M. J., Liu, X. & Wang, Z. Evasion behaviors of exporters and importers: Evidence from the US–China trade data discrepancy. Journal of International Economics 86, 141–157 (2012).
42. Pierce, J. R. & Schott, P. K. Concording U.S. Harmonized System Codes Over Time. Journal of Official Statistics 28, 53–68 (2012).
43. Cebeci, T. A Concordance among Harmonized System 1996, 2002 and 2007 Classifications. World Bank Working Papers, No. 74576 (2012).
44. Diodato, D. A Network-based Method to Harmonize Data Classifications. Papers in Evolutionary Economic Geography (2018).
45. Bellert, N. & Fauceglia, D. A Practical Routine to Harmonize Product Classifications over Time. International Economics 160, 84–89 (2019).
46. International Monetary Fund. World economic outlook. https://www.imf.org/en/Publications/SPROLLs/world-economic-outlook-databases (2025).
47. International Monetary Fund. Statistics Dept. External debt statistics: Guide for compilers and users (International Monetary Fund, 2014).
48. Harvard Growth Lab. Comtrade conversion weights generator. https://github.com/harvard-growth-lab/comtrade-conversion-weights (2025).
49. Harvard Growth Lab. Comtrade mirroring pipeline. https://github.com/harvard-growth-lab/comtrade-mirroring (2025).
Acknowledgements
We would like to thank Timothy P. Cheston for help with the data validation and contributions to the Atlas of Economic Complexity. We would like to thank Mali Akmanalp and Romain Vuillemot for working on earlier versions of the Atlas data and architecture. We would like to thank the current and former members of the Harvard Growth Lab for their continuous feedback on the data. David Torun gratefully acknowledges financial support from the Swiss National Science Foundation under Ambizione Grant 233238. The views expressed in this study are the authors’ and do not reflect those of SECO.
Author information
Author notes
These authors contributed equally: Sebastián Bustos, Ellie Jackson, David Torun.
Authors and Affiliations
The Growth Lab, Center for International Development – Harvard University, Cambridge, MA, US
Sebastián Bustos, Ellie Jackson, Brendan Leonard, Nil Tuzcu, Annie White, Ricardo Hausmann & Muhammed A. Yıldırım
Department of Economics, University of Zurich, Zurich, Switzerland
David Torun
State Secretariat for Economic Affairs (SECO), Bern, Switzerland
Piotr Lukaszuk
Santa Fe Institute, Santa Fe, NM, US
Ricardo Hausmann
Department of Economics, College of Administrative Sciences and Economics - Koç University, Istanbul, Turkey
Muhammed A. Yıldırım
Authors
- Sebastián Bustos
- Ellie Jackson
- David Torun
- Brendan Leonard
- Nil Tuzcu
- Piotr Lukaszuk
- Annie White
- Ricardo Hausmann
- Muhammed A. Yıldırım
Contributions
Conceptualization: S.B., M.A.Y., R.H. Methodology: E.J., S.B., M.A.Y., D.T., P.L. Formal Analysis: S.B., E.J., D.T., M.A.Y. Investigation: S.B., E.J., D.T. Data Curation: S.B., E.J., D.T., B.L. Software: S.B., E.J., D.T., B.L. Validation: S.B., E.J., B.L. Visualization: S.B., E.J., N.T., B.L. Writing - Original Draft: S.B., E.J., D.T., M.A.Y. Writing - Review & Editing: S.B., E.J., D.T., B.L., M.A.Y., A.W. Project Administration: E.J. Supervision: M.A.Y., A.W., R.H. Funding Acquisition: R.H.
Corresponding authors
Correspondence to Ricardo Hausmann or Muhammed A. Yıldırım.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Bustos, S., Jackson, E., Torun, D. et al. Tackling Discrepancies in Trade Data: The Harvard Growth Lab International Trade Datasets. Sci Data (2026). https://doi.org/10.1038/s41597-025-06488-2
Received: 22 July 2025
Accepted: 17 December 2025
Published: 22 January 2026
DOI: https://doi.org/10.1038/s41597-025-06488-2