Multilingual and Non-English Content

Multilingualism is a critical characteristic of a healthy, inclusive, and diverse research communications landscape. Publishing in a local language ensures that the public in different countries has access to the research they fund, and also levels the playing field for researchers who speak different languages. The Helsinki Initiative on Multilingualism in Scholarly Communication asserts that the disqualification of local or national languages in academic publishing is the most important – and often forgotten – factor that prevents societies from using and taking advantage of the research done where they live.

Every day, researchers around the world produce knowledge in hundreds of languages: Spanish in Argentina, Portuguese in Brazil, Arabic in Egypt, Japanese in Japan, Swahili in Kenya. This linguistic diversity is not a side note; it is the lifeblood of global scholarship. And yet, when we go looking for that knowledge, the tools at our disposal behave as if only a handful of languages truly matter. Multilingualism therefore presents a particular challenge for the discovery of research outputs in repositories. Although researchers and other information seekers may be able to read only one or two languages, they want to know about all the relevant research in their area, regardless of the language in which it is published. Yet discovery systems such as Google Scholar and other scholarly indexes tend to surface only content in the user's own language.

Collectively, repositories hold content representing hundreds of different languages. This multilingualism is an extremely valuable attribute of the global repository network. To that end, COAR is developing and promoting practices and models that improve and advance multilingualism across the repository network.

Breaking language barriers in science through semantic multilingual search – What if search worked differently? What if you could type a query in your own language — cambio climático, énergies renouvelables, 再生可能エネルギー — and find relevant results in English, French, Spanish, Japanese, or beyond, without ever translating a word? In June 2025, COAR launched a project to investigate the potential of semantic multilingual searching in the context of scholarly literature and develop a proposed conceptual model that could apply this technology in repositories and their full text aggregations. Read more in the recent blog post and stay tuned for our community consultation in November 2025.
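
To make the idea concrete, here is a minimal Python sketch of how a cross-language search could rank results, assuming a multilingual embedding model has already mapped the query and each document into a shared vector space. It is not COAR's proposed model. The three-number vectors are made-up placeholders standing in for real embeddings, and the function names (cosine, rank) are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vector, documents):
    """Return (score, title) pairs sorted from most to least similar."""
    scored = [(cosine(query_vector, vector), title) for title, vector in documents]
    return sorted(scored, reverse=True)


# Placeholder vectors (assumption): in practice these would come from a
# multilingual embedding model, not be written by hand.
documents = [
    ("Renewable energy policy in Europe", [0.90, 0.10, 0.00]),
    ("再生可能エネルギーの導入", [0.85, 0.15, 0.05]),
    ("Historia de la imprenta", [0.05, 0.10, 0.95]),
]

# Placeholder vector for the French query "énergies renouvelables".
query_vector = [0.88, 0.12, 0.02]

results = rank(query_vector, documents)
for score, title in results:
    print(f"{score:.3f}  {title}")
```

With these placeholder values, the English and Japanese items about renewable energy rank above the unrelated Spanish item, although the query shares no words with either. The ranking depends on closeness in the vector space, not on matching strings in one language.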


Good Practice Advice for Managing Multilingual and non-English Language Content in Repositories – In October 2023, the COAR Task Force on Supporting Multilingualism and Non-English Content in Repositories published a report with recommended practices that help repository managers and repository software developers better support multilingual and non-English content. The recommendations cover metadata, multilingual keywords, user interfaces, formats, and licences, with the aim of improving the visibility, discovery and reuse of repository content in a variety of languages.

Read the Recommendations
  1. Declare the language of the resource at the item level
  2. Declare the language of the metadata (e.g. xml:lang attribute)
  3. Use standard (two-letter or three-letter) language codes (ISO 639)
  4. Enable UTF-8 support in your repository and use the original alphabet or writing system whenever possible. If metadata must be transliterated, use a recognized standard (e.g. an ISO transliteration standard)
  5. If the repository software supports multiple interface languages, set up the user interface in the native language(s) of the target group, along with that in English
  6. Include keywords in multiple languages, and use multilingual vocabularies and thesauri where possible
  7. Write personal names in the writing system used in the deposited document, and provide a persistent identifier, such as an ORCID iD, that enables unambiguous identification
  8. Ensure that language codes can consistently be used across the repository collections
  9. Expose the language of the metadata via metadata exchange protocols and APIs (e.g. OAI-PMH, GraphQL)
  10. Improve support for ISO language codes, e.g. the three-letter codes needed for some languages
  11. Ensure that persistent identifiers are exposed via OAI-PMH
  12. Provide support for multilingual keywords to increase the discoverability of multilingual repository content
  13. Enable automatic assignment of controlled terms based on the existing metadata
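
As a rough illustration of recommendations 1 to 4 and 8, the following Python sketch builds a simple Dublin Core record that declares the item's language, tags each title and keyword with xml:lang, and passes every code through a single normalisation step. The helper names (normalise_language, build_record) and the small code table are assumptions made for this example. They do not come from the COAR report or from any repository software, and a real deployment would rely on a complete ISO 639 table.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ET.register_namespace("dc", DC)
ET.register_namespace("oai_dc", OAI_DC)

# Tiny illustrative subset of ISO 639 (assumption: not the full standard).
THREE_TO_TWO = {"eng": "en", "spa": "es", "por": "pt", "jpn": "ja"}
THREE_LETTER_ONLY = {"haw"}  # Hawaiian has no two-letter code


def normalise_language(code):
    """Map a known code to one consistent form, or raise ValueError."""
    code = code.strip().lower()
    if code in THREE_TO_TWO:
        return THREE_TO_TWO[code]
    if code in THREE_TO_TWO.values() or code in THREE_LETTER_ONLY:
        return code
    raise ValueError(f"Unknown language code: {code!r}")


def build_record(item_language, titles, keywords):
    """Build a Dublin Core element with item-level and metadata-level languages.

    titles:   dict mapping language code -> title text
    keywords: list of (language code, keyword) pairs
    """
    record = ET.Element(f"{{{OAI_DC}}}dc")
    # Recommendation 1: language of the resource itself.
    ET.SubElement(record, f"{{{DC}}}language").text = normalise_language(item_language)
    # Recommendation 2: language of each metadata value.
    for lang, text in titles.items():
        title = ET.SubElement(record, f"{{{DC}}}title", {XML_LANG: normalise_language(lang)})
        title.text = text
    for lang, term in keywords:
        subject = ET.SubElement(record, f"{{{DC}}}subject", {XML_LANG: normalise_language(lang)})
        subject.text = term
    return record


record = build_record(
    "spa",
    {"es": "Cambio climático en los Andes", "en": "Climate change in the Andes"},
    [("es", "cambio climático"), ("eng", "climate change")],
)

# Recommendation 4: serialise as UTF-8 and keep the original characters.
xml_bytes = ET.tostring(record, encoding="utf-8")
xml_text = xml_bytes.decode("utf-8")
print(xml_text)
```

The sketch normalises to the two-letter code where one exists, which is also the form that xml:lang values (BCP 47 language tags) require. A repository could equally standardise on three-letter codes for its own language field. What matters for recommendation 8 is that one form is applied consistently across collections.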

COAR Task Force on Supporting Multilingualism and Non-English Content in Repositories

  • Iryna Kuchma, EIFL (chair), Lithuania
  • Jagadish Aryal, Social Science Baha, Nepal
  • Aysen Binen, Izmir Institute of Technology İYTE, Türkiye
  • Andreas Czerniak, Bielefeld University – Library, Germany
  • Claudia Córdova Yamauchi, CONCYTEC, Peru
  • Christophe Dony, ULiège Library, Belgium
  • Joe Cera, Berkeley Law Library, University of California, USA
  • Sebastiano Giorgi-Scalari, Open University of Catalonia, Spain
  • Güssün Güneş, Marmara University Libraries, Türkiye
  • Gultekin Gurdal, Izmir Institute of Technology İYTE, Türkiye
  • Johanna Havemann, AfricArXiv, Germany
  • Nie Hua, Peking University, China
  • Libio Huaroto Pajuelo, Universidad Peruana de Ciencias Aplicadas, Peru
  • Alan Ku (Gu Liping), National Science Library, Chinese Academy of Sciences, China
  • Pierre Lasou, Bibliothèque de l’Université Laval, Canada
  • Norma Aída Manzanera Silva, Centro de Investigaciones sobre América del Norte, Universidad Nacional Autónoma de México, Mexico
  • Lautaro Matas, LA Referencia, Spain/Latin America
  • Ayako Mikami, Hokkaido University, Japan
  • Tomoki Nagase, National Institute of Informatics, Japan
  • Andrea Mora Campos, University of Costa Rica, Costa Rica
  • Tomasz Neugebauer, Concordia University, Canada
  • Jean-Francois Nomine, INIST, France
  • Milica Sevkusic, ITS SASA, Serbia
  • Kathleen Shearer, COAR, Canada
  • Freddy Sumba, CEDIA, Ecuador
  • Ben Trettel, Translate Science