Multilingualism is a critical characteristic of a healthy, inclusive, and diverse research communications landscape. Publishing in a local language ensures that the public in different countries has access to the research they fund, and also levels the playing field for researchers who speak different languages. The Helsinki Initiative on Multilingualism in Scholarly Communication asserts that the disqualification of local or national languages in academic publishing is the most important – and often forgotten – factor that prevents societies from using and taking advantage of the research done where they live.
Multilingualism presents a particular challenge for the discovery of research outputs. Although researchers and other information seekers may only be able to read in one or two languages, they want to know about all the relevant research in their area, regardless of the language in which it is published. Yet discovery systems such as Google Scholar and other scholarly indexes tend to provide access only to content available in the language of the user. In addition, the language of a scholarly resource is often not labelled appropriately, meaning that a large portion of non-English resources is excluded from search results. Furthermore, many scholarly communications infrastructures offer sub-optimal support for a variety of languages because little attention was paid to this issue during their design.
On October 30, the COAR Task Force on Supporting Multilingualism and non-English Content in Repositories published 15 recommended practices for supporting multilingual and non-English content in repositories. The recommendations identify good practices for repository managers and repository software developers. They focus on metadata, multilingual keywords, user interfaces, formats, and licences, with the aim of improving the visibility, discovery and reuse of repository content in a variety of languages. Read the report.
Summary of Recommendations
- Declare the language of the resource at the item level
- Declare the language of the metadata (e.g. xml:lang attribute)
- Use standard (two-letter or three-letter) language codes (ISO 639)
- Enable UTF-8 support in your repository and use the original alphabet or writing system whenever possible. If metadata must be transliterated, use recognized standards (e.g. ISO)
- If the repository software supports multiple interface languages, set up the user interface in the native language(s) of the target group, alongside English
- Write personal names in the writing system used in the deposited document and provide a persistent identifier, such as ORCID, that enables unambiguous identification
- Include keywords in multiple languages; use multilingual vocabularies and thesauri where possible
- Recommendations for managing translated content
- Ensure that language codes can consistently be used across the repository collections
- Expose the language of metadata via metadata exchange protocols (e.g. OAI-PMH, GraphQL API)
- Improve support for ISO language codes, including the three-letter codes needed for some languages
- Ensure that persistent identifiers are exposed via OAI-PMH
- Provide support for multilingual keywords to increase the discoverability of multilingual repository content
- Enable automatic assignment of controlled terms based on the existing metadata
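The metadata-related recommendations above (declaring the language of the resource, declaring the language of the metadata with xml:lang, using ISO 639 codes, keeping the original writing system in UTF-8, and providing keywords in several languages) can be illustrated with a minimal sketch. It is not taken from the report: the Dublin Core element names, the bare `record` wrapper, the `build_record` helper and the sample values are all assumptions chosen for illustration, and the code check only tests the shape of a code (two or three lowercase letters), not whether it is actually registered in ISO 639.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: simple Dublin Core elements are used for the metadata.
DC_NS = "http://purl.org/dc/elements/1.1/"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang
ET.register_namespace("dc", DC_NS)

# Shape check only: two or three lowercase letters, as in ISO 639 codes.
# It does not verify that the code is registered.
ISO_639_SHAPE = re.compile(r"^[a-z]{2,3}$")


def build_record(resource_lang, titles, keywords):
    """Build a hypothetical metadata record.

    resource_lang: language code of the deposited item itself
    titles:        dict mapping language code -> title in that language
    keywords:      list of (language code, keyword) pairs
    """
    codes = [resource_lang, *titles, *(lang for lang, _ in keywords)]
    for code in codes:
        if not ISO_639_SHAPE.match(code):
            raise ValueError(f"not a two- or three-letter language code: {code!r}")

    record = ET.Element("record")

    # Language of the resource, declared at the item level.
    ET.SubElement(record, f"{{{DC_NS}}}language").text = resource_lang

    # Language of each metadata value, declared with xml:lang.
    for lang, title in titles.items():
        el = ET.SubElement(record, f"{{{DC_NS}}}title", {f"{{{XML_NS}}}lang": lang})
        el.text = title

    # Keywords in more than one language, each tagged with its language.
    for lang, keyword in keywords:
        el = ET.SubElement(record, f"{{{DC_NS}}}subject", {f"{{{XML_NS}}}lang": lang})
        el.text = keyword

    return record


if __name__ == "__main__":
    record = build_record(
        "spa",
        {"es": "Multilingüismo en repositorios", "en": "Multilingualism in repositories"},
        [("es", "acceso abierto"), ("en", "open access")],
    )
    # Serialise as UTF-8 so text in its original writing system survives intact.
    print(ET.tostring(record, encoding="utf-8").decode("utf-8"))
```

A record built this way carries the item language once and a language tag on every title and keyword, so a harvester reading it over a protocol such as OAI-PMH could, in principle, filter or group values by language instead of guessing.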
COAR Task Force on Supporting Multilingualism and non-English Content in Repositories
Jagadish Aryal, Social Science Baha, Nepal
Aysen Binen, Izmir Institute of Technology İYTE, Türkiye
Andreas Czerniak, Bielefeld University – Library, Germany
Claudia Córdova Yamauchi, CONCYTEC, Peru
Christophe Dony, ULiège Library, Belgium
Joe Cera, Berkeley Law Library, University of California, USA
Sebastiano Giorgi-Scalari, Open University of Catalonia, Spain
Güssün Güneş, Marmara University Libraries, Türkiye
Gultekin Gurdal, Izmir Institute of Technology İYTE, Türkiye
Johanna Havemann, AfricArXiv, Germany
Nie Hua, Peking University, China
Libio Huaroto Pajuelo, Universidad Peruana de Ciencias Aplicadas, Peru
Alan Ku (Gu Liping), National Science Library, Chinese Academy of Sciences, China
Iryna Kuchma, EIFL (chair), Lithuania
Pierre Lasou, Bibliothèque de l’Université Laval, Canada
Norma Aída Manzanera Silva, Centro de Investigaciones sobre América del Norte, Universidad Nacional Autónoma de México
Lautaro Matas, LA Referencia, Spain/Latin America
Ayako Mikami, Hokkaido University, Japan
Tomoki Nagase, National Institute of Informatics, Japan
Andrea Mora Campos, University of Costa Rica, Costa Rica
Tomasz Neugebauer, Concordia University, Canada
Jean-Francois Nomine, INIST, France
Milica Sevkusic, ITS SASA, Serbia
Kathleen Shearer, COAR, Canada
Freddy Sumba, CEDIA, Ecuador
Ben Trettel, Translate Science
- Managing multilingual and non-English language content in repositories
- COAR community consultation on managing non-English and multilingual content in repositories
- Is there a case for accepting machine translated scholarly content in repositories?
- COAR Announces first recommendation for supporting multilingual and non-English content in repositories