Multilingual and Non-English Content

Multilingualism is a critical characteristic of a healthy, inclusive, and diverse research communications landscape. Publishing in a local language ensures that the public in different countries has access to the research they fund, and also levels the playing field for researchers who speak different languages. The Helsinki Initiative on Multilingualism in Scholarly Communication asserts that the disqualification of local or national languages in academic publishing is the most important – and often forgotten – factor that prevents societies from using and taking advantage of the research done where they live.

Multilingualism presents a particular challenge for the discovery of research outputs. Although researchers and other information seekers may be able to read only one or two languages, they want to know about all the relevant research in their area, regardless of the language in which it is published. Yet discovery systems such as Google Scholar and other scholarly indexes tend to provide access only to content available in the user's language. In addition, the language of a scholarly resource is often not labelled correctly, so a large portion of non-English resources is excluded from search results. Furthermore, many scholarly communications infrastructures support a variety of languages poorly, because little attention was paid to this issue when they were designed.

On October 30, the COAR Task Force on Supporting Multilingualism and non-English Content in Repositories published 15 recommended practices to help repositories support multilingual and non-English content. The recommendations identify good practices for repository managers and repository software developers. They focus on metadata, multilingual keywords, user interfaces, formats, and licences, with the aim of improving the visibility, discovery, and reuse of repository content in a variety of languages.

Summary of Recommendations
  1. Declare the language of the resource at the item level
  2. Declare the language of the metadata (e.g. xml:lang attribute)
  3. Use standard (two-letter or three-letter) language codes (ISO 639)
  4. Enable UTF-8 support in your repository and use the original alphabet or writing system whenever possible. If metadata must be transliterated, use recognized standards (e.g. ISO)
  5. If the repository software supports multiple interface languages, set up the user interface in the native language(s) of the target group, alongside English
  6. Write personal name/s using the writing system used in the deposited document and provide a persistent identifier enabling unambiguous identification, such as ORCID
  7. Include keywords in many languages; use multilingual vocabularies and thesauri if possible
  8. Recommendations for managing translated content
  9. Ensure that language codes can consistently be used across the repository collections
  10. Expose the language of metadata via metadata exchange protocols, e.g. OAI-PMH, GraphQL API, etc.
  11. Improve support for ISO language codes, e.g. three-letter codes needed for some languages
  12. Ensure that persistent identifiers are exposed via OAI-PMH
  13. Provide support for multilingual keywords to increase the discoverability of multilingual repository content
  14. Enable automatic assignment of controlled terms based on the existing metadata
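Several of these recommendations can be seen together in a single metadata record. These include declaring the resource language at item level, declaring the metadata language with xml:lang, using ISO 639-style codes, keeping UTF-8 and the original script, and adding keywords in several languages. The sketch below uses only the Python standard library to build a simple Dublin Core (oai_dc) record of the kind a repository exposes over OAI-PMH. It rests on several assumptions. Keywords are mapped to dc:subject. The helper names `check_code` and `build_record` are hypothetical. The code check only verifies the shape of a two- or three-letter code and does not look it up in the ISO 639 registry. The schema-location attributes a real oai_dc response carries are also omitted.

```python
import re
import xml.etree.ElementTree as ET

# Namespaces of the OAI-PMH oai_dc (simple Dublin Core) format.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
# ElementTree serializes this namespace with the reserved "xml" prefix (xml:lang).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Assumption: a shape check only (two or three lowercase letters).
# It does NOT confirm that the code is actually registered in ISO 639.
_LANG_CODE = re.compile(r"^[a-z]{2,3}$")


def check_code(code: str) -> str:
    """Hypothetical helper: reject values that are not ISO 639-style codes."""
    if not _LANG_CODE.match(code):
        raise ValueError(f"not an ISO 639-style language code: {code!r}")
    return code


def build_record(resource_lang, titles, keywords):
    """Hypothetical helper: build an oai_dc record as UTF-8 bytes.

    resource_lang: language of the resource itself (item level).
    titles, keywords: lists of (text, language of that metadata value) pairs.
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for text, lang in titles:
        # Language of the metadata value, declared via xml:lang.
        el = ET.SubElement(root, f"{{{DC}}}title", {XML_LANG: check_code(lang)})
        el.text = text
    for text, lang in keywords:
        # Keywords in several languages, each labelled with its language.
        el = ET.SubElement(root, f"{{{DC}}}subject", {XML_LANG: check_code(lang)})
        el.text = text
    # Item-level language of the resource.
    ET.SubElement(root, f"{{{DC}}}language").text = check_code(resource_lang)
    # UTF-8 output keeps non-Latin scripts as-is instead of transliterating them.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Usage: a Ukrainian-language item with metadata in Ukrainian and English,
# using two-letter codes consistently throughout the record.
record = build_record(
    "uk",
    titles=[("Мовна політика в Україні", "uk"), ("Language policy in Ukraine", "en")],
    keywords=[("мовна політика", "uk"), ("language policy", "en")],
)
print(record.decode("utf-8"))
```

Using the same code style for the resource language and the xml:lang values in every record makes the language labels easier to filter and harvest consistently across collections.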