Can Semantic Multilingual Search Improve the Accessibility of Research Outputs Across Languages? A COAR Proposal

Today, COAR published a paper presenting a very promising new approach to multilingual discovery, called semantic multilingual search.

Scholarly knowledge is created and shared through a wide range of sources (repositories, journals, data platforms, and other scholarly information systems) and in hundreds of languages. Yet most discovery tools continue to privilege a few dominant languages, leaving large portions of research effectively invisible to global audiences. This paper explores the concept of semantic multilingual search: an emerging approach that retrieves information by meaning rather than by exact wording, enabling users to search in any supported language and discover relevant work in other languages. We believe that this approach, if applied to the scholarly knowledge commons, could significantly enhance the discoverability of research outputs in a highly multilingual environment. It is important to note that this is not a translation technology, but rather a way of linking content with related meaning across different languages.

Instead of proposing a fixed technical design, the document invites the community to consider how semantic multilingual search could evolve within the broader ecosystem of scholarly communication. It reflects on early experiences, shared principles, and collective responsibilities to ensure that this new generation of discovery tools advances openness, equity, and linguistic diversity.

Check out how it works

To accompany the paper, we have also released a demonstration notebook that openly shares the code behind a simple proof of concept. This notebook illustrates, in a transparent and accessible way, how multilingual embeddings can represent meaning across languages and enable cross-language discovery. It is not a full application, but a minimal and open demonstration of the underlying technology — meant to encourage learning, experimentation, and community-based development.
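To give a flavour of the idea, here is a minimal sketch of cross-language retrieval by meaning. It is not the code from the COAR notebook: the four-dimensional vectors below are hypothetical, hand-made values standing in for what a multilingual embedding model would produce (real models output hundreds of dimensions), and the names `DOCUMENTS`, `cosine_similarity` and `semantic_search` are illustrative only. The point is that a query and documents in different languages are compared as vectors, so ranking depends on meaning rather than on shared words.

```python
import math

# Hypothetical toy "embeddings": hand-made values, NOT output of a real model.
# A multilingual encoder would place texts with similar meaning, in any
# language, close together in one shared vector space.
DOCUMENTS = {
    "en: Open access repositories in Africa": [0.90, 0.10, 0.05, 0.20],
    "fr: Les dépôts en libre accès en Afrique": [0.88, 0.12, 0.07, 0.18],
    "es: Cambio climático y agricultura": [0.05, 0.92, 0.30, 0.10],
    "pt: Mudanças climáticas e agricultura": [0.07, 0.90, 0.28, 0.12],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vector, documents, top_k=2):
    """Return the titles of the top_k documents closest in meaning to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:top_k]]


# Hypothetical embedding of a German query, e.g. "Klimawandel und Landwirtschaft".
query = [0.06, 0.91, 0.29, 0.11]

if __name__ == "__main__":
    for title in semantic_search(query, DOCUMENTS):
        print(title)
```

With these toy vectors, the German query retrieves the Spanish and Portuguese records on climate change and agriculture, even though it shares no words with them. In a real system, the quality of such matches depends entirely on the embedding model used.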

The collective content in our global repository ecosystem is extremely diverse and multilingual, and this is a very valuable and important attribute. We are therefore investigating how this approach could be applied across the repository network to greatly improve access to content in other languages.
