Can Semantic Multilingual Search Improve the Accessibility of Research Outputs Across Languages? A COAR Proposal

Today, COAR published a paper presenting a promising new approach to multilingual discovery, called semantic multilingual search. We want your opinion!

Scholarly knowledge is created and shared through a wide range of sources (repositories, journals, data platforms, and other scholarly information systems) and in hundreds of languages. Yet most discovery tools continue to privilege a few dominant languages, leaving large portions of research effectively invisible to global audiences. This paper explores the concept of semantic multilingual search: an emerging approach that retrieves information by meaning rather than by exact wording, enabling users to search in any supported language and discover relevant work in other languages. We believe that this approach, if applied to the scholarly knowledge commons, could significantly enhance the discoverability of research outputs in a highly multilingual environment. It is important to note that this is not a translation technology, but rather a way of associating meaning with related content across different languages.

Instead of proposing a fixed technical design, the document invites the community to consider how semantic multilingual search could evolve within the broader ecosystem of scholarly communication. It reflects on early experiences, shared principles, and collective responsibilities to ensure that this new generation of discovery tools advances openness, equity, and linguistic diversity.

To accompany the paper, we have also released a demonstration notebook that openly shares the code behind a simple proof of concept. This notebook illustrates, in a transparent and accessible way, how multilingual embeddings can represent meaning across languages and enable cross-language discovery. It is not a full application, but a minimal and open demonstration of the underlying technology — meant to encourage learning, experimentation, and community-based development.
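To make the idea concrete, here is a minimal sketch of meaning-based retrieval across languages. It is not the code from the COAR notebook. The `toy_embed` function, the `TOY_LEXICON` table, and the sample documents are all hypothetical stand-ins invented for illustration; a real system would replace `toy_embed` with a trained multilingual embedding model. The ranking step (cosine similarity between the query vector and each document vector) is the part that carries over.

```python
import math

# Hypothetical toy lexicon: words from several languages map to the same
# "concept" axis. This only imitates what a trained multilingual embedding
# model learns; it is an assumption for illustration, not a real model.
CONCEPTS = ["water", "climate", "health"]
TOY_LEXICON = {
    "water": 0, "agua": 0, "água": 0, "eau": 0,
    "climate": 1, "clima": 1, "climat": 1,
    "health": 2, "salud": 2, "saúde": 2, "santé": 2,
}


def toy_embed(text):
    """Map text to a small concept vector (stand-in for a real encoder)."""
    vec = [0.0] * len(CONCEPTS)
    for token in text.lower().split():
        idx = TOY_LEXICON.get(token.strip(".,;:!?"))
        if idx is not None:
            vec[idx] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, documents, embed=toy_embed, top_k=3):
    """Rank documents by similarity of meaning to the query, in any language."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up sample titles in three languages (illustrative only).
DOCS = [
    "Calidad del agua y salud",       # Spanish
    "Mudanças do clima no Brasil",    # Portuguese
    "Politique de santé publique",    # French
]

results = search("water health", DOCS)
for score, doc in results:
    print(f"{score:.2f}  {doc}")
```

An English query ranks the Spanish title about water and health first, the French title about health second, and the Portuguese title about climate last, even though no words are shared between the query and the documents. Swapping `toy_embed` for a real multilingual encoder would leave `search` unchanged, since it only relies on the `embed` callable returning a vector.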

Check out how it works

The collective content in our global repository ecosystem is extremely diverse and multilingual, and this is a valuable attribute. We are therefore investigating whether this approach could greatly improve access to content in other languages across the repository network.

COAR is actively seeking input from our members and the broader community about this conceptual model. Based on this feedback, COAR will consider the possibility of launching a project to investigate how best to apply this type of approach in the repository network. 

From November 6 to December 19, 2025, we invite you to provide feedback on the usefulness and feasibility of the semantic multilingual search model as applied to the global repository network and the scholarly commons in general. Your feedback is critical to guiding our future work in this area.

The paper will be available in Spanish and Portuguese shortly.

