Open repositories are being profoundly impacted by AI bots and other crawlers: Report from a COAR Survey

Executive Summary

Every day, multiple bots access the repository at all hours 24/7. We estimate performance degradation due to bot activity about once or twice a day, and at least once a week the system crashes entirely requiring an intervention – typically a service restart.

Survey respondent

A growing number of AI bots are crawling repositories. These automated bots, or crawlers, navigate the internet, gathering data and indexing information for search engines, AI and large language models, and other purposes. While some bots are fairly innocuous, others are aggressive enough that they are increasingly causing service disruptions in repositories (and other scholarly communications infrastructures). To better understand the current impact of bots and crawlers on repositories, COAR distributed a survey to members in April 2025. The survey received 66 responses from repositories around the world (22 from Canada and the US, 22 from Europe, 9 from Latin America, 6 from Asia, 4 from Australasia, 2 from Africa, and 1 unknown).

Over 90% of survey respondents indicated that their repository is encountering aggressive bots, usually more than once a week, often leading to slowdowns and service outages. While there is no way to be 100% certain of the purpose of these bots, the assumption in the community is that they are AI bots gathering data for generative AI training. This type of traffic has increased markedly over roughly the last two years and is having a considerable impact on repositories, both in the quality of service provision and in the time and resources required to deal with the bots. To mitigate this impact, repositories are using a variety of measures to minimize or stop AI bot access. Some of these measures are considered relatively successful in protecting repositories from service disruptions, but it is also clear that they are impeding access by other, more welcome actors, such as individual human users and benign systems.

The underlying mission of repositories is to provide access to their collections so that they are reused and repurposed for the good of scholarship and society. However, the recent rise in aggressive bot activity could result in repositories limiting access to their resources for both human and machine users, leading to a situation where the value of the global repository network is substantially diminished. To help the repository community navigate this rapidly evolving landscape and develop solutions that allow repositories to remain as open as possible, COAR will launch a “Repositories and AI Bots Task Force” in July 2025. The Task Force will bring together technical representatives from repositories and other experts to discuss potential solutions to this problem and develop recommendations for the repository community.

Image credit: A Generative AI self-portrait by DALL·E. Via Wikimedia Commons

