We have deployed Anubis to git.sr.ht.
After some internal discussions we have ultimately decided that the best course
of action to protect git.sr.ht from LLM crawlers is to deploy Anubis. This
software presents some users with a proof-of-work challenge which is solved by
the user’s browser with JavaScript.
This challenge is automatically bypassed for logged-in users. If your browser
does not support JavaScript (or you do not wish to enable it for any other
reason), log in at meta.sr.ht to circumvent it.
Note that Anubis is only being used for the web frontend. API access and git
operations are unaffected.
This solution is robust and reliable. We do not want to leave it enabled
indefinitely, but considering that the user impact is minimal and it is
sufficient to mitigate the LLM traffic, we consider the matter closed and are
closing this notice. Thank you for your patience while we prepared our
mitigations.
(09:00 UTC — Mar 24)
SourceHut continues to face disruptions due to aggressive LLM crawlers.
We are continuously working on mitigations, and the ones we have deployed so
far are keeping the problem contained for now. However, some of them may impact
end-users.
In particular, we have deployed Nepenthes to certain routes which are
associated with large volumes of LLM-related traffic. You may encounter certain
pages which are not usable as a result, especially if you are not logged in.
Mitigations only affect the web frontend of SourceHut: SSH access, git
operations, API access, and so on, should behave normally.
We understand that some of our mitigations are user-impacting. We apologize for
the inconvenience. These measures are temporary, but we do not have an estimate
for when they will no longer be required. To be honest, we are running out of
ideas for how to deal with these LLM bots. Your patience is appreciated.
If you are having problems using the SourceHut web UI:
First, log into your SourceHut account. Logged-in users bypass most of our
mitigations. If that does not work, please contact support on IRC or via
email.
If your cloud server is unable to reach SourceHut:
We have unilaterally blocked several cloud providers, including GCP and Azure,
because of the high volumes of bot traffic originating from their networks. If
your cloud server cannot reach SourceHut and you have a legitimate reason to
access it from there, you must email support to request an exception.
Please explain your use-case and include a list of affected IPs and/or subnets.
We kindly ask the administrators of SourceHut integrations to program their
software with responsible usage patterns. If possible, we request that you
prefer webhooks over polling for updates. If your integration performs git
operations, please prefer to use git fetch to update a persistent repository, or
use a shallow git clone, rather than performing a fresh clone each time your
automation runs. We also request that you set a User-Agent string for your
traffic which identifies your software and includes an email address that we can
contact with questions and feedback. Doing so clearly identifies your traffic as
non-malicious, so that we do not mistakenly apply mitigations to you.
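As a rough illustration of the User-Agent request above, an integration written
in Python could attach an identifying header to its HTTP requests along these
lines. The software name, version, and contact address are hypothetical
placeholders, and the exact format of the string is an assumption rather than a
required convention.

```python
import urllib.request

# Hypothetical identity: replace with your own software name, version, and a
# contact address that is actually monitored.
USER_AGENT = "example-integration/1.0 (+mailto:ops@example.org)"


def build_request(url: str) -> urllib.request.Request:
    """Return a request object that carries an identifying User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


# Usage (performs a network request, so it is left commented out here):
# with urllib.request.urlopen(build_request("https://example.org/")) as resp:
#     body = resp.read()
```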
If you are using git(1) for git operations, you can set a User-Agent by setting
the GIT_HTTP_USER_AGENT environment variable accordingly.
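The following is a minimal sketch of how automation might combine these
requests: it updates a persistent repository with git fetch when one already
exists, falls back to a shallow clone otherwise, and passes
GIT_HTTP_USER_AGENT to git. The helper names, the User-Agent value, and the
choice of `--depth 1` are illustrative assumptions. Note that this variable only
applies to git operations over HTTP(S), not SSH.

```python
import os
import subprocess
from pathlib import Path

# Hypothetical identity: replace with your own software name and contact.
USER_AGENT = "example-integration/1.0 (+mailto:ops@example.org)"


def git_env(user_agent: str = USER_AGENT) -> dict:
    """Copy the current environment and set git's HTTP User-Agent."""
    env = dict(os.environ)
    env["GIT_HTTP_USER_AGENT"] = user_agent
    return env


def sync_command(url: str, workdir: Path) -> list:
    """Choose between updating an existing checkout and a shallow clone."""
    if (workdir / ".git").is_dir():
        # A persistent repository exists: only fetch what has changed.
        return ["git", "-C", str(workdir), "fetch", "--prune", "origin"]
    # First run: shallow clone instead of downloading the full history.
    return ["git", "clone", "--depth", "1", url, str(workdir)]


def sync(url: str, workdir: Path) -> None:
    """Run the chosen git command with the identifying User-Agent set."""
    subprocess.run(sync_command(url, workdir), env=git_env(), check=True)
```

Keeping `workdir` on persistent storage between runs is what allows the fetch
branch to be taken; with an ephemeral working directory, every run would fall
back to the clone.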
If you would like advice on making your integration more efficient, or setting
up webhooks, please contact support for assistance.
(08:30 UTC — Mar 17)