feat(ops/modules/www/git.snix.dev): block AI scrapers

This blocks a bunch of AI scrapers from Forgejo, which seems to be a
particularly attractive target.

meta-externalagent in particular has been scraping excessively.

The list comes from https://github.com/ai-robots-txt/ai.robots.txt,
let's see how often this needs updating.

Change-Id: I55ae7c42c6a3eeff6f0457411a8b05d55cb24f65
Reviewed-on: https://cl.snix.dev/c/snix/+/30370
Autosubmit: Florian Klink <flokli@flokli.de>
Tested-by: besadii
Reviewed-by: edef <edef@edef.eu>
Author: Florian Klink <flokli@flokli.de>, 2025-05-01 16:47:17 +03:00, committed by clbot
parent c501361412
commit 853754d25f
2 changed files with 16 additions and 1 deletion


@@ -1,4 +1,4 @@
-{ ... }:
+{ depot, ... }:
 {
   imports = [
@@ -10,9 +10,12 @@
     serverName = "git.snix.dev";
     enableACME = true;
     forceSSL = true;
+    locations."=/robots.txt".alias = "${depot.third_party.sources.ai-robots-txt}/robots.txt";
     locations."/" = {
       proxyPass = "http://127.0.0.1:3000";
       extraConfig = ''
+        include ${depot.third_party.sources.ai-robots-txt + "/nginx-block-ai-bots.conf"};
         proxy_ssl_server_name on;
         proxy_pass_header Authorization;
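
For reference, the `nginx-block-ai-bots.conf` file shipped by the ai.robots.txt project is roughly of this shape — a sketch only; the actual user-agent regex is much longer and is maintained upstream, and the specific bot names below (besides meta-externalagent, mentioned above) are illustrative examples from that list:

```nginx
# Return 403 to any request whose User-Agent matches a known AI crawler.
# ~* makes the regex match case-insensitively.
if ($http_user_agent ~* "(meta\-externalagent|GPTBot|ClaudeBot|CCBot)") {
    return 403;
}
```

Because nginx `include` is evaluated where it appears, placing this inside the `location "/"` block applies the check to proxied Forgejo traffic, while the separate `robots.txt` alias serves the matching disallow list to crawlers that actually honor it.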