snix/ops/modules/www/git.snix.dev.nix
Florian Klink 853754d25f feat(ops/modules/www/git.snix.dev): block AI scrapers
This blocks a number of AI scrapers from Forgejo, which seems to be a
particularly attractive target for them.

meta-externalagent in particular has been scraping excessively.

The list comes from https://github.com/ai-robots-txt/ai.robots.txt
(sketched below, after the commit metadata); let's see how often it
needs updating.

Change-Id: I55ae7c42c6a3eeff6f0457411a8b05d55cb24f65
Reviewed-on: https://cl.snix.dev/c/snix/+/30370
Autosubmit: Florian Klink <flokli@flokli.de>
Tested-by: besadii
Reviewed-by: edef <edef@edef.eu>
2025-05-01 14:57:44 +00:00
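
For context: the upstream ai.robots.txt project generates both a robots.txt
and a matching nginx include file from the same bot list. The served
robots.txt takes roughly the following shape; this is a sketch, and the agent
names are illustrative picks from the upstream list rather than a verbatim
copy:

    User-agent: GPTBot
    User-agent: ClaudeBot
    User-agent: meta-externalagent
    Disallow: /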

{ depot, ... }:
{
  imports = [
    ./base.nix
  ];

  config = {
    services.nginx.virtualHosts.forgejo = {
      serverName = "git.snix.dev";
      enableACME = true;
      forceSSL = true;

      # Serve the robots.txt from the pinned ai-robots-txt source, asking
      # (well-behaved) AI crawlers to stay away entirely.
      locations."=/robots.txt".alias = "${depot.third_party.sources.ai-robots-txt}/robots.txt";

      locations."/" = {
        proxyPass = "http://127.0.0.1:3000";
        extraConfig = ''
          # Reject requests whose User-Agent matches a known AI scraper.
          include ${depot.third_party.sources.ai-robots-txt + "/nginx-block-ai-bots.conf"};

          proxy_ssl_server_name on;
          proxy_pass_header Authorization;

          # This has to be sufficiently large for uploading layers of
          # non-broken docker images.
          client_max_body_size 1G;
        '';
      };
    };
  };
}
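
The blocking itself happens in the included nginx-block-ai-bots.conf. As a
rough sketch of what that generated file contains (the exact regex is
produced upstream and changes as the bot list evolves; the agent names here
are again illustrative), it matches known scraper User-Agents and rejects
them:

    if ($http_user_agent ~* "(GPTBot|ClaudeBot|meta-externalagent)") {
        return 403;
    }

Well-behaved crawlers should already honor the robots.txt served above; the
nginx rule catches the ones that don't.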