snix/users/edef/fetchroots
Florian Klink a512f16424 chore(nix-compat): bump to nom 8.x
See 72dd5818b7/CHANGELOG.md
for the nom changelog.

Most notably, there's now a .parse() to be added:

`combinator(arg)(input)` -> `combinator(arg).parse(input)`

There also doesn't need to be a tuple combinator (it's implemented on
tuples directly).

This also refactors the string / byte field parsing parts, to make them
more concise.

Change-Id: I9e8a3cedd07d6705be391898eb6a486fb8164069
Reviewed-on: https://cl.tvl.fyi/c/depot/+/13193
Tested-by: BuildkiteCI
Reviewed-by: edef <edef@edef.eu>
Reviewed-by: Brian Olsen <me@griff.name>
2025-03-04 08:17:05 +00:00
src feat(users/edef/fetchroots): init 2024-10-17 16:40:38 +00:00
.gitignore chore(users/edef/fetchroots): wire up the build 2024-10-17 16:41:03 +00:00
Cargo.lock chore(nix-compat): bump to nom 8.x 2025-03-04 08:17:05 +00:00
Cargo.nix chore(nix-compat): bump to nom 8.x 2025-03-04 08:17:05 +00:00
Cargo.toml feat(users/edef/fetchroots): init 2024-10-17 16:40:38 +00:00
default.nix chore(users/edef/fetchroots): wire up the build 2024-10-17 16:41:03 +00:00
README.md docs(users/edef/fetchroots): add a README for other users 2024-10-17 16:41:03 +00:00

fetchroots

This tool is part of a suite of tools built to manage cache.nixos.org.

This tool's purpose is to build an index of all the GC roots from the channels.nixos.org releases. The resulting index is then combined with the output of the other tools in the suite.

It does this by:

  1. Listing all the release files in the bucket.
  2. Getting the data for each release.
  3. Putting the results in a local Parquet file.
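
The three steps above can be sketched in a few lines of Python. This only illustrates the data flow and is not the tool's implementation: `build_index`, `list_release_keys`, and `fetch_roots` are hypothetical names, the bucket is replaced by an in-memory dict with made-up contents, and step 3 collects rows in a list instead of writing a Parquet file.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical return type for one release: (timestamp, store path hashes).
Roots = Tuple[datetime, List[bytes]]


def build_index(
    list_release_keys: Callable[[], Iterable[str]],
    fetch_roots: Callable[[str], Roots],
) -> List[Dict[str, object]]:
    """Build one row per release, mirroring the roots.parquet columns."""
    rows = []
    for key in list_release_keys():            # step 1: list the releases
        timestamp, hashes = fetch_roots(key)   # step 2: get the data for each
        rows.append(                           # step 3: collect the row
            {"key": key, "timestamp": timestamp, "store_path_hash": hashes}
        )
    return rows


# In-memory stand-in for the bucket. The timestamp and hash bytes are
# made up for illustration; they are not real release data.
FAKE_BUCKET: Dict[str, Roots] = {
    "nixos/22.11-small/nixos-22.11.513.563dc6476b8": (
        datetime(2022, 12, 1, tzinfo=timezone.utc),
        [b"\x00" * 20, b"\x01" * 20],
    ),
}

index = build_index(lambda: sorted(FAKE_BUCKET), lambda key: FAKE_BUCKET[key])
print(len(index), "release(s) indexed")
```

Passing the listing and fetching steps in as callables keeps the sketch runnable without AWS credentials; a real run would back them with the bucket instead.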

Getting started

In order to run this, you'll need AWS SSO credentials from the NixOS Infra team.

Get the creds from https://nixos.awsapps.com/start/ -> LBNixOS_Dev_PDX -> AWSReadOnlyAccess.

Run mg run; you should see a progress bar.

Congrats, you now have a roots.parquet file. You can load it with Python, polars-rs, or ClickHouse.

roots.parquet file format

  • key (String): the release, e.g. nixos/22.11-small/nixos-22.11.513.563dc6476b8
  • timestamp (DateTime): the timestamp of the GC roots file for this release
  • store_path_hash (List[Binary]): hash part of the store paths rooted by this release
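
For code that consumes the file, one row can be modeled as below. This is a minimal sketch, not part of the tool: `RootsRow` and `split_key` are hypothetical names, the timestamp and hash bytes are placeholders, and the channel/release split is inferred only from the single example key above, so other keys may not follow it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass
class RootsRow:
    """One row of roots.parquet, using the three documented columns."""
    key: str                      # String
    timestamp: datetime           # DateTime
    store_path_hash: List[bytes]  # List[Binary]


def split_key(key: str) -> Tuple[str, str]:
    """Split a key into (channel path, release name) at the last '/'.

    Assumes the layout of the example key; this is not a documented format.
    """
    channel, _, release = key.rpartition("/")
    return channel, release


row = RootsRow(
    key="nixos/22.11-small/nixos-22.11.513.563dc6476b8",
    timestamp=datetime(2022, 12, 1, tzinfo=timezone.utc),  # made-up value
    store_path_hash=[bytes(20)],  # placeholder; the byte length is an assumption
)
channel, release = split_key(row.key)
print(channel, release)
```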

Development

When the Cargo.lock changes, run mg run //tools:crate2nix-generate.

To build the project, run mg build.

To get a dev environment, run nix-shell -p cargo.