docs(web/snix/castore): port "why not git" document

Change-Id: I4ac77f264d2704018cd33dbf80746e82d193686e
Reviewed-on: https://cl.snix.dev/c/snix/+/30315
Reviewed-by: Domen Kožar <domen@cachix.org>
Tested-by: besadii
Autosubmit: Florian Klink <flokli@flokli.de>
Florian Klink 2025-04-12 19:38:28 +02:00 committed by clbot
parent b2d2d622e0
commit c9082a586c
3 changed files with 23 additions and 15 deletions


@@ -21,7 +21,6 @@
 - [Store API](./store/api.md)
 - [BlobStore Chunking](./castore/blobstore-chunking.md)
 - [BlobStore Protocol](./castore/blobstore-protocol.md)
-- [Why not git trees?](./castore/why-not-git-trees.md)
 # Nix
 - [Specification of the Nix Language](./language-spec.md)


@@ -82,7 +82,7 @@ name MUST be unique across all three lists.
 [rustdoc-node]: https://snix.dev/rustdoc/snix_castore/enum.Node.html
 [rustdoc-pathcomponent]: https://snix.dev/rustdoc/snix_castore/struct.PathComponent.html
 [BLAKE3]: https://github.com/BLAKE3-team/BLAKE3
-[^why-not-git-trees]: For a detailed comparison with the git model, and what (and why we do differently, see TODO LINK)
+[^why-not-git-trees]: For a detailed comparison with the git model, and what (and why we do differently, see [here]({{< relref "why-not-git.md" >}}))
 [^directory-digest]: We currently use the [BLAKE3][] digest of the protobuf
 serialization of the `proto::Directory` struct to calculate
 these digests. While pretty stable across most


@@ -1,16 +1,23 @@
-## Why not git tree objects?
+---
+title: "Why not Git?"
+summary: ""
+date: 2025-04-04T16:16:37+00:00
+lastmod: 2025-04-04T16:16:37+00:00
+draft: false
+weight: 42
+toc: false
+---
 We've been experimenting with (some variations of) the git tree and object
 format, and ultimately decided against using it as an internal format, and
-instead adapted the one documented in the other documents here.
+instead adapted our own [Data Model][castore-data-model].
-While the tvix-store API protocol shares some similarities with the format used
-in git for trees and objects, the git one has shown some significant
-disadvantages:
+While castore shares some similarities with the format used in git for trees and
+objects, the git one has shown some significant disadvantages:
 ### The binary encoding itself
-#### trees
+#### git trees
 The git tree object format is a very binary, error-prone and
 "made-to-be-read-and-written-from-C" format.
@@ -20,22 +27,23 @@ tree object in this encoding.
 Extensions of the format/changes are very hard to do right, because parsers are
 not aware they might be parsing something different.
-The tvix-store protocol uses a canonical protobuf serialization, and uses
-the [blake3][blake3] hash of that serialization to point to other `Directory`
-messages.
+The [Snix Castore Data Model][castore-data-model] uses a canonical protobuf
+serialization, and uses the [blake3][blake3] hash of that serialization to point
+to other `Directory` messages.
 It's both compact and with a wide range of libraries for encoders and decoders
 in many programming languages.
 The choice of protobuf makes it easy to add new fields, and make old clients
 aware of some unknown fields being detected [^adding-fields].
-#### blob
+#### git blob
 On disk, git blob objects start with a "blob" prefix, then the size of the
 payload, and then the data itself. The hash of a blob is the literal sha1sum
 over all of this - which makes it something very git specific to request for.
-tvix-store simply uses the [blake3][blake3] hash of the literal contents
-when referring to a file/blob, which makes it very easy to ask other data
-sources for the same data, as no git-specific payload is included in the hash.
+The [Snix Castore Data Model][castore-data-model] simply uses the
+[blake3][blake3] hash of the literal contents when referring to a file/blob,
+which makes it very easy to ask other data sources for the same data, as no
+git-specific payload is included in the hash.
 This also plays very well together with things like [iroh][iroh-discussion],
 which plans to provide a way to substitute (large)blobs by their blake3 hash
 over the IPFS network.
@@ -52,6 +60,7 @@ and it's unclear what a "blake3" version of this would even look like.
 [bao]: https://github.com/oconnor663/bao
 [blake3]: https://github.com/BLAKE3-team/BLAKE3
+[castore-data-model]: {{< relref "data-model.md" >}}
 [git-sha256]: https://git-scm.com/docs/hash-function-transition/
 [iroh-discussion]: https://github.com/n0-computer/iroh/discussions/707#discussioncomment-5070197
 [^adding-fields]: Obviously, adding new fields will change hashes, but it's something that's easy to detect.