docs(web/snix/castore): port "why not git" document
Change-Id: I4ac77f264d2704018cd33dbf80746e82d193686e
Reviewed-on: https://cl.snix.dev/c/snix/+/30315
Reviewed-by: Domen Kožar <domen@cachix.org>
Tested-by: besadii
Autosubmit: Florian Klink <flokli@flokli.de>
parent b2d2d622e0
commit c9082a586c
3 changed files with 23 additions and 15 deletions
@@ -21,7 +21,6 @@
 - [Store API](./store/api.md)
 - [BlobStore Chunking](./castore/blobstore-chunking.md)
 - [BlobStore Protocol](./castore/blobstore-protocol.md)
-- [Why not git trees?](./castore/why-not-git-trees.md)

 # Nix
 - [Specification of the Nix Language](./language-spec.md)
@@ -82,7 +82,7 @@ name MUST be unique across all three lists.
 [rustdoc-node]: https://snix.dev/rustdoc/snix_castore/enum.Node.html
 [rustdoc-pathcomponent]: https://snix.dev/rustdoc/snix_castore/struct.PathComponent.html
 [BLAKE3]: https://github.com/BLAKE3-team/BLAKE3
-[^why-not-git-trees]: For a detailed comparison with the git model, and what (and why we do differently, see TODO LINK)
+[^why-not-git-trees]: For a detailed comparison with the git model, and what (and why we do differently, see [here]({{< relref "why-not-git.md" >}}))
 [^directory-digest]: We currently use the [BLAKE3][] digest of the protobuf
 serialization of the `proto::Directory` struct to calculate
 these digests. While pretty stable across most
@@ -1,16 +1,23 @@
-## Why not git tree objects?
+---
+title: "Why not Git?"
+summary: ""
+date: 2025-04-04T16:16:37+00:00
+lastmod: 2025-04-04T16:16:37+00:00
+draft: false
+weight: 42
+toc: false
+---

 We've been experimenting with (some variations of) the git tree and object
 format, and ultimately decided against using it as an internal format, and
-instead adapted the one documented in the other documents here.
+instead adapted our own [Data Model][castore-data-model].

-While the tvix-store API protocol shares some similarities with the format used
-in git for trees and objects, the git one has shown some significant
-disadvantages:
+While castore shares some similarities with the format used in git for trees and
+objects, the git one has shown some significant disadvantages:

 ### The binary encoding itself

-#### trees
+#### git trees
 The git tree object format is a very binary, error-prone and
 "made-to-be-read-and-written-from-C" format.

@@ -20,22 +27,23 @@ tree object in this encoding.
 Extensions of the format/changes are very hard to do right, because parsers are
 not aware they might be parsing something different.

-The tvix-store protocol uses a canonical protobuf serialization, and uses
-the [blake3][blake3] hash of that serialization to point to other `Directory`
-messages.
+The [Snix Castore Data Model][castore-data-model] uses a canonical protobuf
+serialization, and uses the [blake3][blake3] hash of that serialization to point
+to other `Directory` messages.
 It's both compact and with a wide range of libraries for encoders and decoders
 in many programming languages.
 The choice of protobuf makes it easy to add new fields, and make old clients
 aware of some unknown fields being detected [^adding-fields].

-#### blob
+#### git blob
 On disk, git blob objects start with a "blob" prefix, then the size of the
 payload, and then the data itself. The hash of a blob is the literal sha1sum
 over all of this - which makes it something very git specific to request for.

-tvix-store simply uses the [blake3][blake3] hash of the literal contents
-when referring to a file/blob, which makes it very easy to ask other data
-sources for the same data, as no git-specific payload is included in the hash.
+The [Snix Castore Data Model][castore-data-model] simply uses the
+[blake3][blake3] hash of the literal contents when referring to a file/blob,
+which makes it very easy to ask other data sources for the same data, as no
+git-specific payload is included in the hash.
 This also plays very well together with things like [iroh][iroh-discussion],
 which plans to provide a way to substitute (large)blobs by their blake3 hash
 over the IPFS network.
@@ -52,6 +60,7 @@ and it's unclear what a "blake3" version of this would even look like.

 [bao]: https://github.com/oconnor663/bao
 [blake3]: https://github.com/BLAKE3-team/BLAKE3
+[castore-data-model]: {{< relref "data-model.md" >}}
 [git-sha256]: https://git-scm.com/docs/hash-function-transition/
 [iroh-discussion]: https://github.com/n0-computer/iroh/discussions/707#discussioncomment-5070197
 [^adding-fields]: Obviously, adding new fields will change hashes, but it's something that's easy to detect.
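Reviewer note on the "git blob" section of the ported document: the difference between git's blob id and a digest over the literal contents can be sketched in a few lines of Python. This is an illustration only, not code from the Snix tree. The function names `git_blob_id` and `content_digest` are hypothetical, and SHA-256 is used as a stand-in for BLAKE3 because BLAKE3 is not in the Python standard library.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id git assigns to a blob (hypothetical helper name).

    git hashes a header, "blob <decimal size>" plus a NUL byte,
    followed by the raw payload, so the id is not the plain SHA-1
    of the file contents.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def content_digest(content: bytes) -> str:
    """Digest over the literal contents only (hypothetical helper name).

    ASSUMPTION: SHA-256 stands in for BLAKE3 here, purely so the sketch
    runs with the standard library. The castore data model uses BLAKE3.
    """
    return hashlib.sha256(content).hexdigest()


if __name__ == "__main__":
    data = b"hello world\n"
    # The git id depends on the git-specific header ...
    print("git blob id:   ", git_blob_id(data))
    print("plain SHA-1:   ", hashlib.sha1(data).hexdigest())
    # ... while a content digest can be recomputed by any data source.
    print("content digest:", content_digest(data))
```

Because the git id covers the header as well as the payload, a non-git data source cannot answer a request keyed by it without first reconstructing that header. A digest over the contents alone has no such dependency.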