docs(web/snix/castore): port "why not git" document

Change-Id: I4ac77f264d2704018cd33dbf80746e82d193686e
Reviewed-on: https://cl.snix.dev/c/snix/+/30315
Reviewed-by: Domen Kožar <domen@cachix.org>
Tested-by: besadii
Autosubmit: Florian Klink <flokli@flokli.de>
parent b2d2d622e0
commit c9082a586c
3 changed files with 23 additions and 15 deletions
@@ -21,7 +21,6 @@
  - [Store API](./store/api.md)
  - [BlobStore Chunking](./castore/blobstore-chunking.md)
  - [BlobStore Protocol](./castore/blobstore-protocol.md)
-- [Why not git trees?](./castore/why-not-git-trees.md)

  # Nix
  - [Specification of the Nix Language](./language-spec.md)
@@ -82,7 +82,7 @@ name MUST be unique across all three lists.
 [rustdoc-node]: https://snix.dev/rustdoc/snix_castore/enum.Node.html
 [rustdoc-pathcomponent]: https://snix.dev/rustdoc/snix_castore/struct.PathComponent.html
 [BLAKE3]: https://github.com/BLAKE3-team/BLAKE3
-[^why-not-git-trees]: For a detailed comparison with the git model, and what (and why we do differently, see TODO LINK)
+[^why-not-git-trees]: For a detailed comparison with the git model, and what (and why we do differently, see [here]({{< relref "why-not-git.md" >}}))
 [^directory-digest]: We currently use the [BLAKE3][] digest of the protobuf
 serialization of the `proto::Directory` struct to calculate
 these digests. While pretty stable across most
@@ -1,16 +1,23 @@
-## Why not git tree objects?
+---
+title: "Why not Git?"
+summary: ""
+date: 2025-04-04T16:16:37+00:00
+lastmod: 2025-04-04T16:16:37+00:00
+draft: false
+weight: 42
+toc: false
+---

 We've been experimenting with (some variations of) the git tree and object
 format, and ultimately decided against using it as an internal format, and
-instead adapted the one documented in the other documents here.
+instead adapted our own [Data Model][castore-data-model].

-While the tvix-store API protocol shares some similarities with the format used
-in git for trees and objects, the git one has shown some significant
-disadvantages:
+While castore shares some similarities with the format used in git for trees and
+objects, the git one has shown some significant disadvantages:

 ### The binary encoding itself

-#### trees
+#### git trees
 The git tree object format is a very binary, error-prone and
 "made-to-be-read-and-written-from-C" format.

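The git trees hunk above calls the tree encoding very binary and error-prone. The following minimal Rust sketch (std only) reads the body of a git tree object, meaning the part after the `tree <size>\0` header. It assumes git's usual entry layout: an ASCII octal mode, a space, the name, a NUL byte, and then the 20-byte binary SHA-1, with entries concatenated back to back. `TreeEntry` and `parse_tree_body` are hypothetical names. They are not part of snix or any git library.

```rust
/// One entry of a git tree object body (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct TreeEntry {
    /// ASCII octal mode, e.g. "100644" (file) or "40000" (directory).
    mode: String,
    /// Raw name bytes; git does not require them to be UTF-8.
    name: Vec<u8>,
    /// Binary (not hex) SHA-1 of the referenced object.
    oid: [u8; 20],
}

/// Parse a tree body: entries `<mode> <name>\0<20-byte oid>`, concatenated
/// with no entry count and no per-entry length prefix.
fn parse_tree_body(mut body: &[u8]) -> Result<Vec<TreeEntry>, String> {
    let mut entries = Vec::new();
    while !body.is_empty() {
        // The mode ends at the first space.
        let sp = body.iter().position(|&b| b == b' ').ok_or("missing space after mode")?;
        let mode = std::str::from_utf8(&body[..sp]).map_err(|e| e.to_string())?.to_string();
        body = &body[sp + 1..];

        // The name ends at the first NUL byte.
        let nul = body.iter().position(|&b| b == 0).ok_or("missing NUL after name")?;
        let name = body[..nul].to_vec();
        body = &body[nul + 1..];

        // Exactly 20 raw bytes are assumed to follow; nothing marks their end.
        if body.len() < 20 {
            return Err("truncated object id".to_string());
        }
        let mut oid = [0u8; 20];
        oid.copy_from_slice(&body[..20]);
        body = &body[20..];

        entries.push(TreeEntry { mode, name, oid });
    }
    Ok(entries)
}

fn main() {
    // Build a two-entry body by hand, using dummy object ids.
    let mut body = Vec::new();
    body.extend_from_slice(b"100644 README\0");
    body.extend_from_slice(&[0x11; 20]);
    body.extend_from_slice(b"40000 src\0");
    body.extend_from_slice(&[0x22; 20]);

    let entries = parse_tree_body(&body).expect("well-formed tree body");
    for e in &entries {
        println!("{} {} (oid starts with {:02x})", e.mode, String::from_utf8_lossy(&e.name), e.oid[0]);
    }
}
```

No entry records its own length. The parser therefore has to scan for delimiters and trust that exactly 20 bytes of object id follow. In a SHA-256 repository the same layout carries 32-byte ids, and an individual entry gives a parser no way to tell which case it is reading.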
@@ -20,22 +27,23 @@ tree object in this encoding.
 Extensions of the format/changes are very hard to do right, because parsers are
 not aware they might be parsing something different.

-The tvix-store protocol uses a canonical protobuf serialization, and uses
-the [blake3][blake3] hash of that serialization to point to other `Directory`
-messages.
+The [Snix Castore Data Model][castore-data-model] uses a canonical protobuf
+serialization, and uses the [blake3][blake3] hash of that serialization to point
+to other `Directory` messages.
 It's both compact and with a wide range of libraries for encoders and decoders
 in many programming languages.
 The choice of protobuf makes it easy to add new fields, and make old clients
 aware of some unknown fields being detected [^adding-fields].

-#### blob
+#### git blob
 On disk, git blob objects start with a "blob" prefix, then the size of the
 payload, and then the data itself. The hash of a blob is the literal sha1sum
 over all of this - which makes it something very git specific to request for.

-tvix-store simply uses the [blake3][blake3] hash of the literal contents
-when referring to a file/blob, which makes it very easy to ask other data
-sources for the same data, as no git-specific payload is included in the hash.
+The [Snix Castore Data Model][castore-data-model] simply uses the
+[blake3][blake3] hash of the literal contents when referring to a file/blob,
+which makes it very easy to ask other data sources for the same data, as no
+git-specific payload is included in the hash.
 This also plays very well together with things like [iroh][iroh-discussion],
 which plans to provide a way to substitute (large)blobs by their blake3 hash
 over the IPFS network.
@@ -52,6 +60,7 @@ and it's unclear what a "blake3" version of this would even look like.

 [bao]: https://github.com/oconnor663/bao
 [blake3]: https://github.com/BLAKE3-team/BLAKE3
+[castore-data-model]: {{< relref "data-model.md" >}}
 [git-sha256]: https://git-scm.com/docs/hash-function-transition/
 [iroh-discussion]: https://github.com/n0-computer/iroh/discussions/707#discussioncomment-5070197
 [^adding-fields]: Obviously, adding new fields will change hashes, but it's something that's easy to detect.
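The git blob hunk notes that git hashes a "blob" prefix and the payload size together with the data. The following std-only Rust sketch computes a git blob id next to a plain SHA-1 of the same bytes, to show that the two differ. The blob id is the SHA-1 over `blob <decimal length>\0` followed by the contents.

`sha1`, `to_hex` and `git_blob_id` are hypothetical helpers. The hand-written SHA-1 only keeps the example dependency-free; real code would use a maintained crate. The castore side (BLAKE3 over the raw contents) is not shown, because BLAKE3 is not in Rust's standard library.

```rust
/// Minimal SHA-1 (FIPS 180-4), std only, for illustration.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x6745_2301, 0xEFCD_AB89, 0x98BA_DCFE, 0x1032_5476, 0xC3D2_E1F0];

    // Padding: append 0x80, zero-fill to 56 mod 64, then the bit length as a big-endian u64.
    let mut msg = data.to_vec();
    let bit_len = (data.len() as u64).wrapping_mul(8);
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for block in msg.chunks(64) {
        // Message schedule: 16 big-endian words expanded to 80.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }

        // 80 rounds of the compression function.
        let [mut a, mut b, mut c, mut d, mut e] = h;
        for (i, &wi) in w.iter().enumerate() {
            let (f, k): (u32, u32) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A82_7999),
                20..=39 => (b ^ c ^ d, 0x6ED9_EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1B_BCDC),
                _ => (b ^ c ^ d, 0xCA62_C1D6),
            };
            let t = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = t;
        }
        for (slot, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *slot = slot.wrapping_add(v);
        }
    }

    let mut out = [0u8; 20];
    for (i, v) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&v.to_be_bytes());
    }
    out
}

/// Lowercase hex encoding of a byte slice.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Git object id of a blob: SHA-1 over `blob <decimal length>\0` followed by the contents.
fn git_blob_id(contents: &[u8]) -> String {
    let mut object = format!("blob {}\0", contents.len()).into_bytes();
    object.extend_from_slice(contents);
    to_hex(&sha1(&object))
}

fn main() {
    // The empty file: plain SHA-1 vs. git's blob id for the same (zero) bytes.
    println!("{}", to_hex(&sha1(b""))); // da39a3ee5e6b4b0d3255bfef95601890afd80709
    println!("{}", git_blob_id(b"")); // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
}
```

A data source that simply stores plain bytes has no reason to know an id like `e69de29b...`. That is the practical problem the hunk points at. Hashing only the literal contents, as castore does with BLAKE3, avoids it.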