docs(web/snix/castore): port "why not git" document
Change-Id: I4ac77f264d2704018cd33dbf80746e82d193686e
Reviewed-on: https://cl.snix.dev/c/snix/+/30315
Reviewed-by: Domen Kožar <domen@cachix.org>
Tested-by: besadii
Autosubmit: Florian Klink <flokli@flokli.de>
parent b2d2d622e0
commit c9082a586c
3 changed files with 23 additions and 15 deletions
@ -82,7 +82,7 @@ name MUST be unique across all three lists.
 [rustdoc-node]: https://snix.dev/rustdoc/snix_castore/enum.Node.html
 [rustdoc-pathcomponent]: https://snix.dev/rustdoc/snix_castore/struct.PathComponent.html
 [BLAKE3]: https://github.com/BLAKE3-team/BLAKE3
-[^why-not-git-trees]: For a detailed comparison with the git model, and what (and why we do differently, see TODO LINK)
+[^why-not-git-trees]: For a detailed comparison with the git model, including what we do differently and why, see [here]({{< relref "why-not-git.md" >}}).
 [^directory-digest]: We currently use the [BLAKE3][] digest of the protobuf
 serialization of the `proto::Directory` struct to calculate
 these digests. While pretty stable across most

web/content/docs/components/castore/why-not-git.md (Normal file, 66 lines)
@@ -0,0 +1,66 @@
---
title: "Why not Git?"
summary: ""
date: 2025-04-04T16:16:37+00:00
lastmod: 2025-04-04T16:16:37+00:00
draft: false
weight: 42
toc: false
---
|
We've been experimenting with (some variations of) the git tree and object
format, but ultimately decided against using it as an internal format and
instead adopted our own [Data Model][castore-data-model].
|
While castore shares some similarities with the format git uses for trees and
objects, the git format has shown some significant disadvantages:
|
### The binary encoding itself
|
#### git trees
The git tree object format is a compact but error-prone binary format, very
much "made-to-be-read-and-written-from-C".
|
Tree objects are a combination of null-terminated strings and fields of known
length. Entries refer to other tree objects (and to blobs) by embedding the
literal 20-byte sha1 hash of the referenced object directly in this encoding.
Extending or changing the format is very hard to do right, because existing
parsers cannot tell that they might be parsing something different.
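
To make this layout concrete, here is a minimal Python sketch of a git tree body and how git derives object ids. It follows git's loose-object layout: each entry is `<octal mode> <name>`, a NUL byte, then the raw 20-byte sha1 of the referenced object. The id of any object is the sha1 of a `<type> <size>` header, a NUL byte, and the body. The function names are hypothetical, and details such as the zlib compression of loose objects are left out.

```python
import hashlib

SHA1_LEN = 20  # sha1 object ids are embedded as 20 raw bytes, not as hex


def encode_tree(entries):
    """Serialize (mode, name, raw_sha1) tuples into a git tree body.

    `mode` is the ASCII octal string git writes, e.g. "100644" for a
    regular file or "40000" for a subtree; `name` is bytes.
    """
    def sort_key(entry):
        mode, name, _ = entry
        # git orders subtrees as if their name had a trailing "/"
        return name + b"/" if mode == "40000" else name

    body = b""
    for mode, name, raw_sha1 in sorted(entries, key=sort_key):
        assert len(raw_sha1) == SHA1_LEN
        body += mode.encode() + b" " + name + b"\0" + raw_sha1
    return body


def parse_tree(body):
    """Parse a git tree body back into (mode, name, raw_sha1) tuples."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)   # the mode ends at a space
        nul = body.index(b"\0", space)  # the name ends at a NUL byte
        mode = body[pos:space].decode()
        name = body[space + 1:nul]
        # the object id has no delimiter: the parser has to know its length
        raw_sha1 = body[nul + 1:nul + 1 + SHA1_LEN]
        entries.append((mode, name, raw_sha1))
        pos = nul + 1 + SHA1_LEN
    return entries


def object_id(obj_type, body):
    """git object id: sha1 over "<type> <size>", a NUL byte, then the body."""
    header = obj_type.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    empty_blob = bytes.fromhex(object_id("blob", b""))
    tree_body = encode_tree([("100644", b"README", empty_blob)])
    print(parse_tree(tree_body))
    print(object_id("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

The embedded id has no terminator and no length prefix. A parser fed a variant of this format (for example, one with longer hashes) therefore cannot notice the difference. It just misreads the bytes that follow.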
|
The [Snix Castore Data Model][castore-data-model] uses a canonical protobuf
serialization, and refers to other `Directory` messages by the
[blake3][blake3] hash of that serialization.
Protobuf is compact, and encoder and decoder libraries exist for a wide range
of programming languages.
The choice of protobuf makes it easy to add new fields, and lets old clients
detect that they encountered fields they don't know about[^adding-fields].
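
This minimal Python sketch decodes the protobuf wire format by hand to show how an old client can notice fields it doesn't know. It handles only the varint and length-delimited wire types. `KNOWN_FIELDS` is a hypothetical stand-in for the fields an older client was built with, not the actual castore `Directory` schema. Real code would use a generated protobuf library rather than a hand-written decoder.

```python
KNOWN_FIELDS = {1, 2, 3}  # hypothetical: field numbers an "old" client knows

VARINT = 0  # protobuf wire type 0
LEN = 2     # protobuf wire type 2 (length-delimited)


def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_field(number, wire_type, payload):
    """Encode one field: key varint, then a varint or length-prefixed payload."""
    key = encode_varint((number << 3) | wire_type)
    if wire_type == VARINT:
        return key + encode_varint(payload)
    return key + encode_varint(len(payload)) + payload


def decode(buf):
    """Split a message into known fields and a list of unknown field numbers."""
    known, unknown = {}, []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == VARINT:
            value, pos = read_varint(buf, pos)
        elif wire_type == LEN:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in KNOWN_FIELDS:
            known.setdefault(number, []).append(value)
        else:
            # the key and length prefix let us skip it, but we record its presence
            unknown.append(number)
    return known, unknown


if __name__ == "__main__":
    # a "newer" writer adds field 4, which the old client doesn't know about
    msg = encode_field(1, LEN, b"foo") + encode_field(3, VARINT, 42) + encode_field(4, VARINT, 1)
    print(decode(msg))  # ({1: [b'foo'], 3: [42]}, [4])
```

Every field carries its number and wire type, and length-delimited fields also carry their length. The decoder can therefore skip field `4` safely while still reporting that it was present.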
|
#### git blob
On disk, git blob objects start with a "blob" prefix, followed by the size of
the payload and then the data itself. The hash of a blob is the literal sha1sum
over all of this, which means blobs can only be requested by an identifier that
is specific to git.
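
This short Python sketch uses only `hashlib` to show why a git blob id is git-specific. It is the sha1 of a `blob <size>` header, a NUL byte, and the contents, so it never equals a plain sha1 of the same bytes. Castore instead hashes the contents alone with BLAKE3. BLAKE3 is not in Python's standard library, so that side is not shown. The function names are hypothetical.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object id git assigns to a blob: sha1 over "blob <size>", NUL, data."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def plain_sha1(data: bytes) -> str:
    """Hash of the contents alone, as a non-git data source would compute it."""
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(plain_sha1(b""))   # da39a3ee5e6b4b0d3255bfef95601890afd80709
```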
|
The [Snix Castore Data Model][castore-data-model] simply uses the
[blake3][blake3] hash of the literal contents when referring to a file/blob.
This makes it very easy to ask other data sources for the same data, as no
git-specific payload is included in the hash.
This also plays very well with projects like [iroh][iroh-discussion],
which plans to provide a way to substitute (large) blobs by their blake3 hash
over the IPFS network.
|
In addition, [blake3][blake3] makes it possible to do
[verified streaming][bao], as described in other parts of the
documentation.
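
A plain binary Merkle tree is enough to sketch the idea behind verified streaming. A reader who trusts only the root hash can check each chunk as it arrives, using the sibling hashes along that chunk's path. The sketch below uses SHA-256 from `hashlib`, 1024-byte chunks, and one-byte leaf/parent prefixes. It is NOT BLAKE3's tree mode or bao's encoding, only the same principle in miniature. All names are hypothetical.

```python
import hashlib

CHUNK = 1024  # chunk size chosen for this sketch


def leaf_hash(chunk: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + chunk).digest()  # 0x00 prefix marks a leaf


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # 0x01 marks a parent


def build_levels(data: bytes):
    """Return every tree level, from the leaf hashes up to [root]."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]
    level = [leaf_hash(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(node_hash(level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # an odd node is promoted unchanged
        level = nxt
        levels.append(level)
    return levels


def proof(levels, index):
    """Sibling hashes (bottom-up) needed to verify chunk `index`."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            path.append((sibling < index, level[sibling]))
        index //= 2
    return path


def verify_chunk(root: bytes, chunk: bytes, path) -> bool:
    """Check one chunk against the trusted root without the other chunks."""
    h = leaf_hash(chunk)
    for sibling_is_left, sibling in path:
        h = node_hash(sibling, h) if sibling_is_left else node_hash(h, sibling)
    return h == root
```

A streaming reader would receive each chunk together with its path and call `verify_chunk` before passing the bytes on. Corruption is then caught at chunk granularity, not only after the whole blob has been downloaded.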
|
The git tree object format uses sha1 both for references to other trees and
for hashes of blobs. With practical sha1 collision attacks publicly
demonstrated, it isn't really a hash function to fundamentally base everything
on in 2023.
The [migration to sha256][git-sha256] has also been dead for some years now,
and it's unclear what a "blake3" version of this would even look like.
|
[bao]: https://github.com/oconnor663/bao
[blake3]: https://github.com/BLAKE3-team/BLAKE3
[castore-data-model]: {{< relref "data-model.md" >}}
[git-sha256]: https://git-scm.com/docs/hash-function-transition/
[iroh-discussion]: https://github.com/n0-computer/iroh/discussions/707#discussioncomment-5070197
[^adding-fields]: Obviously, adding new fields will change hashes, but it's something that's easy to detect.