diff --git a/snix/docs/src/SUMMARY.md b/snix/docs/src/SUMMARY.md
index a3e120b75..d7b08121a 100644
--- a/snix/docs/src/SUMMARY.md
+++ b/snix/docs/src/SUMMARY.md
@@ -21,7 +21,6 @@
 - [Store API](./store/api.md)
 - [BlobStore Chunking](./castore/blobstore-chunking.md)
 - [BlobStore Protocol](./castore/blobstore-protocol.md)
-- [Why not git trees?](./castore/why-not-git-trees.md)
 
 # Nix
 - [Specification of the Nix Language](./language-spec.md)
diff --git a/web/content/docs/components/castore/data-model.md b/web/content/docs/components/castore/data-model.md
index 0d64fc3f7..bdb45c172 100644
--- a/web/content/docs/components/castore/data-model.md
+++ b/web/content/docs/components/castore/data-model.md
@@ -82,7 +82,7 @@ name MUST be unique across all three lists.
 [rustdoc-node]: https://snix.dev/rustdoc/snix_castore/enum.Node.html
 [rustdoc-pathcomponent]: https://snix.dev/rustdoc/snix_castore/struct.PathComponent.html
 [BLAKE3]: https://github.com/BLAKE3-team/BLAKE3
-[^why-not-git-trees]: For a detailed comparison with the git model, and what (and why we do differently, see TODO LINK)
+[^why-not-git-trees]: For a detailed comparison with the git model, and what (and why we do differently, see [here]({{< relref "why-not-git.md" >}}))
 [^directory-digest]: We currently use the [BLAKE3][] digest of the protobuf
 serialization of the `proto::Directory` struct to calculate these digests.
 While pretty stable across most
diff --git a/snix/docs/src/castore/why-not-git-trees.md b/web/content/docs/components/castore/why-not-git.md
similarity index 70%
rename from snix/docs/src/castore/why-not-git-trees.md
rename to web/content/docs/components/castore/why-not-git.md
index bcd2acd8e..39ffa54cf 100644
--- a/snix/docs/src/castore/why-not-git-trees.md
+++ b/web/content/docs/components/castore/why-not-git.md
@@ -1,16 +1,23 @@
-## Why not git tree objects?
+---
+title: "Why not Git?"
+summary: ""
+date: 2025-04-04T16:16:37+00:00
+lastmod: 2025-04-04T16:16:37+00:00
+draft: false
+weight: 42
+toc: false
+---
 
 We've been experimenting with (some variations of) the git tree and object
 format, and ultimately decided against using it as an internal format, and
-instead adapted the one documented in the other documents here.
+instead adapted our own [Data Model][castore-data-model].
 
-While the tvix-store API protocol shares some similarities with the format used
-in git for trees and objects, the git one has shown some significant
-disadvantages:
+While castore shares some similarities with the format used in git for trees and
+objects, the git one has shown some significant disadvantages:
 
 ### The binary encoding itself
 
-#### trees
+#### git trees
 The git tree object format is a very binary, error-prone and
 "made-to-be-read-and-written-from-C" format.
 
@@ -20,22 +27,23 @@ tree object in this encoding.
 Extensions of the format/changes are very hard to do right, because parsers are
 not aware they might be parsing something different.
 
-The tvix-store protocol uses a canonical protobuf serialization, and uses
-the [blake3][blake3] hash of that serialization to point to other `Directory`
-messages.
+The [Snix Castore Data Model][castore-data-model] uses a canonical protobuf
+serialization, and uses the [blake3][blake3] hash of that serialization to point
+to other `Directory` messages.
 It's both compact and with a wide range of libraries for encoders and decoders
 in many programming languages.
 The choice of protobuf makes it easy to add new fields, and make old clients
 aware of some unknown fields being detected [^adding-fields].
 
-#### blob
+#### git blob
 On disk, git blob objects start with a "blob" prefix, then the size of the
 payload, and then the data itself. The hash of a blob is the literal sha1sum
 over all of this - which makes it something very git specific to request for.
 
-tvix-store simply uses the [blake3][blake3] hash of the literal contents
-when referring to a file/blob, which makes it very easy to ask other data
-sources for the same data, as no git-specific payload is included in the hash.
+The [Snix Castore Data Model][castore-data-model] simply uses the
+[blake3][blake3] hash of the literal contents when referring to a file/blob,
+which makes it very easy to ask other data sources for the same data, as no
+git-specific payload is included in the hash.
 This also plays very well together with things like [iroh][iroh-discussion],
 which plans to provide a way to substitute (large)blobs by their blake3 hash
 over the IPFS network.
@@ -52,6 +60,7 @@ and it's unclear what a "blake3" version of this would even look like.
 
 [bao]: https://github.com/oconnor663/bao
 [blake3]: https://github.com/BLAKE3-team/BLAKE3
+[castore-data-model]: {{< relref "data-model.md" >}}
 [git-sha256]: https://git-scm.com/docs/hash-function-transition/
 [iroh-discussion]: https://github.com/n0-computer/iroh/discussions/707#discussioncomment-5070197
 [^adding-fields]: Obviously, adding new fields will change hashes, but it's something that's easy to detect.
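
Note on the "git blob" section touched above: the point about git-specific framing can be illustrated with a small sketch. This is not code from the repository. The helper names `git_blob_id` and `content_digest` are hypothetical, and SHA-1 is used for both digests only so that the effect of the `blob <size>\0` header is isolated. castore itself uses BLAKE3, which is not in the Python standard library.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object id git assigns to a blob: SHA-1 over a header plus the payload.

    The header is the literal bytes "blob ", the payload size in decimal,
    and a NUL byte. The payload follows unchanged.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def content_digest(data: bytes) -> str:
    """Digest over the literal contents only, with no framing.

    Stand-in for the castore approach; SHA-1 here is an assumption made
    purely to keep the comparison like-for-like.
    """
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    payload = b"hello\n"
    # Same bytes, same hash function, different identifiers:
    # only the second one can be requested from a non-git data source.
    print("git blob id:   ", git_blob_id(payload))
    print("content digest:", content_digest(payload))
```

Because the header is part of the hashed input, a git blob id cannot be derived from a plain content digest (or the other way around) without rehashing the data, which is the interoperability cost the document describes.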