chore(tvix/castore): move data model docs to here

These describe the castore data model, so they should live in the castore
crate.
Also, some minor edits to //tvix/store/docs/api.md, to honor the move of
the castore bits to tvix-castore.

Change-Id: I1836556b652ac0592336eac95a8d0647599f4aec
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9893
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Florian Klink 2023-10-19 15:01:53 +01:00 committed by clbot
parent d545f11819
commit beae3a4bf1
3 changed files with 20 additions and 15 deletions


@ -1,10 +1,12 @@
tvix-store API
tvix-[ca]store API
==============
This document outlines the design of the API exposed by tvix-store, as
well as other implementations of this store protocol.
This document outlines the design of the API exposed by tvix-castore and
tvix-store, as well as other implementations of this store protocol.
This document is meant to be read side-by-side with [castore.md](./castore.md) which describes the data model in more detail.
This document is meant to be read side-by-side with
[castore.md](../../tvix-castore/docs/castore.md) which describes the data model
in more detail.
The store API has four main consumers:
@ -115,8 +117,9 @@ content-addressed world to a physical path.
### PathInfo
As most paths in the Nix store currently are input-addressed [^input-addressed],
we need something mapping from an input-addressed "output path hash" to the
contents in the content-addressed world.
and the `tvix-castore` data model does not intrinsically use NAR hashes either,
we need something mapping from an input-addressed "output path hash" (or a
Nix-specific content-addressed path) to the contents in the `tvix-castore` world.
That's what `PathInfo` provides. It embeds the root node (Directory, File or
Symlink) at a given store path.
@ -215,13 +218,15 @@ This is useful for people running a Tvix-only system, or running builds on a
In a system with Nix installed, we can't simply manually "extract" things to
`/nix/store`, as Nix assumes to own all writes to this location.
In these use cases, we're probably better off exposing a tvix-store as a local
binary cache (that's what nar-bridge does).
binary cache (that's what `//tvix/nar-bridge` does).
Assuming we are in an environment where we control `/nix/store` exclusively, a
"realize to disk" would either "extract" things from the tvix-store to a
filesystem, or expose a FUSE filesystem. The latter would be particularly
interesting for remote build workloads, as build inputs can be realized
on-demand, which saves copying around a lot of never-accessed files.
"realize to disk" would either "extract" things from the `tvix-store` to a
filesystem, or expose a `FUSE`/`virtio-fs` filesystem.
The latter is already implemented, and is particularly interesting for (remote)
build workloads, as build inputs can be realized on-demand, which saves copying
around a lot of never-accessed files.
In both cases, the API interactions are similar.
* The *PathInfoService* is asked for the `PathInfo` of the requested store path.
@ -253,7 +258,7 @@ As already described above, the only non-content-addressed service is the
This means, all other messages (such as `Blob` and `Directory` messages) can be
substituted from many different, untrusted sources/mirrors, which will make
plugging in additional substitution strategies like IPFS, local network
neighbors super simple.
neighbors super simple. That's also why these messages live in the `tvix-castore` crate.
As for `PathInfo`, we don't specify an additional signature mechanism yet, but
carry the NAR-based signatures from Nix along.
@ -268,7 +273,7 @@ rather than a whole NAR file.
A future signature mechanism, that is only signing (parts of) the `PathInfo`
message, which only points to content-addressed data will enable verified
partial access into a store path, opening up opportunities for lazy filesystem
access, which is very useful in remote builder scenarios.
access etc.


@ -1,50 +0,0 @@
# //tvix/store/docs/castore.md
This provides some more notes on the fields used in castore.proto.
It's meant to supplement `//tvix/store/docs/api.md`.
## Directory message
`Directory` messages use the blake3 hash of their canonical protobuf
serialization as their identifier.
A `Directory` message contains three lists, `directories`, `files` and
`symlinks`, holding `DirectoryNode`, `FileNode` and `SymlinkNode` messages
respectively. They describe all the direct child elements that are contained in
a directory.
All three message types have a `name` field, specifying the (base)name of the
element (which MUST NOT contain slashes or null bytes, and MUST NOT be '.' or '..').
For reproducibility reasons, each list MUST be sorted by that name, and names
MUST be unique across all three lists.
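The naming rules above can be sketched as a small check. This is only an illustration, not the crate's actual validation code: `is_valid_name` and `validate_names` are hypothetical helpers, and names are modelled as `&str` for brevity (an assumption; the protobuf field may well carry raw bytes).

```rust
use std::collections::HashSet;

/// Hypothetical helper: checks one entry name against the rules above
/// (no slashes, no null bytes, not "." and not "..").
fn is_valid_name(name: &str) -> bool {
    name != "." && name != ".." && !name.contains('/') && !name.contains('\0')
}

/// Hypothetical helper: checks that each list is sorted by name and that
/// no name occurs twice, within one list or across the three lists.
fn validate_names(
    directories: &[&str],
    files: &[&str],
    symlinks: &[&str],
) -> Result<(), String> {
    let mut seen: HashSet<&str> = HashSet::new();
    for list in [directories, files, symlinks] {
        // Neighbouring entries must be in ascending order.
        for pair in list.windows(2) {
            if pair[0] > pair[1] {
                return Err(format!("{:?} is listed before {:?}", pair[0], pair[1]));
            }
        }
        for &name in list {
            if !is_valid_name(name) {
                return Err(format!("invalid name {:?}", name));
            }
            // `insert` returns false if the name was already present.
            if !seen.insert(name) {
                return Err(format!("duplicate name {:?}", name));
            }
        }
    }
    Ok(())
}
```

With this sketch, `validate_names(&["bin"], &["bin"], &[])` is rejected, because the same name shows up in both the `directories` and the `files` list.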
In addition to the `name` field, the various *Node messages have the following
fields:
## DirectoryNode
A `DirectoryNode` message represents a child directory.
It has a `digest` field, which points to the identifier of another `Directory`
message, making a `Directory` a merkle tree (or strictly speaking, a graph, as
two elements pointing to a child directory with the same contents would point
to the same `Directory` message).
There's also a `size` field, containing the (total) number of all child
elements in the referenced `Directory`, which helps for inode calculation.
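One way to read the `size` field is sketched below. It assumes that "total" means the direct children of a `Directory`, plus everything already counted in the `size` field of each child `DirectoryNode`. The simplified structs and the `directory_size` function are hypothetical and only loosely mirror the messages described here.

```rust
/// Simplified, hypothetical stand-in for the `DirectoryNode` message.
struct DirectoryNode {
    name: String,
    digest: [u8; 32],
    /// Number of elements in the referenced `Directory`.
    size: u64,
}

/// Simplified, hypothetical stand-in for the `Directory` message.
/// Files and symlinks are reduced to counts, as only their number matters here.
struct Directory {
    directories: Vec<DirectoryNode>,
    file_count: u64,
    symlink_count: u64,
}

/// Assumed interpretation: every direct child counts once, and each child
/// directory additionally contributes the elements it contains itself.
fn directory_size(dir: &Directory) -> u64 {
    let below_child_dirs: u64 = dir.directories.iter().map(|d| 1 + d.size).sum();
    dir.file_count + dir.symlink_count + below_child_dirs
}
```

Because each `DirectoryNode` already carries the size of the directory it points to, the total can be computed without fetching any child `Directory` message.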
## FileNode
A `FileNode` message represents a child (regular) file.
Its `digest` field contains the blake3 hash of the file contents. It can be
looked up in the `BlobService`.
The `size` field contains the size of the blob the `digest` field refers to.
The `executable` field specifies whether the file should be marked as
executable or not.
## SymlinkNode
A `SymlinkNode` message represents a child symlink.
In addition to the `name` field, the only additional field is the `target`,
which is a string containing the target of the symlink.


@ -1,57 +0,0 @@
## Why not git tree objects?
We've been experimenting with (some variations of) the git tree and object
format, and ultimately decided against using it as an internal format, and
instead adopted the one documented in the other documents here.
While the tvix-store API protocol shares some similarities with the format used
in git for trees and objects, the git one has shown some significant
disadvantages:
### The binary encoding itself
#### trees
The git tree object format is a very binary, error-prone and
"made-to-be-read-and-written-from-C" format.
Tree objects are a combination of null-terminated strings, and fields of known
length. References to other tree objects use the literal sha1 hash of another
tree object in this encoding.
Extensions of the format/changes are very hard to do right, because parsers are
not aware they might be parsing something different.
The tvix-store protocol uses a canonical protobuf serialization, and uses
the [blake3][blake3] hash of that serialization to point to other `Directory`
messages.
It's both compact and supported by a wide range of encoder and decoder
libraries in many programming languages.
The choice of protobuf makes it easy to add new fields, and make old clients
aware of some unknown fields being detected [^adding-fields].
#### blob
On disk, git blob objects start with a "blob" prefix, then the size of the
payload, and then the data itself. The hash of a blob is the literal sha1sum
over all of this - which makes it very git-specific to request.
tvix-store simply uses the [blake3][blake3] hash of the literal contents
when referring to a file/blob, which makes it very easy to ask other data
sources for the same data, as no git-specific payload is included in the hash.
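The difference can be illustrated by looking at the bytes that get hashed in each scheme. The sketch below only builds git's hash input for a blob (the header `blob <size>` followed by a null byte, then the contents). It does not compute any hash, since the standard library ships neither sha1 nor blake3, and `git_blob_preimage` is a hypothetical helper name.

```rust
/// Hypothetical helper: returns the bytes git feeds into sha1 to name a blob,
/// i.e. the header "blob <size>\0" followed by the contents.
fn git_blob_preimage(contents: &[u8]) -> Vec<u8> {
    let mut buf = format!("blob {}\0", contents.len()).into_bytes();
    buf.extend_from_slice(contents);
    buf
}

// In the scheme described above there is no such wrapper: the hash input
// is `contents` itself, so any source that knows the blake3 hash of the
// raw bytes can serve the blob.
```

For the contents `hello`, git hashes the 12 bytes `blob 5\0hello`, while the content-only scheme hashes just the 5 bytes of `hello`.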
This also plays very well together with things like [iroh][iroh-discussion],
which plans to provide a way to substitute (large) blobs by their blake3 hash
over the IPFS network.
In addition to that, [blake3][blake3] makes it possible to do
[verified streaming][bao], as already described in other parts of the
documentation.
The git tree object format uses sha1 both for references to other trees and
hashes of blobs, which isn't really a hash function to fundamentally base
everything on in 2023.
The [migration to sha256][git-sha256] also has been dead for some years now,
and it's unclear what a "blake3" version of this would even look like.
[bao]: https://github.com/oconnor663/bao
[blake3]: https://github.com/BLAKE3-team/BLAKE3
[git-sha256]: https://git-scm.com/docs/hash-function-transition/
[iroh-discussion]: https://github.com/n0-computer/iroh/discussions/707#discussioncomment-5070197
[^adding-fields]: Obviously, adding new fields will change hashes, but it's something that's easy to detect.