docs(web/snix/castore): port castore data model

Also restructure this, explaining the Rust data types, and then explaining the differences with the proto implementation, which uses "entry" since cl/30296.

Change-Id: Ie264ab60998f0d891b4a4ea680a2d9dbe1c5929e
Reviewed-on: https://cl.snix.dev/c/snix/+/30314
Autosubmit: Florian Klink <flokli@flokli.de>
Reviewed-by: Domen Kožar <domen@cachix.org>
Tested-by: besadii
parent d1990c9a93
commit b2d2d622e0
4 changed files with 102 additions and 51 deletions
@@ -21,7 +21,6 @@
 - [Store API](./store/api.md)
 - [BlobStore Chunking](./castore/blobstore-chunking.md)
 - [BlobStore Protocol](./castore/blobstore-protocol.md)
-- [Data Model](./castore/data-model.md)
 - [Why not git trees?](./castore/why-not-git-trees.md)

 # Nix

@@ -1,50 +0,0 @@
# Data model

This provides some more notes on the fields used in castore.proto.

See [Store API](../store/api.md) for the full context.

## Directory message
`Directory` messages use the blake3 hash of their canonical protobuf
serialization as its identifier.

A `Directory` message contains three lists, `directories`, `files` and
`symlinks`, holding `DirectoryNode`, `FileNode` and `SymlinkNode` messages
respectively. They describe all the direct child elements that are contained in
a directory.

All three message types have a `name` field, specifying the (base)name of the
element (which MUST not contain slashes or null bytes, and MUST not be '.' or '..').
For reproducibility reasons, the lists MUST be sorted by that name and the
name MUST be unique across all three lists.

In addition to the `name` field, the various *Node messages have the following
fields:

## DirectoryNode
A `DirectoryNode` message represents a child directory.

It has a `digest` field, which points to the identifier of another `Directory`
message, making a `Directory` a merkle tree (or strictly speaking, a graph, as
two elements pointing to a child directory with the same contents would point
to the same `Directory` message).

There's also a `size` field, containing the (total) number of all child
elements in the referenced `Directory`, which helps for inode calculation.

## FileNode
A `FileNode` message represents a child (regular) file.

Its `digest` field contains the blake3 hash of the file contents. It can be
looked up in the `BlobService`.

The `size` field contains the size of the blob the `digest` field refers to.

The `executable` field specifies whether the file should be marked as
executable or not.

## SymlinkNode
A `SymlinkNode` message represents a child symlink.

In addition to the `name` field, the only additional field is the `target`,
which is a string containing the target of the symlink.

10
web/content/docs/components/castore/_index.md
Normal file
@@ -0,0 +1,10 @@
---
title: "Castore"
description: ""
summary: ""
date: 2025-04-04T16:43:14+01:00
lastmod: 2025-04-04T16:43:14+01:00
draft: false
weight: 42
---

92
web/content/docs/components/castore/data-model.md
Normal file
@@ -0,0 +1,92 @@
---
title: "Data Model"
summary: ""
date: 2025-04-04T16:16:37+00:00
lastmod: 2025-04-04T16:16:37+00:00
draft: false
weight: 41
toc: true
---

This document describes the data model `snix-castore` uses to represent file
system trees. Blob/chunk storage is covered by other documents.

For those familiar with git, `snix-castore` uses a concept similar to git tree
objects, which are also a merkle structure. [^why-not-git-trees]


## [Node][rustdoc-node]
`snix-castore` can represent three different types of nodes: files, symlinks
and directories. Nodes themselves don't have names; a node gets its name from
the [Directory](#directory) structure containing it.


### `Node::File`
A (regular) file.
We store the [BLAKE3] digest of the raw file contents, the length of the raw
data, and an executable bit.


### `Node::Symlink`
A symbolic link.
We store the symlink target contents.


### `Node::Directory`
A (child) directory.
We store the digest of the [Directory](#directory) structure describing its
"contents".

We also store a `size` field, containing the (total) number of all child
elements in the referenced `Directory`, which helps with inode calculation.

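To make the shape of these three variants concrete, here is a simplified sketch of a `Node`-like enum. It is not the actual `snix-castore` definition: digests are modelled as plain 32-byte arrays (the default BLAKE3 output length), symlink targets as raw bytes, and the `kind` helper is hypothetical.

```rust
/// Stand-in for a BLAKE3 digest, which is 32 bytes by default.
/// Hypothetical alias for illustration, not the crate's own digest type.
pub type Digest = [u8; 32];

/// Simplified sketch of the three castore node kinds.
/// No variant carries a name: a node is named by the Directory containing it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Node {
    /// A regular file: digest of the raw contents, their length in bytes,
    /// and the executable bit.
    File {
        digest: Digest,
        size: u64,
        executable: bool,
    },
    /// A symbolic link; only its target is stored (as raw bytes here).
    Symlink { target: Vec<u8> },
    /// A child directory: digest of the referenced Directory, plus the
    /// (total) number of elements in it, which helps with inode calculation.
    Directory { digest: Digest, size: u64 },
}

impl Node {
    /// Hypothetical helper returning a short label for the variant.
    pub fn kind(&self) -> &'static str {
        match self {
            Node::File { .. } => "file",
            Node::Symlink { .. } => "symlink",
            Node::Directory { .. } => "directory",
        }
    }
}
```

Matching on the variant is how code built on such a model would tell the kinds apart, for example to decide whether a blob or another `Directory` has to be fetched next.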

## [Directory][rustdoc-directory]
The Directory struct contains all nodes in a single directory (only that
level, not nested ones), alongside their (base)names (called
[PathComponent](#pathcomponent)).

`.` and `..` are not included.

A *digest* can be calculated for a Directory struct[^directory-digest]. The
parent `Node::Directory` uses this digest as a reference, which is what builds
up the merkle structure.

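A minimal sketch of a `Directory`-like struct, assuming a `BTreeMap` keyed by name so that names stay unique and iterate in sorted byte order. Names are plain byte vectors here rather than validated `PathComponent`s, and `size` assumes the total element count is the number of direct entries plus the `size` of every child directory, which is one reading of the "(total) number of all child elements" wording above. `insert`, `size` and `names` are hypothetical method names.

```rust
use std::collections::BTreeMap;

/// Stand-in for a 32-byte BLAKE3 digest (hypothetical alias).
pub type Digest = [u8; 32];

/// Simplified node kinds, as in the `Node` section.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Node {
    File { digest: Digest, size: u64, executable: bool },
    Symlink { target: Vec<u8> },
    Directory { digest: Digest, size: u64 },
}

/// Hypothetical, simplified Directory: a map from (base)name to Node.
/// The BTreeMap keeps names unique and iterates them in sorted byte order.
#[derive(Debug, Default)]
pub struct Directory {
    nodes: BTreeMap<Vec<u8>, Node>,
}

impl Directory {
    /// Adds `node` under `name`, rejecting names that are already present.
    pub fn insert(&mut self, name: &[u8], node: Node) -> Result<(), String> {
        if self.nodes.contains_key(name) {
            return Err(format!(
                "duplicate name {:?}",
                String::from_utf8_lossy(name)
            ));
        }
        self.nodes.insert(name.to_vec(), node);
        Ok(())
    }

    /// Direct entries plus the `size` of each child directory, i.e. the
    /// total number of elements below this directory (assumed semantics).
    pub fn size(&self) -> u64 {
        let nested: u64 = self
            .nodes
            .values()
            .map(|node| match node {
                Node::Directory { size, .. } => *size,
                _ => 0,
            })
            .sum();
        self.nodes.len() as u64 + nested
    }

    /// Names in sorted (byte-wise) order.
    pub fn names(&self) -> Vec<&[u8]> {
        self.nodes.keys().map(|name| name.as_slice()).collect()
    }
}
```

Under this assumption, a directory holding a file, a symlink and one child directory whose own `size` is 4 reports a size of 7.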

## [PathComponent][rustdoc-pathcomponent]
This is a stricter version of bytes, restricted to names that are valid path
components in a [Directory](#directory).

It disallows slashes, null bytes, `.`, `..` and the empty string. It also
rejects names longer than 255 bytes.

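The rules above can be expressed as a small validating constructor. This is a hypothetical sketch (the `PathComponent::parse` function and the `InvalidName` error type are made up for illustration), not the crate's actual API; it checks only the constraints listed in this section.

```rust
/// Upper bound on a component's length, in bytes.
const MAX_NAME_LEN: usize = 255;

/// Reasons a name can be rejected (hypothetical error type).
#[derive(Debug, PartialEq, Eq)]
pub enum InvalidName {
    Empty,
    DotOrDotDot,
    ContainsSlash,
    ContainsNul,
    TooLong(usize),
}

/// Hypothetical validated name; it can only be built through `parse`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct PathComponent(Vec<u8>);

impl PathComponent {
    /// Checks `name` against the rules listed above.
    pub fn parse(name: &[u8]) -> Result<PathComponent, InvalidName> {
        if name.is_empty() {
            return Err(InvalidName::Empty);
        }
        if name == b"." || name == b".." {
            return Err(InvalidName::DotOrDotDot);
        }
        if name.len() > MAX_NAME_LEN {
            return Err(InvalidName::TooLong(name.len()));
        }
        if name.contains(&b'/') {
            return Err(InvalidName::ContainsSlash);
        }
        if name.contains(&0u8) {
            return Err(InvalidName::ContainsNul);
        }
        Ok(PathComponent(name.to_vec()))
    }

    /// The validated name as raw bytes.
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}
```

Keeping the inner bytes private means any `PathComponent` value has already passed these checks, so code holding one does not need to re-validate it.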

## Merkle DAG
The pointers from `Node::Directory` to `Directory`, and the latter potentially
containing `Node::Directory` again, make the whole structure a merkle tree (or,
strictly speaking, a directed acyclic graph, as two elements pointing to a
child directory with the same contents would point to the same `Directory`).

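The sharing that makes this a DAG rather than a tree can be sketched with a content-addressed map from digest to `Directory`. To stay dependency-free, this sketch swaps BLAKE3 over the protobuf serialization for the standard library's `DefaultHasher` over a `BTreeMap`; that hasher is not collision-resistant and is only a stand-in. `DirectoryStore` and `put` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, HashMap};
use std::hash::{Hash, Hasher};

/// Stand-in digest type. NOT BLAKE3: `DefaultHasher` is used only to keep the
/// sketch dependency-free; it is not collision-resistant and its output may
/// differ between Rust releases.
type Digest = u64;

/// Simplified node kinds (see the `Node` section).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Node {
    File { digest: Digest, size: u64, executable: bool },
    Symlink { target: Vec<u8> },
    Directory { digest: Digest, size: u64 },
}

/// Names map to nodes; the BTreeMap gives a canonical, sorted order, so
/// equal contents always hash the same way.
#[derive(Debug, Clone, Default, PartialEq, Eq, Hash)]
struct Directory {
    nodes: BTreeMap<Vec<u8>, Node>,
}

impl Directory {
    /// Digest of the whole directory listing (stand-in for the real scheme).
    fn digest(&self) -> Digest {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher);
        hasher.finish()
    }
}

/// Hypothetical content-addressed store: each distinct Directory is kept once.
#[derive(Default)]
struct DirectoryStore {
    by_digest: HashMap<Digest, Directory>,
}

impl DirectoryStore {
    /// Stores `dir` (if not already present) and returns the digest a parent
    /// `Node::Directory` would point to.
    fn put(&mut self, dir: Directory) -> Digest {
        let digest = dir.digest();
        self.by_digest.entry(digest).or_insert(dir);
        digest
    }
}
```

Putting the same child directory twice yields the same digest, so a root that references it under two names leaves just two entries in the store: the shared child and the root.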

## Protobuf
In addition to the Rust types described above, there's also a protobuf
representation, which differs slightly:

Instead of nodes being unnamed and `Directory` containing a map from
`PathComponent` to `Node` (with the keys being the basenames in that
directory), the `Directory` message contains three lists, `directories`,
`files` and `symlinks`, holding `DirectoryEntry`, `FileEntry` and
`SymlinkEntry` messages respectively.

These contain all fields present in the corresponding `Node` enum variant, as
well as a `name` field, representing the basename in that directory.

For reproducibility reasons, the lists MUST be sorted by that name and the
name MUST be unique across all three lists.

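The two MUST rules for the protobuf representation can be checked mechanically. The sketch below uses hypothetical plain Rust structs mirroring the three entry message kinds (not the generated protobuf types), and assumes that "sorted by name" means ascending byte-wise order, with no duplicate inside a list.

```rust
use std::collections::HashSet;

/// Stand-in for a 32-byte BLAKE3 digest.
type Digest = [u8; 32];

/// Hypothetical Rust mirrors of the protobuf entry messages.
struct DirectoryEntry {
    name: Vec<u8>,
    digest: Digest,
    size: u64,
}

struct FileEntry {
    name: Vec<u8>,
    digest: Digest,
    size: u64,
    executable: bool,
}

struct SymlinkEntry {
    name: Vec<u8>,
    target: Vec<u8>,
}

/// Hypothetical mirror of the protobuf `Directory` message.
struct ProtoDirectory {
    directories: Vec<DirectoryEntry>,
    files: Vec<FileEntry>,
    symlinks: Vec<SymlinkEntry>,
}

/// True if the names are strictly increasing (byte-wise), i.e. sorted and
/// free of duplicates within this one list.
fn strictly_sorted<'a, I: Iterator<Item = &'a [u8]>>(names: I) -> bool {
    let names: Vec<&[u8]> = names.collect();
    names.windows(2).all(|pair| pair[0] < pair[1])
}

/// Checks both rules: every list sorted by name, and every name unique
/// across all three lists.
fn validate(dir: &ProtoDirectory) -> Result<(), String> {
    if !strictly_sorted(dir.directories.iter().map(|e| e.name.as_slice())) {
        return Err("`directories` is not sorted by name".to_string());
    }
    if !strictly_sorted(dir.files.iter().map(|e| e.name.as_slice())) {
        return Err("`files` is not sorted by name".to_string());
    }
    if !strictly_sorted(dir.symlinks.iter().map(|e| e.name.as_slice())) {
        return Err("`symlinks` is not sorted by name".to_string());
    }

    // A name may appear in only one of the three lists.
    let mut seen = HashSet::new();
    let all_names = dir
        .directories
        .iter()
        .map(|e| e.name.as_slice())
        .chain(dir.files.iter().map(|e| e.name.as_slice()))
        .chain(dir.symlinks.iter().map(|e| e.name.as_slice()));
    for name in all_names {
        if !seen.insert(name) {
            return Err(format!("duplicate name {:?}", String::from_utf8_lossy(name)));
        }
    }
    Ok(())
}
```

Building the three lists from a sorted, unique map (as the Rust `Directory` keeps them) satisfies both rules by construction; a check like this matters mostly when accepting messages from elsewhere.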

[rustdoc-directory]: https://snix.dev/rustdoc/snix_castore/struct.Directory.html
[rustdoc-node]: https://snix.dev/rustdoc/snix_castore/enum.Node.html
[rustdoc-pathcomponent]: https://snix.dev/rustdoc/snix_castore/struct.PathComponent.html
[BLAKE3]: https://github.com/BLAKE3-team/BLAKE3
[^why-not-git-trees]: For a detailed comparison with the git model, and what
    we do differently (and why), see TODO LINK.
[^directory-digest]: We currently use the [BLAKE3][] digest of the protobuf
    serialization of the `proto::Directory` struct to calculate
    these digests. While this serialization is pretty stable across most
    implementations, there's no guarantee it will always stay
    as-is, so we might switch to another serialization with
    stronger guarantees on that front in the future.
    See [#111](https://git.snix.dev/snix/snix/issues/111) for details.