Commit graph

30 commits

Author SHA1 Message Date
Florian Klink
e9db0449e7 refactor(tvix/castore/import): make module, split off fs and error
Move error types and filesystem-specific functions to a separate file,
and keep the fs:: namespace in public exports.

Change-Id: I5e9e83ad78d9aea38553fafc293d3e4f8c31a8c1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11486
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Autosubmit: flokli <flokli@flokli.de>
2024-04-20 14:14:19 +00:00
Florian Klink
c4cb099823 refactor(tvix/castore/import): rename ingest_entries arg
This is not a stream of direntries anymore, but a stream of ingestion
entries.

Change-Id: I387f4497b6567066b24c58ca0262e710348180e9
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11485
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
2024-04-20 14:14:19 +00:00
Connor Brewster
259d7a3cfa refactor(tvix/castore): generalize store ingestion streams
Previously the store ingestion code was coupled to `walkdir::DirEntry`s
produced by the `walkdir` crate which made it impossible to reuse
ingesting from other sources like tarballs or NARs.

This introduces a `IngestionEntry` which carries enough information for
store ingestion and a future for computing the Blake3 digest of files.
This allows the producer to perform file uploads in a way that makes
sense for the source, ie. the filesystem upload could concurrently
upload multiple files at the same time, while the NAR ingestor will need
to ingest the entire blob before yielding the next blob in the stream.
In the future we can buffer small blobs and upload them concurrently,
but the full blob still needs to be read from the NAR before advancing.

Change-Id: I6d144063e2ba5b05e765bac1f27d41b3c8e7b283
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11462
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
2024-04-19 20:37:05 +00:00
Florian Klink
f1349caf3f refactor(tvix/castore): relax trait bounds on BlobService
We don't need to clone BlobService anymore.

Change-Id: I2f3b9a595f604ec0f1e081f6e90cd8b67cbb8961
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11419
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
2024-04-15 14:47:12 +00:00
Florian Klink
6ebaa7b88a refactor(tvix/castore/import): restructure directory uploader a bit
Have a Option<Box<dyn DirectoryPutter>>, which is lazily initialized
whenever we first want to upload a directory.

Have the loop explicitly break when it encounters the root_node, and
deal with the flushing after the loop.

Deal with the FUTUREWORK (assertion for root directory digest matching
what the DirectoryPutter returns).

Change-Id: Iefc4904d8b8387e868fb752d40e3e4e4218c7407
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11417
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-04-15 14:47:12 +00:00
Florian Klink
c088123d4e refactor(tvix/castore/import): put invariant checker into a .inspect()
Separate this a bit stronger from the main application flow.

Change-Id: I2e9bd3ec47cc6e37256ba6afc6e0586ddc9a051f
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11416
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: Brian Olsen <me@griff.name>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-04-15 14:37:35 +00:00
Florian Klink
b70744fda6 refactor(tvix/*/import): rename direntry_stream, entries_per_depths
Align these names and comments with the two users, to make it more
obvious we're doing the same thing here, just use a different method to
come up with entries_per_depths.

Change-Id: I42058e397588b6b57a6299e87183bef27588b228
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11415
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-04-15 14:06:50 +00:00
Florian Klink
bcc00fba8f refactor(tvix/castore/import): inline process_entry
This did very little, and especially the part of relying on the outside
caller to pass in a Directory if the type is a directory required having
per-entry-type specific logic anyways.

It's cleaner to just inline it.

Change-Id: I997a8513ee91c67b0a2443cb5cd9e8700f69211e
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11414
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-04-15 14:05:49 +00:00
Florian Klink
d47bd4f4bc refactor(tvix/castore/import): move process_entry to the end of the file
This makes it easier to understand the code.

Change-Id: I0a9047433000551a6ba1f50a8c5c93527bc86216
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11413
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-04-15 14:05:18 +00:00
Florian Klink
b70e01a4db feat(tvix/castore/import): remove copying in find_ancestor
We don't need to copy if we explicitly say that the returned
Option<Path> may hold onto bytes from the passed in &DirEntry.

Change-Id: Ib46b6fd2f8f19a45f8bef79c4c1d2fa6b490cad7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11410
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
2024-04-13 13:20:29 +00:00
Florian Klink
cc42cac39c refactor(tvix/castore/import): rename ingest_entries function arg
This is a stream of DirEntry, so let's call it direntry_stream.

Change-Id: I5b3cb4efba899d746393f75f6ece7eaa79424717
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11401
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
2024-04-13 10:51:58 +00:00
Florian Klink
28f5c13c53 fix(tvix/castore): don't emit ret as INFO
This otherwise gets a bit spammy.

Change-Id: I288350a600d79a394c239f253424ad55bc3cefc5
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10954
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
2024-02-18 07:12:27 +00:00
Ryan Lahfa
68bba48d59 feat(tvix/castore): process_entry cannot process unsupported nodes
In the past, we had a `todo!` on unsupported node types, this returns a proper error
that can be caught by the caller.

Change-Id: Icba4c1dab33c0d670a97f162c9b358d1ed5855cb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10675
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
2024-01-22 14:15:42 +00:00
Ryan Lahfa
7275288f0e refactor(tvix/castore): break down ingest_path
In one function that does the heavy lifting: `ingest_entries`, and three additional helpers:

- `walk_path_for_ingestion` which perform the tree walking in a very naive way and can be replaced by the user
- `leveled_entries_to_stream` which transforms a list of a list of
  entries ordered by their depth in the tree to a stream of entries in
  the bottom to top order (Merkle-compatible order I will say in the
  future).
- `ingest_path` which calls the previous functions.

Change-Id: I724b972d3c5bffc033f03363255eae448f017cef
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10573
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
Autosubmit: raitobezarius <tvl@lahfa.xyz>
2024-01-20 18:26:17 +00:00
Ryan Lahfa
1f1a42b4da feat(tvix/castore): ingestion does DFS and invert it
To make use of the filtering feature, we need to revert the internal walker to a real DFS.

We will therefore just invert the whole tree by storing all of its
contents in a level-keyed vector.

This is horribly expensive in memory, this is a compromise between CPU
and memory, here is the fundamental reason for why:

When you encounter a directory, it's either a leaf or not, i.e. it
contains subdirectories or not.

To know this fact, you can:

- wait until you notice subdirectories under it, i.e. you need to store
  any intermediate nodes you see in the meantime -> memory penalty.
- getdents or readdir on it to determine *NOW* its subdirectories -> CPU
  penalty and I/O penalty.

This is an implementation of the first proposal, we pay memory.

In practice, we are paying O(#nb of nodes) in memory.

There's a smarter albeit much more complicated algorithm that pays only
O(\sum_i #siblings(p_i)) nodes where (p_1, ..., p_n) is the path to a leaf.

which means for:

             A
            / \
           B   C
          /   / \
         D   E   F

We would never store D, E, F but only E, F at a given time.
But we would still store B, C no matter what.

Change-Id: I456ed1c3f0db493e018ba1182665d84bebe29c11
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10567
Tested-by: BuildkiteCI
Autosubmit: raitobezarius <tvl@lahfa.xyz>
Reviewed-by: flokli <flokli@flokli.de>
2024-01-20 17:16:01 +00:00
Ryan Lahfa
93afc711f6 feat(tvix/castore): convert import error to std::io::Error
So that we can just `map_err` easily in functions returning `std::io::Error` but calling functions
returning `castore::import::Error`.

Change-Id: Id181b95e8431c69e95f3a8cd569ca10306656e1d
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10572
Autosubmit: raitobezarius <tvl@lahfa.xyz>
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
2024-01-18 14:40:06 +00:00
Florian Klink
89882ff9b1 refactor(tvix): use AsRef<dyn …> instead of Deref<Target= …>
Removes some more needs for Arcs.

Change-Id: I9a9f4b81641c271de260e9ffa98313a32944d760
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10578
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2024-01-09 14:08:22 +00:00
Ryan Lahfa
cbcd078684 chore(tvix/castore): fix the docstring for process_entry
It was a `//` not a `///`.

Change-Id: Iee3e8c116d73b5dd8a41c027153714415a66695f
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10566
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
2024-01-06 01:39:41 +00:00
Florian Klink
f20969de9b refactor(tvix/castore): relax trait bounds for DS
Make this an `AsRef<dyn DirectoryService>`.

This helps dropping some Clone requirements.

Unfortunately, we can't thread this through to TvixStoreIO just yet.

Change-Id: I3f07eb28d6c793d3313fe21506ada84d5a8aa3ac
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10533
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2024-01-05 16:43:34 +00:00
Florian Klink
ddae4860c2 feat(tvix/castore/import): generalize ingest_path
We don't actually care if it's an Arc<dyn BlobService>, or something
else, as long as we can Deref to a BlobService and clone.

Change-Id: I0852aaf723f51c5e6b820be8db1199d17309ab08
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10510
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
2024-01-01 00:54:14 +00:00
Florian Klink
81ef26ba3f fix(tvix/castore/import): don't unwrap entry
If the path specified doesn't exist, construct a proper error instead
of panicking.

Part of b/344.

Change-Id: Id5c6a91248b0a387f3e8f138f8e686e402009e8f
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10330
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
2023-12-12 18:07:11 +00:00
Florian Klink
afd09c3290 feat(tvix/castore/import): log returned errors
This will emit a log event / trace in case this function returns an
error-y type.

Change-Id: I48db6807f3e42304357c422a2b6e177cb8b95228
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10329
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
2023-12-12 18:07:11 +00:00
Florian Klink
30d82efa77 refactor(tvix/castore/blobservice): use io::Result in trait
For all these calls, the caller has enough context about what it did, so
it should be fine to use io::Result here.

We pretty much only constructed crate::Error::StorageError before
anyways, so this conveys *more* information.

Change-Id: I5cabb3769c9c2314bab926d34dda748fda9d3ccc
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10328
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
2023-12-12 18:06:40 +00:00
sterni
875bb26fc3 fix(tvix/castore): correctly flag unreachable code
Change-Id: Id09afa4b77c3c70fb5695f253f6df4aa88b61e19
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10113
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
2023-11-24 23:36:15 +00:00
Florian Klink
2546446d51 feat(tvix/castore): bump [Directory,File]Node size to u64
Having more than 4GiB files is quite possible (think about the NixOS
graphical installer, and an uncompressed iso of it).

No wire format changes.

Change-Id: Ia78a07e4c554e91b93c5b9f8533266e4bd7f22b6
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9950
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
2023-11-05 10:57:01 +00:00
Connor Brewster
0325ae3ba3 fix(tvix/castore): Fix race when ingesting into castore
After finishing the ingestion, the directory putter was not being
closed. This caused a race where the root directory node was accessed
before the directory node had been flushed to the server.

This patch makes it so we close the putter before returning the root
node which should ensure that the root node exists on the directory
service server before the `ingest_path` function returns.

Fixes b/326

Change-Id: Id16cf46bc48962121dde76d3c9c23a845d87d0f1
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9761
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
2023-10-17 13:01:29 +00:00
Linus Heckemann
b1ab8075cd docs(tvix/castore): point out use of contents_first
Change-Id: I7620d2abe01675ea7028a478d4f8447e36d5768b
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9605
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
2023-10-13 17:14:48 +00:00
Florian Klink
1629f3064f docs(tvix/castore): remove TODO
This probably was about passing around directory_putter at some point,
which we do, so whatever this meant, it's not actionable anymore.

Change-Id: I1b4e0cdd2119bf2b2a9cf06d186a3b476b0ff367
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9573
Reviewed-by: Linus Heckemann <git@sphalerite.org>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
2023-10-08 16:00:44 +00:00
edef
9a7c078a69 fix(tvix/castore): explicitly name lifetimes in process_entry
Otherwise this produces absolutely inscrutable errors:

    note: hidden type `[async fn body@castore/src/import.rs:63:1: 63:94]` captures lifetime '_#24r

Change-Id: If5d9626c9edf400de5bcec038bcaa5a3117561f0
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9506
Tested-by: BuildkiteCI
Autosubmit: edef <edef@edef.eu>
Reviewed-by: flokli <flokli@flokli.de>
2023-10-04 08:31:40 +00:00
Florian Klink
32f41458c0 refactor(tvix): move castore into tvix-castore crate
This splits the pure content-addressed layers from tvix-store into a
`castore` crate, and only leaves PathInfo related things, as well as the
CLI entrypoint in the tvix-store crate.

Notable changes:
 - `fixtures` and `utils` had to be moved out of the `test` cfg, so they
   can be imported from tvix-store.
 - Some ad-hoc fixtures in the test were moved to proper fixtures in the
   same step.
 - The protos are now created by a (more static) recipe in the protos/
   directory.

The (now two) golang targets are commented out, as it's not possible to
update them properly in the same CL. This will be done by a followup CL
once this is merged (and whitby deployed)

Bug: https://b.tvl.fyi/issues/301

Change-Id: I8d675d4bf1fb697eb7d479747c1b1e3635718107
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9370
Reviewed-by: tazjin <tazjin@tvl.su>
Reviewed-by: flokli <flokli@flokli.de>
Autosubmit: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Reviewed-by: Connor Brewster <cbrewster@hey.com>
2023-09-22 12:51:21 +00:00
Renamed from tvix/store/src/import.rs (Browse further)