fix(tvix): Avoid buffering file into memory in builtins.hashFile

Right now `builtins.hashFile` always reads the entire file into memory before hashing, which is not ideal for large files. This replaces `read_to_string` with `open_file` which allows calculating the hash of the file without buffering it entirely into memory.

Other callers can continue to buffer into memory if they choose, but they still use the `open_file` VM request and then call `read_to_string` or `read_to_end` on the `std::io::Reader`.

Fixes b/380

Change-Id: Ifa1c8324bcee8f751604b0b449feab875c632fda
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11236
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
2024-03-22 18:52:21 -05:00
commit 63116d8c21
parent 17849c5c00
9 changed files with 80 additions and 74 deletions
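The difference between the two approaches described in the commit message can be illustrated with a minimal sketch. This is not the tvix implementation: `hash_reader`, `hash_buffered`, and the `Fnv1a` type are hypothetical names, and FNV-1a stands in for the real digest only so that the example needs nothing beyond the standard library. The point is that any `std::io::Read` can be fed to a hasher in fixed-size chunks, so memory use stays constant regardless of file size.

```rust
use std::io::{self, Read};

/// FNV-1a (64-bit). Hypothetical stand-in for a real digest such as SHA-256;
/// chosen only to keep this sketch dependency-free.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }

    fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Streaming variant: reads at most 8 KiB at a time, so peak memory use
/// does not depend on the size of the input.
fn hash_reader<R: Read>(mut reader: R) -> io::Result<u64> {
    let mut hasher = Fnv1a::new();
    let mut buf = [0u8; 8192];
    loop {
        match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => hasher.update(&buf[..n]),
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(hasher.finish())
}

/// Buffering variant: loads the whole input into a Vec before hashing,
/// which is the behaviour the commit moves away from for hashFile.
fn hash_buffered<R: Read>(mut reader: R) -> io::Result<u64> {
    let mut all = Vec::new();
    reader.read_to_end(&mut all)?;
    let mut hasher = Fnv1a::new();
    hasher.update(&all);
    Ok(hasher.finish())
}
```

Both functions produce the same digest for the same input; only the peak memory differs. A caller that still wants the full contents can call `read_to_end` on the same reader, as the commit message notes.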
--- a/tvix/glue/src/builtins/import.rs
+++ b/tvix/glue/src/builtins/import.rs
@@ -177,9 +177,9 @@ mod import_builtins {
            })
            .transpose()?;

-        // FUTUREWORK(performance): this reads the file instead of using a stat-like
-        // system call to the file, this degrades very badly on large files.
-        if !recursive_ingestion && state.read_to_end(path.as_ref()).is_err() {
+        // FUTUREWORK(performance): this opens the file instead of using a stat-like
+        // system call to the file.
+        if !recursive_ingestion && state.open(path.as_ref()).is_err() {
            Err(ImportError::FlatImportOfNonFile(
                path.to_string_lossy().to_string(),
            ))?;