fix(tvix): Avoid buffering file into memory in builtins.hashFile

Right now `builtins.hashFile` always reads the entire file into memory
before hashing, which wastes memory on large files. This replaces
`read_to_string` with `open_file`, which returns a reader so the hash
can be calculated without buffering the whole file in memory. Other
callers can still buffer into memory if they choose, but they now use
the `open_file` VM request and then call `read_to_string` or
`read_to_end` on the returned `std::io::Read` implementation.
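
As a rough illustration of the streaming approach described above (not
the actual tvix code), the sketch below hashes any `std::io::Read` in
fixed-size chunks. `hash_reader` and `Fnv1a` are hypothetical names, and
64-bit FNV-1a is only a std-only stand-in for the cryptographic digests
that `builtins.hashFile` really computes.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in digest (64-bit FNV-1a), used only to keep this
/// sketch free of external crates. It is NOT what hashFile computes.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }

    /// Feed another chunk of bytes into the running hash state.
    fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hash everything `reader` yields while holding at most one 8 KiB
/// chunk in memory, instead of reading the whole input first.
fn hash_reader<R: Read>(mut reader: R) -> io::Result<u64> {
    let mut hasher = Fnv1a::new();
    let mut buf = [0u8; 8192];
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        hasher.update(&buf[..n]);
    }
    Ok(hasher.finish())
}
```

Any reader works here: a `&[u8]` slice, or a `std::fs::File` returned by
`File::open`, since both implement `Read`. A caller that still wants the
full contents can call `read_to_end` on the same reader instead.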

Fixes b/380

Change-Id: Ifa1c8324bcee8f751604b0b449feab875c632fda
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11236
Reviewed-by: flokli <flokli@flokli.de>
Tested-by: BuildkiteCI
Connor Brewster 2024-03-22 18:52:21 -05:00
parent 17849c5c00
commit 63116d8c21
9 changed files with 80 additions and 74 deletions

@@ -177,9 +177,9 @@ mod import_builtins {
 })
 .transpose()?;
-// FUTUREWORK(performance): this reads the file instead of using a stat-like
-// system call to the file, this degrades very badly on large files.
-if !recursive_ingestion && state.read_to_end(path.as_ref()).is_err() {
+// FUTUREWORK(performance): this opens the file instead of using a stat-like
+// system call to the file.
+if !recursive_ingestion && state.open(path.as_ref()).is_err() {
 Err(ImportError::FlatImportOfNonFile(
 path.to_string_lossy().to_string(),
 ))?;