chore(tvix/eval): move eval docs to tvix/docs

Change-Id: I75b33c43456389de6e521b4f0ad46d68bc9e98f6
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11809
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
This commit is contained in:
Florian Klink 2024-06-13 22:17:18 +03:00 committed by clbot
parent 6947dc4349
commit 5077ca70de
12 changed files with 16 additions and 0 deletions

View file

@ -4,6 +4,19 @@
- [Architecture & data flow](./architecture.md)
- [TODOs](./TODO.md)
# Evaluator
- [Compilation of Bindings](./eval/bindings.md)
- [Builtins](./eval/builtins.md)
- [Build References](./eval/build-references.md)
- [Catchable Errors](./eval/catchable-errors.md)
- [Known Optimisation Potential](./eval/known-optimisation-potential.md)
- [Langugage Issues](./eval/language-issues.md)
- [Attrset Opcodes](./eval/opcodes-attrsets.md)
- [Recursive attribute sets](./eval/recursive-attrs.md)
- [VM Loop](./eval/vm-loop.md)
- [Abandoned](./eval/abandoned/index.md)
- [Thread-local VM](./eval/abandoned/thread-local-vm.md)
# Store
- [Store API](./store/api.md)
- [BlobStore Chunking](./castore/blobstore-chunking.md)

View file

@ -0,0 +1,3 @@
# Abandoned ideas
This chapter keeps track of abandoned ideas, and why they were abandoned.

View file

@ -0,0 +1,233 @@
# We can't have nice things because IFD
The thread-local VM work below was ultimately not merged because it
was decided that it would be harmful for `tvix::eval::Value` to
implement `Eq`, `Hash`, or any of the other `std` traits.
Implementing `std` traits on `Value` was deemed harmful because IFD
can cause arbitrary amounts of compilation to occur, including
network transactions with builders. Obviously it would be
unexpected and error-prone to have a `PartialEq::eq()` which does
something like this. This problem does not manifest within the
"nixpkgs compatibility only" scope, or in any undeprecated language
feature other than IFD. Although IFD is outside the "nixpkgs
compatibility scope", it [has been added to the TVL compatibility
scope](https://cl.tvl.fyi/c/depot/+/7193/comment/3418997b_0dbd0b65/).
This was the sole reason for not merging.
The explanation below may be useful in case future circumstances
affect the relevance of the reasoning above.
The implementation can be found in these CLs:
- [refactor(tvix/eval): remove lifetime parameter from VM<'o>](https://cl.tvl.fyi/c/depot/+/7194)
- [feat(tvix/eval): [FOUNDLING] thread-local VM](https://cl.tvl.fyi/c/depot/+/7195)
- [feat(tvix/eval): [FOUNDLING] VM::vm_xxx convenience methods](https://cl.tvl.fyi/c/depot/+/7196)
- [refactor(tvix/eval): [FOUNDLING]: drop explicit `&mut vm` parameter](https://cl.tvl.fyi/c/depot/+/7197)
# Thread-local storage for tvix::eval::vm::VM
## The problem
`Value::force()` takes a `&mut VM` argument, since forcing a value
requires executing opcodes. This means that `Value::nix_eq()` too
must take a `&mut VM`, since any sensible definition of equality
will have to force thunks.
Unfortunately Rust's `PartialEq::eq()` function does not accept any
additional arguments like this, so `Value` cannot implement
`PartialEq`. Worse, structs which *contain* `Value`s can't
implement `PartialEq` either. This means `Value`, and anything
containing it, cannot be the key for a `BTreeMap` or `HashMap`. We
can't even insert `Value`s into a `HashSet`!
There are other situations like this that don't involve `PartialEq`,
but it's the most glaring one. The main problem is that you need a
`VM` in order to force thunks, and thunks can be anywhere in a
`Value`.
## Solving the problem with thread-locals
We could avoid threading the `&mut VM` through the entire codebase
by making it a thread-local.
To do this without a performance hit, we need to use LLVM
thread-locals, which are the same cost as references to `static`s
but load relative to
[`llvm.threadlocal.address`][threadlocal-intrinsic] instead of
relative to the data segment. Unfortunately `#[thread_local]` [is
unstable][thread-local-unstable] and [unsafe in
general][thread-local-unsafe] for most of the cases where we would
want to use it. There is one [exception][tls-const-init], however:
if a `!thread_local()` has a `const` initializer, the compiler will
insert a `#[thread_local]`; this special case is both safe and
stable.
The difficult decision is what the type of the thread-local should
be. Since you can't get a mutable reference to a `thread_local!()`
it will have to be some interior-mutability-bestowing wrapper around
our current `struct VM`. Here are the choices:
### `RefCell<VM>`
This is the obvious first choice, since it lets you borrow a
`RefMut<Target=VM>`. The problem here is that we want to keep the
codebase written such that all the functions in `impl VM` still take
a `&mut self`. This means that there will be an active mutable
borrow for the duration of `VM::call_builtin()`. So if we implement
`PartialEq` by having `eq()` attempt a second mutable borrow from
the thread-local storage, it will fail since there is already an
active borrow.
The problem here is that you can't "unborrow" a `RefMut` except by
dropping it. There's no way around this.
#### Problem: Uglification
The only solution here is to rewrite all the functions in `impl VM`
so they don't take any kind of `self` argument, and then have them
do a short-lived `.borrow_mut()` from the thread-local `RefCell`
*separately, each time* they want to modify one of the fields of
`VM` (currently `frames`, `stack`, `with_stack`, `warnings`). This
means that if you had a code sequence like this:
```
impl VM {
fn foo(&mut self, ...) {
...
self.frame().ip += 1;
self.some_other_method();
self.frame().ip += 1;
```
You would need to add *two separate `borrow_mut()`s*, one for each
of the `self.frame().ip+=1` statements. You can't just do one big
`borrow_mut()` because `some_other_method()` will call
`borrow_mut()` and panic.
#### Problem: Performance
The `RefCell<VM>` approach also has a fairly huge performance hit,
because every single modification to any part of `VM` will require a
reference count increment/decrement, and a conditional branch based
on the check (which will never fail) that the `RefCell` isn't
already mutably borrowed. It will also impede a lot of rustc's
optimizations.
### `Cell<VM>`
This is a non-starter because it means that in order to mutate any
field of `VM`, you have to move the entire `struct VM` out of the
`Cell`, mutate it, and move it back in.
### `Cell<Box<VM>>`
Now we're getting warmer. Here, we can move the `Box<VM>` out of
the cell with a single pointer-sized memory access.
We don't want to do the "uglification" described in the previous
section. We are very fortunate that, sometime in mid-2019, the Rust
dieties [decreed by fiat][fiat-decree] that `&Cell<T>` and `&mut T`
are bit-for-bit identical, and even gave us mortals safe wrappers
[`from_mut()`][from_mut] and [`get_mut()`][get_mut] around
`mem::transmute()`.
So now, when a `VM` method (which takes `&mut self`) calls out to
some external code (like a builtin), instead of passing the `&mut
self` to the external code it can call `Cell::from_mut(&mut self)`,
and then `Cell::swap()` that into the thread-local storage cell for
the duration of the external code. After the external code returns,
it can `Cell::swap()` it back. This whole dance gets wrapped in a
lexical block, and the borrow checker sees that the `&Cell<Box<VM>>`
returned by `Cell::from_mut()` lives only until the end of the
lexical block, *so we get the `&mut self` back after the close-brace
for that block*. NLL FTW. This sounds like a lot of work, but it
should compile down to two pointer-sized loads and two pointer-sized
stores, and it is incurred basically only for `OpBuiltin`.
This all works, with only two issues:
1. `vm.rs` needs to be very careful to do the thread-local cell swap
dance before calling anything that might call `PartialEq::eq()`
(or any other method that expects to be able to pull the `VM` out
of thread-local storage). There is no compile-time check that we
did the dance in all the right places. If we forget to do the
dance somewhere we'll get a runtime panic from `Option::expect()`
(see next section).
2. Since we need to call `Cell::from_mut()` on a `Box<VM>` rather
than a bare `VM`, we still need to rewrite all of `vm.rs` so that
every function takes a `&mut Box<VM>` instead of a `&mut self`.
This creates a huge amount of "noise" in the code.
Fortunately, it turns out that nearly all the "noise" that arises
from the second point can be eliminated by taking advantage of
[deref coercions][deref-coercions]! This was the last "shoe to
drop".
There is still the issue of having to be careful about calls from
`vm.rs` to things outside that file, but it's manageable.
### `Cell<Option<Box<VM>>>`
In order to get the "safe and stable `#[thread_local]`"
[exception][tls-const-init] we need a `const` initializer, which
means we need to be able to put something into the `Cell` that isn't
a `VM`. So the type needs to be `Cell<Option<Box<VM>>>`.
Recall that you can't turn an `Option<&T>` into an `&Option<T>`.
The latter type has the "is this a `Some` or `None`" bit immediately
adjacent to the bits representing `T`. So if I hand you a `t:&T`
and you wrap it as `Some(t)`, those bits aren't adjacent in memory.
This means that all the VM methods need to operate on an
`Option<Box<VM>>` -- we can't just wrap a `Some()` around `&mut
self` "at the last minute" before inserting it into the thread-local
storage cell. Fortunately deref coercions save the day here too --
the coercion is inferred through both layers (`Box` and `Option`) of
wrapper, so there is no additional noise in the code.
Note that Rust is clever and can find some sequence of bits that
aren't a valid `T`, so `sizeof(Option<T>)==sizeof(T)`. And in fact,
`Box<T>` is one of these cases (and this is guaranteed). So the
`Option` has no overhead.
# Closing thoughts, language-level support
This would have been easier with language-level support.
## What wouldn't help
Although it [it was decreed][fiat-decree] that `Cell<T>` and `&mut
T` are interchangeable, a `LocalKey<Cell<T>>` isn't quite the same
thing as a `Cell<T>`, so it wouldn't be safe for the standard
library to contain something like this:
```
impl<T> LocalKey<Cell<T>> {
fn get_mut(&self) -> &mut T {
unsafe {
// ... mem::transmute() voodoo goes here ...
```
The problem here is that you can call `LocalKey<Cell<T>>::get_mut()` twice and
end up with two `&mut T`s that point to the same thing (mutable aliasing) which
results in undefined behavior.
## What would help
The ideal solution is for Rust to let you call arbitrary methods
`T::foo(&mut self...)` on a `LocalKey<Cell<T>>`. This way you can
have one (and only one) `&mut T` at any syntactical point in your
program -- the `&mut self`.
[tls-const-init]: https://github.com/rust-lang/rust/pull/90774
[thread-local-unstable]: https://github.com/rust-lang/rust/issues/29594
[thread-local-unsafe-generally]: https://github.com/rust-lang/rust/issues/54366
[fiat-decree]: https://github.com/rust-lang/rust/issues/43038
[from_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.from_mut
[get_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.get_mut
[thread-local-unsafe]: [https://github.com/rust-lang/rust/issues/54366]
[deref-coercions]: https://doc.rust-lang.org/book/ch15-02-deref.html#implicit-deref-coercions-with-functions-and-methods
[threadlocal-intrinsic]: https://llvm.org/docs/LangRef.html#llvm-threadlocal-address-intrinsic

View file

@ -0,0 +1,133 @@
Compilation of bindings
=======================
Compilation of Nix bindings is one of the most mind-bending parts of Nix
evaluation. The implementation of just the compilation is currently almost 1000
lines of code, excluding the various insane test cases we dreamt up for it.
## What is a binding?
In short, any attribute set or `let`-expression. Tvix currently does not treat
formals in function parameters (e.g. `{ name ? "fred" }: ...`) the same as these
bindings.
They have two very difficult features:
1. Keys can mutually refer to each other in `rec` sets or `let`-bindings,
including out of definition order.
2. Attribute sets can be nested, and parts of one attribute set can be defined
in multiple separate bindings.
Tvix resolves as much of this logic statically (i.e. at compile-time) as
possible, but the procedure is quite complicated.
## High-level concept
The idea behind the way we compile bindings is to fully resolve nesting
statically, and use the usual mechanisms (i.e. recursion/thunking/value
capturing) for resolving dynamic values.
This is done by compiling bindings in several phases:
1. An initial compilation phase *only* for plain inherit statements (i.e.
`inherit name;`), *not* for namespaced inherits (i.e. `inherit (from)
name;`).
2. A declaration-only phase, in which we use the compiler's scope tracking logic
to calculate the physical runtime stack indices (further referred to as
"stack slots" or just "slots") that all values will end up in.
In this phase, whenever we encounter a nested attribute set, it is merged
into a custom data structure that acts like a synthetic AST node.
This can be imagined similar to a rewrite like this:
```nix
# initial code:
{
a.b = 1;
a.c = 2;
}
# rewritten form:
{
a = {
b = 1;
c = 2;
};
}
```
The rewrite applies to attribute sets and `let`-bindings alike.
At the end of this phase, we know the stack slots of all namespaces for
inheriting from, all values inherited from them, and all values (and
optionally keys) of bindings at the current level.
Only statically known keys are actually merged, so any dynamic keys that
conflict will lead to a "key already defined" error at runtime.
3. A compilation phase, in which all values (and, when necessary, keys) are
actually compiled. In this phase the custom data structure used for merging
is encountered when compiling values.
As this data structure acts like an AST node, the process begins recursively
for each nested attribute set.
At the end of this process we have bytecode that leaves the required values (and
optionally keys) on the stack. In the case of attribute sets, a final operation
is emitted that constructs the actual attribute set structure at runtime. For
`let`-bindings a final operation is emitted that removes these locals from the
stack when the scope ends.
## Moving parts
WARNING: This documents the *current* implementation. If you only care about the
conceptual aspects, see above.
There's a few types involved:
* `PeekableAttrs`: peekable iterator over an attribute path (e.g. `a.b.c`)
* `BindingsKind`: enum defining the kind of bindings (attrs/recattrs/let)
* `AttributeSet`: struct holding the bindings kind, the AST nodes with inherits
(both namespaced and not), and an internal representation of bindings
(essentially a vector of tuples of the peekable attrs and the expression to
compile for the value).
* `Binding`: enum describing the kind of binding (namespaced inherit, attribute
set, plain binding of *any other value type*)
* `KeySlot`: enum describing the location in which a key slot is placed at
runtime (nowhere, statically known value in a slot, dynamic value in a slot)
* `TrackedBinding`: struct representing statically known information about a
single binding (its key slot, value slot and `Binding`)
* `TrackedBindings`: vector of tracked bindings, which implements logic for
merging attribute sets together
And quite a few methods on `Compiler`:
* `compile_bindings`: entry point for compiling anything that looks like a
binding, this calls out to the functions below.
* `compile_plain_inherits`: takes all inherits of a bindings node and compiles
the ones that are trivial to compile (i.e. just plain inherits without a
namespace). The `rnix` parser does not represent namespaced/plain inherits in
different nodes, so this function also aggregates the namespaced inherits and
returns them for further use
* `declare_namespaced_inherits`: passes over all namespaced inherits and
declares them on the locals stack, as well as inserts them into the provided
`TrackedBindings`
* `declare_bindings`: declares all regular key/value bindings in a bindings
scope, but without actually compiling their keys or values.
There's a lot of heavy lifting going on here:
1. It invokes the various pieces of logic responsible for merging nested
attribute sets together, creating intermediate data structures in the value
slots of bindings that can be recursively processed the same way.
2. It decides on the key slots of expressions based on the kind of bindings,
and the type of expression providing the key.
* `bind_values`: runs the actual compilation of values. Notably this function is
responsible for recursively compiling merged attribute sets when it encounters
a `Binding::Set` (on which it invokes `compile_bindings` itself).
In addition to these several methods (such as `compile_attr_set`,
`compile_let_in`, ...) invoke the binding-kind specific logic and then call out
to the functions above.

View file

@ -0,0 +1,254 @@
Build references in derivations
===============================
This document describes how build references are calculated in Tvix. Build
references are used to determine which store paths should be available to a
builder during the execution of a build (i.e. the full build closure of a
derivation).
## String contexts in C++ Nix
In C++ Nix, each string value in the evaluator carries an optional so-called
"string context".
These contexts are themselves a list of strings that take one of the following
formats:
1. `!<output_name>!<drv_path>`
This format describes a build reference to a specific output of a derivation.
2. `=<drv_path>`
This format is used for a special case where a derivation attribute directly
refers to a derivation path (e.g. by accessing `.drvPath` on a derivation).
Note: In C++ Nix this case is quite special and actually requires a
store-database query during evaluation.
3. `<path>` - a non-descript store path input, usually a plain source file (e.g.
from something like `src = ./.` or `src = ./foo.txt`).
In the case of `unsafeDiscardOutputDependency` this is used to pass a raw
derivation file, but *not* pull in its outputs.
Lets introduce names for these (in the same order) to make them easier to
reference below:
```rust
enum BuildReference {
/// !<output_name>!<drv_path>
SingleOutput(OutputName, DrvPath),
/// =<drv_path>
DrvClosure(DrvPath),
/// <path>
Path(StorePath),
}
```
String contexts are, broadly speaking, created whenever a string is the result
of a computation (e.g. string interpolation) that used a *computed* path or
derivation in any way.
Note: This explicitly does *not* include simply writing a literal string
containing a store path (whether valid or not). That is only permitted through
the `storePath` builtin.
## Derivation inputs
Based on the data above, the fields `inputDrvs` and `inputSrcs` of derivations
are populated in `builtins.derivationStrict` (the function which
`builtins.derivation`, which isn't actually a builtin, wraps).
`inputDrvs` is represented by a map of derivation paths to the set of their
outputs that were referenced by the context.
TODO: What happens if the set is empty? Somebody claimed this means all outputs.
`inputSrcs` is represented by a set of paths.
These are populated by the above references as follows:
* `SingleOutput` entries are merged into `inputDrvs`
* `Path` entries are inserted into `inputSrcs`
* `DrvClosure` leads to a special store computation (`computeFSClosure`), which
finds all paths referenced by the derivation and then inserts all of them into
the fields as above (derivations with _all_ their outputs)
This is then serialised in the derivation and passed down the pipe.
## Builtins interfacing with contexts
C++ Nix has several builtins that interface directly with string contexts:
* `unsafeDiscardStringContext`: throws away a string's string context (if
present)
* `hasContext`: returns `true`/`false` depending on whether the string has
context
* `unsafeDiscardOutputDependency`: drops dependencies on the *outputs* of a
`.drv` in the context, passing only the literal `.drv` itself
Note: This is only used for special test-cases in nixpkgs, and deprecated Nix
commands like `nix-push`.
* `getContext`: returns the string context in serialised form as a Nix attribute
set
* `appendContext`: adds a given string context to the string in the same format
as returned by `getContext`
Most of the string manipulation operations will propagate the context to the
result based on their parameters' contexts.
## Placeholders
C++ Nix has `builtins.placeholder`, which given the name of an output (e.g.
`out`) creates a hashed string representation of that output name. If that
string is used anywhere in input attributes, the builder will replace it with
the actual name of the corresponding output of the current derivation.
C++ Nix does not use contexts for this, it blindly creates a rewrite map of
these placeholder strings to the names of all outputs, and runs the output
replacement logic on all environment variables it creates, attribute files it
passes etc.
## Tvix & string contexts
In the past, Tvix did not track string contexts in its evaluator at all, see
the historical section for more information about that.
Tvix tracks string contexts in every `NixString` structure via a
`HashSet<BuildReference>` and offers an API to combine the references while
keeping the exact internal structure of that data private.
## Historical attempt: Persistent reference tracking
We were investigating implementing a system which allows us to drop string
contexts in favour of reference scanning derivation attributes.
This means that instead of maintaining and passing around a string context data
structure in eval, we maintain a data structure of *known paths* from the same
evaluation elsewhere in Tvix, and scan each derivation attribute against this
set of known paths when instantiating derivations.
We believed we could take the stance that the system of string contexts as
implemented in C++ Nix is likely an implementation detail that should not be
leaking to the language surface as it does now.
### Tracking "known paths"
Every time a Tvix evaluation does something that causes a store interaction, a
"known path" is created. On the language surface, this is the result of one of:
1. Path literals (e.g. `src = ./.`).
2. Calls to `builtins.derivationStrict` yielding a derivation and its output
paths.
3. Calls to `builtins.path`.
Whenever one of these occurs, some metadata that persists for the duration of
one evaluation should be created in Nix. This metadata needs to be available in
`builtins.derivationStrict`, and should be able to respond to these queries:
1. What is the set of all known paths? (used for e.g. instantiating an
Aho-Corasick type string searcher)
2. What is the _type_ of a path? (derivation path, derivation output, source
file)
3. What are the outputs of a derivation?
4. What is the derivation of an output?
These queries will need to be asked of the metadata when populating the
derivation fields.
Note: Depending on how we implement `builtins.placeholder`, it might be useful
to track created placeholders in this metadata, too.
### Context builtins
Context-reading builtins can be implemented in Tvix by adding `hasContext` and
`getContext` with the appropriate reference-scanning logic. However, we should
evaluate how these are used in nixpkgs and whether their uses can be removed.
Context-mutating builtins can be implemented by tracking their effects in the
value representation of Tvix, however we should consider not doing this at all.
`unsafeDiscardOutputDependency` should probably never be used and we should warn
or error on it.
`unsafeDiscardStringContext` is often used as a workaround for avoiding IFD in
inconvenient places (e.g. in the TVL depot pipeline generation). This is
unnecessary in Tvix. We should evaluate which other uses exist, and act on them
appropriately.
The initial danger with diverging here is that we might cause derivation hash
discrepancies between Tvix and C++ Nix, which can make initial comparisons of
derivations generated by the two systems difficult. If this occurs we need to
discuss how to approach it, but initially we will implement the mutating
builtins as no-ops.
### Why this did not work for us?
Nix has a feature to perform environmental checks of your derivation, e.g.
"these derivation outputs should not be referenced in this derivation", this was
introduced in Nix 2.2 by
https://github.com/NixOS/nix/commit/3cd15c5b1f5a8e6de87d5b7e8cc2f1326b420c88.
Unfortunately, this feature introduced a very unfortunate and critical bug: all
usage of this feature with contextful strings will actually force the
derivation to depend at least at build time on those specific paths, see
https://github.com/NixOS/nix/issues/4629.
For example, if you wanted to `disallowedReferences` to a package and you used a
derivation as a path, you would actually register that derivation as a input
derivation of that derivation.
This bug is still unfixed in Nix and it seems that fixing it would require
introducing different ways to evaluate Nix derivations to preserve the
output path calculation for Nix expressions so far.
All of this would be fine if the bug behavior was uniform in the sense that no
one tried to force-workaround it. Since Nixpkgs 23.05, due to
https://github.com/NixOS/nixpkgs/pull/211783 this is not true anymore.
If you let nixpkgs be the disjoint union of bootstrapping derivations $A$ and
`stdenv.mkDerivation`-built derivations $B$.
$A$ suffers from the bug and $B$ doesn't by the forced usage of
`unsafeDiscardStringContext` on those special checking fields.
This means that to build hash-compatible $A$ **and** $B$, we need to
distinguish $A$ and $B$. A lot of hacks could be imagined to support this
problem.
Let's assume we have a solution to that problem, it means that we are able to
detect implicitly when a set of specific fields are
`unsafeDiscardStringContext`-ed.
Thus, we could use that same trick to implement `unsafeDiscardStringContext`
entirely for all fields actually.
Now, to implement `unsafeDiscardStringContext` in the persistent reference
tracking model, you will need to store a disallowed list of strings that should
not trigger a reference when we are scanning a derivation parameters.
But assume you have something like:
```nix
derivation {
buildInputs = [
stdenv.cc
];
disallowedReferences = [ stdenv.cc ];
}
```
If you unregister naively the `stdenv.cc` reference, it will silence the fact
that it is part of the `buildInputs`, so you will observe that Nix will fail
the derivation during environmental check, but Tvix would silently force remove
that reference.
Until proven otherwise, it seems highly difficult to have the fine-grained
information to prevent reference tracking of those specific fields. It is not a
failure of the persistent reference tracking, it is an unresolved critical bug
of Nix that only nixpkgs really workarounded for `stdenv.mkDerivation`-based
derivations.

View file

@ -0,0 +1,138 @@
Nix builtins
============
Nix has a lot of built-in functions, some of which are accessible in
the global scope, and some of which are only accessible through the
global `builtins` attribute set.
This document is an attempt to track all of these builtins, but
without documenting their functionality.
See also https://nixos.org/manual/nix/stable/expressions/builtins.html
The `impl` column indicates implementation status in tvix:
- implemented: "" (empty cell)
- not yet implemented, but not blocked: `todo`
- not yet implemented, but blocked by other prerequisites:
- `store`: awaiting eval<->store api(s)
- `context`: awaiting support for string contexts
| name | global | arity | pure | impl |
|-------------------------------|--------|-------|-------|---------|
| abort | true | 1 | | |
| add | false | 2 | true | |
| addErrorContext | false | ? | | context |
| all | false | 2 | true | |
| any | false | 2 | true | |
| appendContext | false | ? | | |
| attrNames | false | 1 | true | |
| attrValues | false | | true | |
| baseNameOf | true | | | |
| bitAnd | false | | | |
| bitOr | false | | | |
| bitXor | false | | | |
| builtins | true | | | |
| catAttrs | false | | | |
| compareVersions | false | | | |
| concatLists | false | | | |
| concatMap | false | | | |
| concatStringsSep | false | | | |
| currentSystem | false | | | |
| currentTime | false | | false | |
| deepSeq | false | | | |
| derivation | true | | | store |
| derivationStrict | true | | | store |
| dirOf | true | | | |
| div | false | | | |
| elem | false | | | |
| elemAt | false | | | |
| false | true | | | |
| fetchGit | true | | | store |
| fetchMercurial | true | | | store |
| fetchTarball | true | | | store |
| fetchurl | false | | | store |
| filter | false | | | |
| filterSource | false | | | store |
| findFile | false | | false | todo |
| foldl' | false | | | |
| fromJSON | false | | | |
| fromTOML | true | | | |
| functionArgs | false | | | |
| genList | false | | | |
| genericClosure | false | | | todo |
| getAttr | false | | | |
| getContext | false | | | |
| getEnv | false | | false | |
| hasAttr | false | | | |
| hasContext | false | | | |
| hashFile | false | | false | |
| hashString | false | | | |
| head | false | | | |
| import | true | | | |
| intersectAttrs | false | | | |
| isAttrs | false | | | |
| isBool | false | | | |
| isFloat | false | | | |
| isFunction | false | | | |
| isInt | false | | | |
| isList | false | | | |
| isNull | true | | | |
| isPath | false | | | |
| isString | false | | | |
| langVersion | false | | | |
| length | false | | | |
| lessThan | false | | | |
| listToAttrs | false | | | |
| map | true | | | |
| mapAttrs | false | | | |
| match | false | | | |
| mul | false | | | |
| nixPath | false | | | todo |
| nixVersion | false | | | todo |
| null | true | | | |
| parseDrvName | false | | | |
| partition | false | | | |
| path | false | | sometimes | store |
| pathExists | false | | false | |
| placeholder | true | | | context |
| readDir | false | | false | |
| readFile | false | | false | |
| removeAttrs | true | | | |
| replaceStrings | false | | | |
| scopedImport | true | | | |
| seq | false | | | |
| sort | false | | | |
| split | false | | | |
| splitVersion | false | | | |
| storeDir | false | | | store |
| storePath | false | | | store |
| stringLength | false | | | |
| sub | false | | | |
| substring | false | | | |
| tail | false | | | |
| throw | true | | | |
| toFile | false | | | store |
| toJSON | false | | | |
| toPath | false | | | |
| toString | true | | | |
| toXML | true | | | |
| trace | false | | | |
| true | true | | | |
| tryEval | false | | | |
| typeOf | false | | | |
| unsafeDiscardOutputDependency | false | | | |
| unsafeDiscardStringContext | false | | | |
| unsafeGetAttrPos | false | | | todo |
| valueSize | false | | | todo |
## Added after C++ Nix 2.3 (without Flakes enabled)
| name | global | arity | pure | impl |
|---------------|--------|-------|-------|-------|
| break | false | 1 | | todo |
| ceil | false | 1 | true | |
| fetchTree | true | 1 | | todo |
| floor | false | 1 | true | |
| groupBy | false | 2 | true | |
| traceVerbose | false | 2 | | todo |
| zipAttrsWith | false | 2 | true | todo |

View file

@ -0,0 +1,131 @@
# (Possible) Implementation(s) of Catchable Errors for `builtins.tryEval`
## Terminology
Talking about “catchable errors” in Nix in general is a bit precarious since
there is no properly established terminology. Also, the existing terms are less
than apt. The reason for this lies in the fact that catchable errors (or
whatever you want to call them) don't properly _exist_ in the language: While
Nix's `builtins.tryEval` is (originally) based on the C++ exception system,
it specifically lacks the ability of such systems to have an exception _value_
whilst handling it. Consequently, these errors don't have an obvious name
as they never appear _in_ the Nix language. They just have to be named in the
respective Nix implementation:
- In C++ Nix the only term for such errors is `AssertionError` which is the
name of the (C++) exception used in the implementation internally. This
term isn't great, though, as `AssertionError`s can not only be generated
using `assert`, but also using `throw` and failed `NIX_PATH` resolutions.
Were this terminology to be used in documentation addressing Nix language
users, it would probably only serve confusion.
- Tvix currently (as of r/7573) uses the term catchable errors. This term
relates to nothing in the language as such: Errors are not caught, we rather
try to evaluate an expression. Catching also sort of implies that a value
representation of the error is attainable (like in an exception system) which
is untrue.
In light of this I (sterni) would like to suggest “tryable errors” as an
alternative term going forward which isn't inaccurate and relates to terms
already established by language internal naming.
However, this document will continue using the term catchable error until the
naming is adjusted in Tvix itself.
## Implementation
Below we discuss different implementation approaches in Tvix in order to arrive
at a proposal for the new one. The historical discussion is intended as a basis
for discussing the proposal: Are we committing to an old or current mistake? Are
we solving all problems that cropped up or were solved at any given point in
time?
### Original
The original implementation of `tryEval` in cl/6924 was quite straightforward:
It would simply interrupt the propagation of a potential catchable error to the
top level (which usually happened using the `?` operator) in the builtin and
construct the appropriate representation of an unsuccessful evaluation if the
error was deemed catchable. It had, however, multiple problems:
- The VM was originally written without `tryEval` in mind, i.e. it largely
assumed that an error would always cause execution to be terminated. This
problem was later solved (cl/6940).
- Thunks could not be `tryEval`-ed multiple times (b/281). This was another
consequence of VM architecture at the time: Thunks would be blackholed
before evaluation was started and the error could occur. Due to the
interaction of the generator-based VM code and `Value::force` the part
of the code altering the thunk state would never be informed about the
evaluation result in case of a failure, so the thunk would remain
blackholed leading to a crash if the same thunk was `tryEval`-ed or
forced again. To solve this issue, amjoseph completely overhauled
the implementation.
One key point about this implementation is that it is based on the assumption
that catchable errors can only be generated in thunks, i.e. expressions causing
them are never evaluated strictly. This can be illustrated using C++ Nix:
```console
> nix-instantiate --eval -E '[ (assert false; true) (builtins.throw "") <nixpkgs> ]'
[ <CODE> <CODE> <CODE> ]
```
If this wasn't the case, the VM could encounter the error in a situation where
the error would not have needed to pass through the `tryEval` builtin, causing
evaluation to abort.
### Present
The current system (mostly implemented in cl/9289) uses a very different
approach: Instead of relying on the thunk boundary, catchable errors are no
longer errors, but special values. They are created at the relevant points (e.g.
`builtins.throw`) and propagated whenever they are encountered by VM ops or
builtins. Finally, they either encounter `builtins.tryEval` (and are converted to
an ordinary value again) or the top level where they become a normal error again.
The problems with this mostly stem from the confusion between values and errors
that it necessitates:
- In most circumstances, catchable errors end up being errors again, as `tryEval`
is not used a lot. So `throw`s usually end up causing evaluation to abort.
Consequently, not only `Value::Catchable` is necessary, but also a corresponding
error variant that is _only_ created if a catchable value remains at the end of
evaluation. A requirement that was missed until cl/10991 (!) which illustrate
how strange that architecture is. A consequence of this is that catchable
errors have no location information at all.
- `Value::Catchable` is similar to other internal values in Tvix, but is much
more problematic. Aside from thunks, internal values only exist for a brief
amount of time on the stack and it is very clear what parts of the VM or
builtins need to handle them. This means that the rest of the implementation
need to consider them, keeping the complexity caused by the internal value
low. `Value::Catchable`, on the other hand, may exist anywhere and be passed
to any VM op or builtin, so it needs to be correctly propagated _everywhere_.
This causes a lot of noise in the code as well as a big potential for bugs.
Essentially, catchable errors require as much attention by the Tvix developer
as laziness. This doesn't really correlate to the importance of the two
features to the Nix language.
### Future?
The core assumption of the original solution does offer a path forward: After
cl/9289 we should be in a better position to introspect an error occurring from
within the VM code, but we need a better way of storing such an error to prevent
another b/281. If catchable errors can only be generated in thunks, we can just
use the thunk representation for this. This would mean that `Thunk::force_`
would need to check if evaluation was successful and (in case of failure)
change the thunk representation
- either to the original `ThunkRepr::Suspended` which would be simple, but of
course mean duplicated evaluation work in some expressions. In fact, this
would probably leave a lot of easy performance on the table for use cases we
would like to support, e.g. tree walkers for nixpkgs.
- or to a new `ThunkRepr` variant that stores the kind of the error and all
necessary location info so stack traces can work properly. This of course
reintroduces some of the difficulty of having two kinds of errors, but it is
hopefully less problematic, as the thunk boundary (i.e. `Thunk::force`) is
where errors would usually occur.
Besides the question whether this proposal can actually be implemented, another
consideration is whether the underlying assumption will hold in the future, i.e.
can we implement optimizations for thunk elimination in a way that thunks that
generate catchable errors are never eliminated?

View file

@ -0,0 +1,162 @@
Known Optimisation Potential
============================
There are several areas of the Tvix evaluator code base where
potentially large performance gains can be achieved through
optimisations that we are already aware of.
The shape of most optimisations is that of moving more work into the
compiler to simplify the runtime execution of Nix code. This leads, in
some cases, to drastically higher complexity in both the compiler
itself and in invariants that need to be guaranteed between the
runtime and the compiler.
For this reason, and because we lack the infrastructure to adequately
track their impact (WIP), we have not yet implemented these
optimisations, but note the most important ones here.
* Use "open upvalues" [hard]
Right now, Tvix will immediately close over all upvalues that are
created and clone them into the `Closure::upvalues` array.
Instead of doing this, we can statically determine most locals that
are closed over *and escape their scope* (similar to how the
`compiler::scope::Scope` struct currently tracks whether locals are
used at all).
If we implement the machinery to track this, we can implement some
upvalues at runtime by simply sticking stack indices in the upvalue
array and only copy the values where we know that they escape.
* Avoid `with` value duplication [easy]
If a `with` makes use of a local identifier in a scope that can not
close before the with (e.g. not across `LambdaCtx` boundaries), we
can avoid the allocation of the phantom value and duplication of the
`NixAttrs` value on the stack. In this case we simply push the stack
index of the known local.
* Multiple attribute selection [medium]
An instruction could be introduced that avoids repeatedly pushing an
attribute set to/from the stack if multiple keys are being selected
from it. This occurs, for example, when inheriting from an attribute
set or when binding function formals.
* Split closure/function representation [easy]
Functions have fewer fields that need to be populated at runtime and
can directly use the `value::function::Lambda` representation where
possible.
* Apply `compiler::optimise_select` to other set operations [medium]
In addition to selects, statically known attribute resolution could
also be used for things like `?` or `with`. The latter might be a
little more complicated but is worth investigating.
* Inline fully applied builtins with equivalent operators [medium]
Some `builtins` have equivalent operators, e.g. `builtins.sub`
corresponds to the `-` operator, `builtins.hasAttr` to the `?`
operator etc. These operators additionally compile to a primitive
VM opcode, so they should be just as cheap (if not cheaper) as
a builtin application.
In case the compiler encounters a fully applied builtin (i.e.
no currying is occurring) and the `builtins` global is unshadowed,
it could compile the equivalent operator bytecode instead: For
example, `builtins.sub 20 22` would be compiled as `20 - 22`.
This would ensure that equivalent `builtins` can also benefit
from special optimisations we may implement for certain operators
(in the absence of currying). E.g. we could optimise access
to the `builtins` attribute set which a call to
`builtins.getAttr "foo" builtins` should also profit from.
* Avoid nested `VM::run` calls [hard]
Currently when encountering Nix-native callables (thunks, closures)
the VM's run loop will nest and return the value of the nested call
frame one level up. This makes the Rust call stack almost mirror the
Nix call stack, which is usually undesirable.
It is possible to detect situations where this is avoidable and
instead set up the VM in such a way that it continues and produces
the desired result in the same run loop, but this is kind of tricky
to get right - especially while other parts are still in flux.
For details consult the commit with Gerrit change ID
`I96828ab6a628136e0bac1bf03555faa4e6b74ece`, in which the initial
attempt at doing this was reverted.
* Avoid thunks if only identifier closing is required [medium]
Some constructs, like `with`, mostly do not change runtime behaviour
if thunked. However, they are wrapped in thunks to ensure that
deferred identifiers are resolved correctly.
This can be avoided, as we statically analyse the scope and should
be able to tell whether any such logic was required.
* Intern literals [easy]
Currently, the compiler emits a separate entry in the constant
table for each literal. So the program `1 + 1 + 1` will have
three entries in its `Chunk::constants` instead of only one.
* Do some list and attribute set operations in place [hard]
Algorithms that can not do a lot of work inside `builtins` like `map`,
`filter` or `foldl'` usually perform terribly if they use data structures like
lists and attribute sets.
`builtins` can do work in place on a copy of a `Value`, but naïvely expressed
recursive algorithms will usually use `//` and `++` to do a single change to a
`Value` at a time, requiring a full copy of the data structure each time.
It would be a big improvement if we could do some of these operations in place
without requiring a new copy.
There are probably two approaches: We could determine statically if a value is
reachable from elsewhere and emit a special in place instruction if not. An
easier alternative is probably to rely on reference counting at runtime: If no
other reference to a value exists, we can extend the list or update the
attribute set in place.
An **alternative** to this is using [persistent data
structures](https://en.wikipedia.org/wiki/Persistent_data_structure) or at the
very least [immutable data structures](https://docs.rs/im/latest/im/) that can
be copied more efficiently than the stock structures we are using at the
moment.
* Skip finalising unfinalised thunks or non-thunks instead of crashing [easy]
Currently `OpFinalise` crashes the VM if it is called on values that don't
need to be finalised. This helps catching miscompilations where `OpFinalise`
operates on the wrong `StackIdx`. In the case of function argument patterns,
however, this means extra VM stack and instruction overhead for dynamically
determining if finalisation is necessary or not. This wouldn't be necessary
if `OpFinalise` would just noop on any values that don't need to be finalised
(anymore).
* Phantom binding for from expression of inherits [easy]
The from expression of an inherit is reevaluated for each inherit. This can
be demonstrated using the following Nix expression which, counter-intuitively,
will print “plonk” twice.
```nix
let
inherit (builtins.trace "plonk" { a = null; b = null; }) a b;
in
builtins.seq a (builtins.seq b null)
```
In most Nix code, the from expression is just an identifier, so it is not
terribly inefficient, but in some cases a more expensive expression may
be used. We should create a phantom binding for the from expression that
is reused in the inherits, so only a single thunk is created for the from
expression.
Since we discovered this, C++ Nix has implemented a similar optimization:
<https://github.com/NixOS/nix/pull/9847>.

View file

@ -0,0 +1,46 @@
# Nix language issues
In the absence of a language standard, what Nix (the language) is, is prescribed
by the behavior of the C++ Nix implementation. Still, there are reasons not to
accept some behavior:
* Tvix aims for nixpkgs compatibility only. This means we can ignore behavior in
edge cases nixpkgs doesn't trigger as well as obscure features it doesn't use
(e.g. `__overrides`).
* Some behavior of the Nix evaluator seems to be unintentional or an
implementation detail leaking out into language behavior.
Especially in the latter case, it makes sense to raise the respective issue and
maybe to get rid of the behavior in all implementations for good. Below is an
(incomplete) list of such issues:
* [Behaviour of nested attribute sets depends on definition order][i7111]
* [Partially constructed attribute sets are observable during dynamic attr names construction][i7012]
* [Nix parsers merges multiple attribute set literals for the same key incorrectly depending on definition order][i7115]
On the other hand, there is behavior that seems to violate one's expectation
about the language at first, but has good enough reasons from an implementor's
perspective to keep them:
* Dynamic keys are forbidden in `let` and `inherit`. This makes sure that we
only need to do runtime identifier lookups for `with`. More dynamic (i.e.
runtime) lookups would make the scoping system even more complicated as well
as hurt performance.
* Dynamic attributes of `rec` sets are not added to its scope. This makes sense
for the same reason.
* Dynamic and nested attributes in attribute sets don't get merged. This is a
tricky one, but avoids doing runtime (recursive) merges of attribute sets.
Instead all necessary merging can be inferred statically, i.e. the C++ Nix
implementation already merges at parse time, making nested attribute keys
syntactic sugar effectively.
Other behavior is just odd, surprising or underdocumented:
* `builtins.foldl'` doesn't force the initial accumulator (but all other
intermediate accumulator values), differing from e.g. Haskell, see
the [relevant PR discussion][p7158].
[i7111]: https://github.com/NixOS/nix/issues/7111
[i7012]: https://github.com/NixOS/nix/issues/7012
[i7115]: https://github.com/NixOS/nix/issues/7115
[p7158]: https://github.com/NixOS/nix/pull/7158

View file

@ -0,0 +1,122 @@
# attrset-opcodes
The problem with attrset literals is twofold:
1. The keys of attribute sets may be dynamically evaluated.
Access:
```nix
let
k = "foo";
attrs = { /* etc. */ };
in attrs."${k}"
```
Literal:
```nix
let
k = "foo";
in {
"${k}" = 42;
}
```
The problem with this is that the attribute set key is not known at
compile time, and needs to be dynamically evaluated by the VM as an
expression.
For the most part this should be pretty simple, assuming a
theoretical instruction set:
```
0000 OP_CONSTANT(0) # key "foo"
0001 OP_CONSTANT(1) # value 42
0002 OP_ATTR_SET(1) # construct attrset from 2 stack values
```
The operation pushing the key needs to be replaced with one that
leaves a single value (the key) on the stack, i.e. the code for the
expression, e.g.:
```
0000..000n <operations leaving a string value on the stack>
000n+1 OP_CONSTANT(1) # value 42
000n+2 OP_ATTR_SET(1) # construct attrset from 2 stack values
```
This is fairly easy to do by simply recursing in the compiler when
the key expression is encountered.
2. The keys of attribute sets may be nested.
This is the non-trivial part of dealing with attribute set
literals. Specifically, the nesting can be arbitrarily deep and the
AST does not guarantee that related set keys are located
adjacently.
Furthermore, this frequently occurs in practice in Nix. We need a
bytecode representation that makes it possible to construct nested
attribute sets at runtime.
Proposal: AttrPath values
If we can leave a value representing an attribute path on the
stack, we can offload the construction of nested attribute sets to
the `OpAttrSet` operation.
Under the hood, OpAttrSet in practice constructs a `Map<NixString,
Value>` attribute set in most cases. This means it expects to pop
the value of the key of the stack, but is otherwise free to do
whatever it wants with the underlying map.
In a simple example, we could have code like this:
```nix
{
a.b = 15;
}
```
This would be compiled to a new `OpAttrPath` instruction that
constructs and pushes an attribute path from a given number of
fragments (which are popped off the stack).
For example,
```
0000 OP_CONSTANT(0) # key "a"
0001 OP_CONSTANT(1) # key "b"
0002 OP_ATTR_PATH(2) # construct attrpath from 2 fragments
0003 OP_CONSTANT(2) # value 42
0004 OP_ATTRS(1) # construct attrset from one pair
```
Right before `0004` the stack would be left like this:
[ AttrPath[a,b], 42 ]
Inside of the `OP_ATTRS` instruction we could then begin
construction of the map and insert the nested attribute sets as
required, as well as validate that there are no duplicate keys.
3. Both of these cases can occur simultaneously, but this is not a
problem as the opcodes combine perfectly fine, e.g.:
```nix
let
k = "a";
in {
"${k}".b = 42;
}
```
results in
```
0000..000n <operations leaving a string value on the stack>
000n+1 OP_CONSTANT(1) # key "b"
000n+2 OP_ATTR_PATH(2) # construct attrpath from 2 fragments
000n+3 OP_CONSTANT(2) # value 42
000n+4 OP_ATTR_SET(1) # construct attrset from 2 stack values
```

View file

@ -0,0 +1,68 @@
Recursive attribute sets
========================
The construction behaviour of recursive attribute sets is very
specific, and a bit peculiar.
In essence, there are multiple "phases" of scoping that take place
during attribute set construction:
1. Every inherited value without an explicit source is inherited only
from the **outer** scope in which the attribute set is enclosed.
2. A new scope is opened in which all recursive keys are evaluated.
This only considers **statically known keys**, attributes can
**not** recurse into dynamic keys in `self`!
For example, this code is invalid in C++ Nix:
```
nix-repl> rec { ${"a"+""} = 2; b = a * 10; }
error: undefined variable 'a' at (string):1:26
```
3. Finally, a third scope is opened in which dynamic keys are
evaluated.
This behaviour, while possibly a bit strange and unexpected, actually
simplifies the implementation of recursive attribute sets in Tvix as
well.
Essentially, a recursive attribute set like this:
```nix
rec {
inherit a;
b = a * 10;
${"c" + ""} = b * 2;
}
```
Can be compiled like the following expression:
```nix
let
inherit a;
in let
b = a * 10;
in {
inherit a b;
${"c" + ""} = b * 2;
}
```
Completely deferring the resolution of recursive identifiers to the
existing handling of recursive scopes (i.e. deferred access) in let
bindings.
In practice, we can further specialise this and compile each scope
directly into the form expected by `OpAttrs` (that is, leaving
attribute names on the stack) before each value's position.
C++ Nix's Implementation
------------------------
* [`ExprAttrs`](https://github.com/NixOS/nix/blob/2097c30b08af19a9b42705fbc07463bea60dfb5b/src/libexpr/nixexpr.hh#L241-L268)
(AST representation of attribute sets)
* [`ExprAttrs::eval`](https://github.com/NixOS/nix/blob/075bf6e5565aff9fba0ea02f3333c82adf4dccee/src/libexpr/eval.cc#L1333-L1414)
* [`addAttr`](https://github.com/NixOS/nix/blob/master/src/libexpr/parser.y#L98-L156) (`ExprAttrs` construction in the parser)

View file

@ -0,0 +1,315 @@
tvix-eval VM loop
=================
This document describes the new tvix-eval VM execution loop implemented in the
chain focusing around cl/8104.
## Background
The VM loop implemented in Tvix prior to cl/8104 had several functions:
1. Advancing the instruction pointer for a chunk of Tvix bytecode and
executing instructions in a loop until a result was yielded.
2. Tracking Nix call frames as functions/thunks were entered/exited.
3. Catching trampoline requests returned from instructions to force suspended
thunks without increasing stack size *where possible*.
4. Handling trampolines through an inner trampoline loop, switching between a
code execution mode and execution of subsequent trampolines.
This implementation of the trampoline logic was added on to the existing VM,
which previously always recursed for thunk forcing. There are some cases (for
example values that need to be forced *inside* of the execution of a builtin)
where trampolines could not previously be used, and the VM recursed anyways.
As a result of this trampoline logic being added "on top" of the existing VM
loop the code became quite difficult to understand. This led to several bugs,
for example: b/251, b/246, b/245, and b/238.
These bugs were tricky to deal with, as we had to try and make the VM do
things that are somewhat difficult to fit into its model. We could of course
keep extending the trampoline logic to accommodate all sorts of concepts (such
as finalisers), but that seems like it does not solve the root problem.
## New VM loop
In cl/8104, a unified new solution is implemented with which the VM is capable
of evaluating everything without increasing the call stack size.
This is done by introducing a new frame stack in the VM, on which execution
frames are enqueued that are either:
1. A bytecode frame, consisting of Tvix bytecode that evaluates compiled Nix
code.
2. A generator frame, consisting of some VM logic implemented in pure Rust
code that can be *suspended* when it hits a point where the VM would
previously need to recurse.
We do this by making use of the `async` *keyword* in Rust, but notably
*without* introducing asynchronous I/O or concurrency in tvix-eval (the
complexity of which is currently undesirable for us).
Specifically, when writing a Rust function that uses the `async` keyword, such
as:
```rust
async fn some_builtin(input: Value) -> Result<Value, ErrorKind> {
let mut out = NixList::new();
for element in input.to_list()? {
let result = do_something_that_requires_the_vm(element).await;
out.push(result);
}
Ok(out)
}
```
The compiler actually generates a state-machine under-the-hood which allows
the execution of that function to be *suspended* whenever it hits an `await`.
We use the [`genawaiter`][] crate that gives us a data structure and simple
interface for getting instances of these state machines that can be stored in
a struct (in our case, a *generator frame*).
The execution of the VM then becomes the execution of an *outer loop*, which
is responsible for selecting the next generator frame to execute, and two
*inner loops*, which drive the execution of a bytecode frame or generator
frame forward until it either yields a value or asks to be suspended in favour
of another frame.
All "communication" between frames happens solely through values left on the
stack: Whenever a frame of either type runs to completion, it is expected to
leave a *single* value on the stack. It follows that the whole VM, upon
completion of the last (or initial, depending on your perspective) frame
yields its result as the return value.
The core of the VM restructuring is cl/8104, unfortunately one of the largest
single commit changes we've had to make yet, as it touches pretty much all
areas of tvix-eval. The introduction of the generators and the
message/response system we built to request something from the VM, suspend a
generator, and wait for the return is in cl/8148.
The next sections describe in detail how the three different loops work.
### Outer VM loop
The outer VM loop is responsible for selecting the next frame to run, and
dispatching it correctly to inner loops, as well as determining when to shut
down the VM and return the final result.
```
╭──────────────────╮
╭────────┤ match frame kind ├──────╮
│ ╰──────────────────╯ │
│ │
┏━━━━━━━━━━━━┷━━━━━┓ ╭───────────┴───────────╮
───►┃ frame_stack.pop()┃ ▼ ▼
┗━━━━━━━━━━━━━━━━━━┛ ┏━━━━━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━━━━━━┓
▲ ┃ bytecode frame ┃ ┃ generator frame ┃
│ ┗━━━━━━━━┯━━━━━━━┛ ┗━━━━━━━━┯━━━━━━━━┛
│[yes, cont.] │ │
│ ▼ ▼
┏━━━━━━━━┓ │ ╔════════════════╗ ╔═════════════════╗
◄───┨ return ┃ │ ║ inner bytecode ║ ║ inner generator ║
┗━━━━━━━━┛ │ ║ loop ║ ║ loop ║
▲ │ ╚════════╤═══════╝ ╚════════╤════════╝
│ ╭────┴─────╮ │ │
│ │ has next │ ╰───────────┬───────────╯
[no] ╰───┤ frame? │ │
╰────┬─────╯ ▼
│ ┏━━━━━━━━━━━━━━━━━┓
│ ┃ frame completed ┃
╰─────────────────────────┨ or suspended ┃
┗━━━━━━━━━━━━━━━━━┛
```
Initially, the VM always pops a frame from the frame stack and then inspects
the type of frame it found. As a consequence the next frame to execute is
always the frame at the top of the stack, and setting up a VM initially for
code execution is done by leaving a bytecode frame with the code to execute on
the stack and passing control to the outer loop.
Control is dispatched to either of the inner loops (depending on the type of
frame) and the cycle continues once they return.
When an inner loop returns, it has either finished its execution (and left its
result value on the *value stack*), or its frame has requested to be
suspended.
Frames request suspension by re-enqueueing *themselves* through VM helper
methods, and then leaving the frame they want to run *on top* of themselves in
the frame stack before yielding control back to the outer loop.
The inner control loops inform the outer loops about whether the frame has
been *completed* or *suspended* by returning a boolean.
### Inner bytecode loop
The inner bytecode loop drives the execution of some Tvix bytecode by
continously looking at the next instruction to execute, and dispatching to the
instruction handler.
```
┏━━━━━━━━━━━━━┓
◄──┨ return true ┃
┗━━━━━━━━━━━━━┛
╔════╧═════╗
║ OpReturn ║
╚══════════╝
╰──┬────────────────────────────╮
│ ▼
│ ╔═════════════════════╗
┏━━━━━━━━┷━━━━━┓ ║ execute instruction ║
───►┃ inspect next ┃ ╚══════════╤══════════╝
┃ instruction ┃ │
┗━━━━━━━━━━━━━━┛ │
▲ ╭─────┴─────╮
╰──────────────────────┤ suspends? │
[no] ╰─────┬─────╯
┏━━━━━━━━━━━━━━┓ │
◄──┨ return false ┃───────────────────────╯
┗━━━━━━━━━━━━━━┛ [yes]
```
With this refactoring, the compiler now emits a special `OpReturn` instruction
at the end of bytecode chunks. This is a signal to the runtime that the chunk
has completed and that its current value should be returned, without having to
perform instruction pointer arithmetic.
When `OpReturn` is encountered, the inner bytecode loop returns control to the
outer loop and informs it (by returning `true`) that the bytecode frame has
completed.
Any other instruction may also request a suspension of the bytecode frame (for
example, instructions that need to force a value). In this case the inner loop
is responsible for setting up the frame stack correctly, and returning `false`
to inform the outer loop of the suspension
### Inner generator loop
The inner generator loop is responsible for driving the execution of a
generator frame by continously calling [`Gen::resume`][] until it requests a
suspension (as a result of which control is returned to the outer loop), or
until the generator is done and yields a value.
```
┏━━━━━━━━━━━━━┓
◄──┨ return true ┃ ◄───────────────────╮
┗━━━━━━━━━━━━━┛ │
[Done] │
╭──────────────────┴─────────╮
│ inspect generator response │◄────────────╮
╰──────────────────┬─────────╯ │
[yielded] │ ┏━━━━━━━━┷━━━━━━━━┓
│ ┃ gen.resume(msg) ┃◄──
▼ ┗━━━━━━━━━━━━━━━━━┛
╭────────────╮ ▲
│ same-frame │ │
│ request? ├────────────────╯
╰─────┬──────╯ [yes]
┏━━━━━━━━━━━━━━┓ │
◄──┨ return false ┃ ◄──────────────────╯
┗━━━━━━━━━━━━━━┛ [no]
```
On each execution of a generator frame, `resume_with` is called with a
[`VMResponse`][] (i.e. a message *from* the VM *to* the generator). For a newly
created generator, the initial message is just `Empty`.
A generator may then respond by signaling that it has finished execution
(`Done`), in which case the inner generator loop returns control to the outer
loop and informs it that this generator is done (by returning `true`).
A generator may also respond by signaling that it needs some data from the VM.
This is implemented through a request-response pattern, in which the generator
returns a `Yielded` message containing a [`VMRequest`][]. These requests can be
very simple ("Tell me the current store path") or more complex ("Call this Nix
function with these values").
Requests are divided into two classes: Same-frame requests (requests that can be
responded to *without* returning control to the outer loop, i.e. without
executing a *different* frame), and multi-frame generator requests. Based on the
type of request, the inner generator loop will either handle it right away and
send the response in a new `resume_with` call, or return `false` to the outer
generator loop after setting up the frame stack.
Most of this logic is implemented in cl/8148.
[`Gen::resume`]: https://docs.rs/genawaiter/0.99.1/genawaiter/rc/struct.Gen.html#method.resume_with
[`VMRequest`]: https://cs.tvl.fyi/depot@2696839770c1ccb62929ff2575a633c07f5c9593/-/blob/tvix/eval/src/vm/generators.rs?L44
[`VMResponse`]: https://cs.tvl.fyi/depot@2696839770c1ccb62929ff2575a633c07f5c9593/-/blob/tvix/eval/src/vm/generators.rs?L169
## Advantages & Disadvantages of the approach
This approach has several advantages:
* The execution model is much simpler than before, making it fairly
straightforward to build up a mental model of what the VM does.
* All "out of band requests" inside the VM are handled through the same
abstraction (generators).
* Implementation is not difficult, albeit a little verbose in some cases (we
can argue about whether or not to introduce macros for simplifying it).
* Several parts of the VM execution are now much easier to document,
potentially letting us onboard tvix-eval contributors faster.
* The linear VM execution itself is much easier to trace now, with for example
the `RuntimeObserver` (and by extension `tvixbolt`) giving much clearer
output now.
But it also comes with some disadvantages:
* Even though we "only" use the `async` keyword without a full async-I/O
runtime, we still encounter many of the drawbacks of the fragmented Rust
async ecosystem.
The biggest issue with this is that parts of the standard library become
unavailable to us, for example the built-in `Vec::sort_by` can no longer be
used for sorting in Nix because our comparators themselves are `async`.
This led us to having to implement some logic on our own, as the design of
`async` in Rust even makes it difficult to provide usecase-generic
implementations of concepts like sorting.
* We need to allocate quite a few new structures on the heap in order to drive
generators, as generators involve storing `Future` types (with unknown
sizes) inside of structs.
In initial testing this seems to make no significant difference in
performance (our performance in an actual nixpkgs-eval is still bottlenecked
by I/O concerns and reference scanning), but is something to keep in mind
later on when we start optimising more after the low-hanging fruits have
been reaped.
## Alternatives considered
1. Tacking on more functionality onto the existing VM loop
implementation to accomodate problems as they show up. This is not
preferred as the code is already getting messy.
2. Making tvix-eval a fully `async` project, pulling in something like Tokio
or `async-std` as a runtime. This is not preferred due to the massively
increased complexity of those solutions, and all the known issues of fully
buying in to the async ecosystem.
tvix-eval fundamentally should work for use-cases besides building Nix
packages (e.g. for `//tvix/serde`), and its profile should be as slim as
possible.
3. Convincing the Rust developers that Rust needs a way to guarantee
constant-stack-depth tail calls through something like a `tailcall`
keyword.
4. ... ?
[`genawaiter`]: https://docs.rs/genawaiter/