chore(tvix/eval): move eval docs to tvix/docs

Change-Id: I75b33c43456389de6e521b4f0ad46d68bc9e98f6 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11809 Autosubmit: flokli <flokli@flokli.de> Reviewed-by: tazjin <tazjin@tvl.su> Tested-by: BuildkiteCI
2024-06-13 22:17:18 +03:00 · 2024-06-13 22:17:18 +03:00 · 5077ca70de
commit 5077ca70de
parent 6947dc4349
12 changed files with 16 additions and 0 deletions
--- a/tvix/docs/src/SUMMARY.md
+++ b/tvix/docs/src/SUMMARY.md
@ -4,6 +4,19 @@
 - [Architecture & data flow](./architecture.md)
 - [TODOs](./TODO.md)

+# Evaluator
+- [Compilation of Bindings](./eval/bindings.md)
+- [Builtins](./eval/builtins.md)
+- [Build References](./eval/build-references.md)
+- [Catchable Errors](./eval/catchable-errors.md)
+- [Known Optimisation Potential](./eval/known-optimisation-potential.md)
+- [Langugage Issues](./eval/language-issues.md)
+- [Attrset Opcodes](./eval/opcodes-attrsets.md)
+- [Recursive attribute sets](./eval/recursive-attrs.md)
+- [VM Loop](./eval/vm-loop.md)
+- [Abandoned](./eval/abandoned/index.md)
+  - [Thread-local VM](./eval/abandoned/thread-local-vm.md)
+
 # Store
 - [Store API](./store/api.md)
 - [BlobStore Chunking](./castore/blobstore-chunking.md)
--- a/tvix/docs/src/eval/abandoned/index.md
+++ b/tvix/docs/src/eval/abandoned/index.md
@ -0,0 +1,3 @@
+# Abandoned ideas
+
+This chapter keeps track of abandoned ideas, and why they were abandoned.
--- a/tvix/docs/src/eval/abandoned/thread-local-vm.md
+++ b/tvix/docs/src/eval/abandoned/thread-local-vm.md
@ -0,0 +1,233 @@
+# We can't have nice things because IFD
+
+The thread-local VM work below was ultimately not merged because it
+was decided that it would be harmful for `tvix::eval::Value` to
+implement `Eq`, `Hash`, or any of the other `std` traits.
+
+Implementing `std` traits on `Value` was deemed harmful because IFD
+can cause arbitrary amounts of compilation to occur, including
+network transactions with builders.  Obviously it would be
+unexpected and error-prone to have a `PartialEq::eq()` which does
+something like this.  This problem does not manifest within the
+"nixpkgs compatibility only" scope, or in any undeprecated language
+feature other than IFD.  Although IFD is outside the "nixpkgs
+compatibility scope", it [has been added to the TVL compatibility
+scope](https://cl.tvl.fyi/c/depot/+/7193/comment/3418997b_0dbd0b65/).
+
+This was the sole reason for not merging.
+
+The explanation below may be useful in case future circumstances
+affect the relevance of the reasoning above.
+
+The implementation can be found in these CLs:
+
+- [refactor(tvix/eval): remove lifetime parameter from VM<'o>](https://cl.tvl.fyi/c/depot/+/7194)
+- [feat(tvix/eval): [FOUNDLING] thread-local VM](https://cl.tvl.fyi/c/depot/+/7195)
+- [feat(tvix/eval): [FOUNDLING] VM::vm_xxx convenience methods](https://cl.tvl.fyi/c/depot/+/7196)
+- [refactor(tvix/eval): [FOUNDLING]: drop explicit `&mut vm` parameter](https://cl.tvl.fyi/c/depot/+/7197)
+
+# Thread-local storage for tvix::eval::vm::VM
+
+## The problem
+
+`Value::force()` takes a `&mut VM` argument, since forcing a value
+requires executing opcodes.  This means that `Value::nix_eq()` too
+must take a `&mut VM`, since any sensible definition of equality
+will have to force thunks.
+
+Unfortunately Rust's `PartialEq::eq()` function does not accept any
+additional arguments like this, so `Value` cannot implement
+`PartialEq`.  Worse, structs which *contain* `Value`s can't
+implement `PartialEq` either.  This means `Value`, and anything
+containing it, cannot be the key for a `BTreeMap` or `HashMap`.  We
+can't even insert `Value`s into a `HashSet`!
+
+There are other situations like this that don't involve `PartialEq`,
+but it's the most glaring one.  The main problem is that you need a
+`VM` in order to force thunks, and thunks can be anywhere in a
+`Value`.
+
+## Solving the problem with thread-locals
+
+We could avoid threading the `&mut VM` through the entire codebase
+by making it a thread-local.
+
+To do this without a performance hit, we need to use LLVM
+thread-locals, which are the same cost as references to `static`s
+but load relative to
+[`llvm.threadlocal.address`][threadlocal-intrinsic] instead of
+relative to the data segment.  Unfortunately `#[thread_local]` [is
+unstable][thread-local-unstable] and [unsafe in
+general][thread-local-unsafe] for most of the cases where we would
+want to use it.  There is one [exception][tls-const-init], however:
+if a `!thread_local()` has a `const` initializer, the compiler will
+insert a `#[thread_local]`; this special case is both safe and
+stable.
+
+The difficult decision is what the type of the thread-local should
+be.  Since you can't get a mutable reference to a `thread_local!()`
+it will have to be some interior-mutability-bestowing wrapper around
+our current `struct VM`.  Here are the choices:
+
+### `RefCell<VM>`
+
+This is the obvious first choice, since it lets you borrow a
+`RefMut<Target=VM>`.  The problem here is that we want to keep the
+codebase written such that all the functions in `impl VM` still take
+a `&mut self`.  This means that there will be an active mutable
+borrow for the duration of `VM::call_builtin()`.  So if we implement
+`PartialEq` by having `eq()` attempt a second mutable borrow from
+the thread-local storage, it will fail since there is already an
+active borrow.
+
+The problem here is that you can't "unborrow" a `RefMut` except by
+dropping it.  There's no way around this.
+
+#### Problem: Uglification
+
+The only solution here is to rewrite all the functions in `impl VM`
+so they don't take any kind of `self` argument, and then have them
+do a short-lived `.borrow_mut()` from the thread-local `RefCell`
+*separately, each time* they want to modify one of the fields of
+`VM` (currently `frames`, `stack`, `with_stack`, `warnings`).  This
+means that if you had a code sequence like this:
+
+```
+impl VM {
+  fn foo(&mut self, ...) {
+    ...
+    self.frame().ip += 1;
+    self.some_other_method();
+    self.frame().ip += 1;
+```
+
+You would need to add *two separate `borrow_mut()`s*, one for each
+of the `self.frame().ip+=1` statements.  You can't just do one big
+`borrow_mut()` because `some_other_method()` will call
+`borrow_mut()` and panic.
+
+#### Problem: Performance
+
+The `RefCell<VM>` approach also has a fairly huge performance hit,
+because every single modification to any part of `VM` will require a
+reference count increment/decrement, and a conditional branch based
+on the check (which will never fail) that the `RefCell` isn't
+already mutably borrowed.  It will also impede a lot of rustc's
+optimizations.
+
+### `Cell<VM>`
+
+This is a non-starter because it means that in order to mutate any
+field of `VM`, you have to move the entire `struct VM` out of the
+`Cell`, mutate it, and move it back in.
+
+### `Cell<Box<VM>>`
+
+Now we're getting warmer.  Here, we can move the `Box<VM>` out of
+the cell with a single pointer-sized memory access.
+
+We don't want to do the "uglification" described in the previous
+section.  We are very fortunate that, sometime in mid-2019, the Rust
+dieties [decreed by fiat][fiat-decree] that `&Cell<T>` and `&mut T`
+are bit-for-bit identical, and even gave us mortals safe wrappers
+[`from_mut()`][from_mut] and [`get_mut()`][get_mut] around
+`mem::transmute()`.
+
+So now, when a `VM` method (which takes `&mut self`) calls out to
+some external code (like a builtin), instead of passing the `&mut
+self` to the external code it can call `Cell::from_mut(&mut self)`,
+and then `Cell::swap()` that into the thread-local storage cell for
+the duration of the external code.  After the external code returns,
+it can `Cell::swap()` it back.  This whole dance gets wrapped in a
+lexical block, and the borrow checker sees that the `&Cell<Box<VM>>`
+returned by `Cell::from_mut()` lives only until the end of the
+lexical block, *so we get the `&mut self` back after the close-brace
+for that block*.  NLL FTW.  This sounds like a lot of work, but it
+should compile down to two pointer-sized loads and two pointer-sized
+stores, and it is incurred basically only for `OpBuiltin`.
+
+This all works, with only two issues:
+
+1. `vm.rs` needs to be very careful to do the thread-local cell swap
+   dance before calling anything that might call `PartialEq::eq()`
+   (or any other method that expects to be able to pull the `VM` out
+   of thread-local storage).  There is no compile-time check that we
+   did the dance in all the right places.  If we forget to do the
+   dance somewhere we'll get a runtime panic from `Option::expect()`
+   (see next section).
+
+2. Since we need to call `Cell::from_mut()` on a `Box<VM>` rather
+   than a bare `VM`, we still need to rewrite all of `vm.rs` so that
+   every function takes a `&mut Box<VM>` instead of a `&mut self`.
+   This creates a huge amount of "noise" in the code.
+
+Fortunately, it turns out that nearly all the "noise" that arises
+from the second point can be eliminated by taking advantage of
+[deref coercions][deref-coercions]!  This was the last "shoe to
+drop".
+
+There is still the issue of having to be careful about calls from
+`vm.rs` to things outside that file, but it's manageable.
+
+### `Cell<Option<Box<VM>>>`
+
+In order to get the "safe and stable `#[thread_local]`"
+[exception][tls-const-init] we need a `const` initializer, which
+means we need to be able to put something into the `Cell` that isn't
+a `VM`.  So the type needs to be `Cell<Option<Box<VM>>>`.
+
+Recall that you can't turn an `Option<&T>` into an `&Option<T>`.
+The latter type has the "is this a `Some` or `None`" bit immediately
+adjacent to the bits representing `T`.  So if I hand you a `t:&T`
+and you wrap it as `Some(t)`, those bits aren't adjacent in memory.
+This means that all the VM methods need to operate on an
+`Option<Box<VM>>` -- we can't just wrap a `Some()` around `&mut
+self` "at the last minute" before inserting it into the thread-local
+storage cell.  Fortunately deref coercions save the day here too --
+the coercion is inferred through both layers (`Box` and `Option`) of
+wrapper, so there is no additional noise in the code.
+
+Note that Rust is clever and can find some sequence of bits that
+aren't a valid `T`, so `sizeof(Option<T>)==sizeof(T)`.  And in fact,
+`Box<T>` is one of these cases (and this is guaranteed).  So the
+`Option` has no overhead.
+
+# Closing thoughts, language-level support
+
+This would have been easier with language-level support.
+
+## What wouldn't help
+
+Although it [it was decreed][fiat-decree] that `Cell<T>` and `&mut
+T` are interchangeable, a `LocalKey<Cell<T>>` isn't quite the same
+thing as a `Cell<T>`, so it wouldn't be safe for the standard
+library to contain something like this:
+
+```
+impl<T> LocalKey<Cell<T>> {
+  fn get_mut(&self) -> &mut T {
+    unsafe {
+      // ... mem::transmute() voodoo goes here ...
+```
+
+The problem here is that you can call `LocalKey<Cell<T>>::get_mut()` twice and
+end up with two `&mut T`s that point to the same thing (mutable aliasing) which
+results in undefined behavior.
+
+## What would help
+
+The ideal solution is for Rust to let you call arbitrary methods
+`T::foo(&mut self...)` on a `LocalKey<Cell<T>>`.  This way you can
+have one (and only one) `&mut T` at any syntactical point in your
+program -- the `&mut self`.
+
+
+[tls-const-init]: https://github.com/rust-lang/rust/pull/90774
+[thread-local-unstable]: https://github.com/rust-lang/rust/issues/29594
+[thread-local-unsafe-generally]: https://github.com/rust-lang/rust/issues/54366
+[fiat-decree]: https://github.com/rust-lang/rust/issues/43038
+[from_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.from_mut
+[get_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.get_mut
+[thread-local-unsafe]: [https://github.com/rust-lang/rust/issues/54366]
+[deref-coercions]: https://doc.rust-lang.org/book/ch15-02-deref.html#implicit-deref-coercions-with-functions-and-methods
+[threadlocal-intrinsic]: https://llvm.org/docs/LangRef.html#llvm-threadlocal-address-intrinsic
--- a/tvix/docs/src/eval/bindings.md
+++ b/tvix/docs/src/eval/bindings.md
@ -0,0 +1,133 @@
+Compilation of bindings
+=======================
+
+Compilation of Nix bindings is one of the most mind-bending parts of Nix
+evaluation. The implementation of just the compilation is currently almost 1000
+lines of code, excluding the various insane test cases we dreamt up for it.
+
+## What is a binding?
+
+In short, any attribute set or `let`-expression. Tvix currently does not treat
+formals in function parameters (e.g. `{ name ? "fred" }: ...`) the same as these
+bindings.
+
+They have two very difficult features:
+
+1. Keys can mutually refer to each other in `rec` sets or `let`-bindings,
+   including out of definition order.
+2. Attribute sets can be nested, and parts of one attribute set can be defined
+   in multiple separate bindings.
+
+Tvix resolves as much of this logic statically (i.e. at compile-time) as
+possible, but the procedure is quite complicated.
+
+## High-level concept
+
+The idea behind the way we compile bindings is to fully resolve nesting
+statically, and use the usual mechanisms (i.e. recursion/thunking/value
+capturing) for resolving dynamic values.
+
+This is done by compiling bindings in several phases:
+
+1. An initial compilation phase *only* for plain inherit statements (i.e.
+   `inherit name;`), *not* for namespaced inherits (i.e. `inherit (from)
+   name;`).
+
+2. A declaration-only phase, in which we use the compiler's scope tracking logic
+   to calculate the physical runtime stack indices (further referred to as
+   "stack slots" or just "slots") that all values will end up in.
+
+   In this phase, whenever we encounter a nested attribute set, it is merged
+   into a custom data structure that acts like a synthetic AST node.
+
+   This can be imagined similar to a rewrite like this:
+
+   ```nix
+   # initial code:
+   {
+       a.b = 1;
+       a.c = 2;
+   }
+
+   # rewritten form:
+   {
+       a = {
+           b = 1;
+           c = 2;
+       };
+   }
+   ```
+
+   The rewrite applies to attribute sets and `let`-bindings alike.
+
+   At the end of this phase, we know the stack slots of all namespaces for
+   inheriting from, all values inherited from them, and all values (and
+   optionally keys) of bindings at the current level.
+
+   Only statically known keys are actually merged, so any dynamic keys that
+   conflict will lead to a "key already defined" error at runtime.
+
+3. A compilation phase, in which all values (and, when necessary, keys) are
+   actually compiled. In this phase the custom data structure used for merging
+   is encountered when compiling values.
+
+   As this data structure acts like an AST node, the process begins recursively
+   for each nested attribute set.
+
+At the end of this process we have bytecode that leaves the required values (and
+optionally keys) on the stack. In the case of attribute sets, a final operation
+is emitted that constructs the actual attribute set structure at runtime. For
+`let`-bindings a final operation is emitted that removes these locals from the
+stack when the scope ends.
+
+## Moving parts
+
+WARNING: This documents the *current* implementation. If you only care about the
+conceptual aspects, see above.
+
+There's a few types involved:
+
+* `PeekableAttrs`: peekable iterator over an attribute path (e.g. `a.b.c`)
+* `BindingsKind`: enum defining the kind of bindings (attrs/recattrs/let)
+* `AttributeSet`: struct holding the bindings kind, the AST nodes with inherits
+  (both namespaced and not), and an internal representation of bindings
+  (essentially a vector of tuples of the peekable attrs and the expression to
+  compile for the value).
+* `Binding`: enum describing the kind of binding (namespaced inherit, attribute
+  set, plain binding of *any other value type*)
+* `KeySlot`: enum describing the location in which a key slot is placed at
+  runtime (nowhere, statically known value in a slot, dynamic value in a slot)
+* `TrackedBinding`: struct representing statically known information about a
+  single binding (its key slot, value slot and `Binding`)
+* `TrackedBindings`: vector of tracked bindings, which implements logic for
+  merging attribute sets together
+
+And quite a few methods on `Compiler`:
+
+* `compile_bindings`: entry point for compiling anything that looks like a
+  binding, this calls out to the functions below.
+* `compile_plain_inherits`: takes all inherits of a bindings node and compiles
+  the ones that are trivial to compile (i.e. just plain inherits without a
+  namespace). The `rnix` parser does not represent namespaced/plain inherits in
+  different nodes, so this function also aggregates the namespaced inherits and
+  returns them for further use
+* `declare_namespaced_inherits`: passes over all namespaced inherits and
+  declares them on the locals stack, as well as inserts them into the provided
+  `TrackedBindings`
+* `declare_bindings`: declares all regular key/value bindings in a bindings
+  scope, but without actually compiling their keys or values.
+
+  There's a lot of heavy lifting going on here:
+
+  1. It invokes the various pieces of logic responsible for merging nested
+     attribute sets together, creating intermediate data structures in the value
+     slots of bindings that can be recursively processed the same way.
+  2. It decides on the key slots of expressions based on the kind of bindings,
+     and the type of expression providing the key.
+* `bind_values`: runs the actual compilation of values. Notably this function is
+  responsible for recursively compiling merged attribute sets when it encounters
+  a `Binding::Set` (on which it invokes `compile_bindings` itself).
+
+In addition to these several methods (such as `compile_attr_set`,
+`compile_let_in`, ...) invoke the binding-kind specific logic and then call out
+to the functions above.
--- a/tvix/docs/src/eval/build-references.md
+++ b/tvix/docs/src/eval/build-references.md
@ -0,0 +1,254 @@
+Build references in derivations
+===============================
+
+This document describes how build references are calculated in Tvix. Build
+references are used to determine which store paths should be available to a
+builder during the execution of a build (i.e. the full build closure of a
+derivation).
+
+## String contexts in C++ Nix
+
+In C++ Nix, each string value in the evaluator carries an optional so-called
+"string context".
+
+These contexts are themselves a list of strings that take one of the following
+formats:
+
+1. `!<output_name>!<drv_path>`
+
+   This format describes a build reference to a specific output of a derivation.
+
+2. `=<drv_path>`
+
+   This format is used for a special case where a derivation attribute directly
+   refers to a derivation path (e.g. by accessing `.drvPath` on a derivation).
+
+   Note: In C++ Nix this case is quite special and actually requires a
+   store-database query during evaluation.
+
+3. `<path>` - a non-descript store path input, usually a plain source file (e.g.
+   from something like `src = ./.` or `src = ./foo.txt`).
+
+   In the case of `unsafeDiscardOutputDependency` this is used to pass a raw
+   derivation file, but *not* pull in its outputs.
+
+Lets introduce names for these (in the same order) to make them easier to
+reference below:
+
+```rust
+enum BuildReference {
+    /// !<output_name>!<drv_path>
+    SingleOutput(OutputName, DrvPath),
+
+    /// =<drv_path>
+    DrvClosure(DrvPath),
+
+    /// <path>
+    Path(StorePath),
+}
+```
+
+String contexts are, broadly speaking, created whenever a string is the result
+of a computation (e.g. string interpolation) that used a *computed* path or
+derivation in any way.
+
+Note: This explicitly does *not* include simply writing a literal string
+containing a store path (whether valid or not). That is only permitted through
+the `storePath` builtin.
+
+## Derivation inputs
+
+Based on the data above, the fields `inputDrvs` and `inputSrcs` of derivations
+are populated in `builtins.derivationStrict` (the function which
+`builtins.derivation`, which isn't actually a builtin, wraps).
+
+`inputDrvs` is represented by a map of derivation paths to the set of their
+outputs that were referenced by the context.
+
+TODO: What happens if the set is empty? Somebody claimed this means all outputs.
+
+`inputSrcs` is represented by a set of paths.
+
+These are populated by the above references as follows:
+
+* `SingleOutput` entries are merged into `inputDrvs`
+* `Path` entries are inserted into `inputSrcs`
+* `DrvClosure` leads to a special store computation (`computeFSClosure`), which
+  finds all paths referenced by the derivation and then inserts all of them into
+  the fields as above (derivations with _all_ their outputs)
+
+This is then serialised in the derivation and passed down the pipe.
+
+## Builtins interfacing with contexts
+
+C++ Nix has several builtins that interface directly with string contexts:
+
+* `unsafeDiscardStringContext`: throws away a string's string context (if
+  present)
+* `hasContext`: returns `true`/`false` depending on whether the string has
+  context
+* `unsafeDiscardOutputDependency`: drops dependencies on the *outputs* of a
+  `.drv` in the context, passing only the literal `.drv` itself
+
+  Note: This is only used for special test-cases in nixpkgs, and deprecated Nix
+  commands like `nix-push`.
+* `getContext`: returns the string context in serialised form as a Nix attribute
+  set
+* `appendContext`: adds a given string context to the string in the same format
+  as returned by `getContext`
+
+Most of the string manipulation operations will propagate the context to the
+result based on their parameters' contexts.
+
+## Placeholders
+
+C++ Nix has `builtins.placeholder`, which given the name of an output (e.g.
+`out`) creates a hashed string representation of that output name. If that
+string is used anywhere in input attributes, the builder will replace it with
+the actual name of the corresponding output of the current derivation.
+
+C++ Nix does not use contexts for this, it blindly creates a rewrite map of
+these placeholder strings to the names of all outputs, and runs the output
+replacement logic on all environment variables it creates, attribute files it
+passes etc.
+
+## Tvix & string contexts
+
+In the past, Tvix did not track string contexts in its evaluator at all, see
+the historical section for more information about that.
+
+Tvix tracks string contexts in every `NixString` structure via a
+`HashSet<BuildReference>` and offers an API to combine the references while
+keeping the exact internal structure of that data private.
+
+## Historical attempt: Persistent reference tracking
+
+We were investigating implementing a system which allows us to drop string
+contexts in favour of reference scanning derivation attributes.
+
+This means that instead of maintaining and passing around a string context data
+structure in eval, we maintain a data structure of *known paths* from the same
+evaluation elsewhere in Tvix, and scan each derivation attribute against this
+set of known paths when instantiating derivations.
+
+We believed we could take the stance that the system of string contexts as
+implemented in C++ Nix is likely an implementation detail that should not be
+leaking to the language surface as it does now.
+
+### Tracking "known paths"
+
+Every time a Tvix evaluation does something that causes a store interaction, a
+"known path" is created. On the language surface, this is the result of one of:
+
+1. Path literals (e.g. `src = ./.`).
+2. Calls to `builtins.derivationStrict` yielding a derivation and its output
+   paths.
+3. Calls to `builtins.path`.
+
+Whenever one of these occurs, some metadata that persists for the duration of
+one evaluation should be created in Nix. This metadata needs to be available in
+`builtins.derivationStrict`, and should be able to respond to these queries:
+
+1. What is the set of all known paths? (used for e.g. instantiating an
+   Aho-Corasick type string searcher)
+2. What is the _type_ of a path? (derivation path, derivation output, source
+   file)
+3. What are the outputs of a derivation?
+4. What is the derivation of an output?
+
+These queries will need to be asked of the metadata when populating the
+derivation fields.
+
+Note: Depending on how we implement `builtins.placeholder`, it might be useful
+to track created placeholders in this metadata, too.
+
+### Context builtins
+
+Context-reading builtins can be implemented in Tvix by adding `hasContext` and
+`getContext` with the appropriate reference-scanning logic. However, we should
+evaluate how these are used in nixpkgs and whether their uses can be removed.
+
+Context-mutating builtins can be implemented by tracking their effects in the
+value representation of Tvix, however we should consider not doing this at all.
+
+`unsafeDiscardOutputDependency` should probably never be used and we should warn
+or error on it.
+
+`unsafeDiscardStringContext` is often used as a workaround for avoiding IFD in
+inconvenient places (e.g. in the TVL depot pipeline generation). This is
+unnecessary in Tvix. We should evaluate which other uses exist, and act on them
+appropriately.
+
+The initial danger with diverging here is that we might cause derivation hash
+discrepancies between Tvix and C++ Nix, which can make initial comparisons of
+derivations generated by the two systems difficult. If this occurs we need to
+discuss how to approach it, but initially we will implement the mutating
+builtins as no-ops.
+
+### Why this did not work for us?
+
+Nix has a feature to perform environmental checks of your derivation, e.g.
+"these derivation outputs should not be referenced in this derivation", this was
+introduced in Nix 2.2 by
+https://github.com/NixOS/nix/commit/3cd15c5b1f5a8e6de87d5b7e8cc2f1326b420c88.
+
+Unfortunately, this feature introduced a very unfortunate and critical bug: all
+usage of this feature with contextful strings will actually force the
+derivation to depend at least at build time on those specific paths, see
+https://github.com/NixOS/nix/issues/4629.
+
+For example, if you wanted to `disallowedReferences` to a package and you used a
+derivation as a path, you would actually register that derivation as a input
+derivation of that derivation.
+
+This bug is still unfixed in Nix and it seems that fixing it would require
+introducing different ways to evaluate Nix derivations to preserve the
+output path calculation for Nix expressions so far.
+
+All of this would be fine if the bug behavior was uniform in the sense that no
+one tried to force-workaround it. Since Nixpkgs 23.05, due to
+https://github.com/NixOS/nixpkgs/pull/211783 this is not true anymore.
+
+If you let nixpkgs be the disjoint union of bootstrapping derivations $A$ and
+`stdenv.mkDerivation`-built derivations $B$.
+
+$A$ suffers from the bug and $B$ doesn't by the forced usage of
+`unsafeDiscardStringContext` on those special checking fields.
+
+This means that to build hash-compatible $A$ **and** $B$, we need to
+distinguish $A$ and $B$. A lot of hacks could be imagined to support this
+problem.
+
+Let's assume we have a solution to that problem, it means that we are able to
+detect implicitly when a set of specific fields are
+`unsafeDiscardStringContext`-ed.
+
+Thus, we could use that same trick to implement `unsafeDiscardStringContext`
+entirely for all fields actually.
+
+Now, to implement `unsafeDiscardStringContext` in the persistent reference
+tracking model, you will need to store a disallowed list of strings that should
+not trigger a reference when we are scanning a derivation parameters.
+
+But assume you have something like:
+
+```nix
+derivation {
+   buildInputs = [
+     stdenv.cc
+   ];
+
+   disallowedReferences = [ stdenv.cc ];
+}
+```
+
+If you unregister naively the `stdenv.cc` reference, it will silence the fact
+that it is part of the `buildInputs`, so you will observe that Nix will fail
+the derivation during environmental check, but Tvix would silently force remove
+that reference.
+
+Until proven otherwise, it seems highly difficult to have the fine-grained
+information to prevent reference tracking of those specific fields. It is not a
+failure of the persistent reference tracking, it is an unresolved critical bug
+of Nix that only nixpkgs really workarounded for `stdenv.mkDerivation`-based
+derivations.
--- a/tvix/docs/src/eval/builtins.md
+++ b/tvix/docs/src/eval/builtins.md
@ -0,0 +1,138 @@
+Nix builtins
+============
+
+Nix has a lot of built-in functions, some of which are accessible in
+the global scope, and some of which are only accessible through the
+global `builtins` attribute set.
+
+This document is an attempt to track all of these builtins, but
+without documenting their functionality.
+
+See also https://nixos.org/manual/nix/stable/expressions/builtins.html
+
+The `impl` column indicates implementation status in tvix:
+- implemented: "" (empty cell)
+- not yet implemented, but not blocked: `todo`
+- not yet implemented, but blocked by other prerequisites:
+  - `store`: awaiting eval<->store api(s)
+  - `context`: awaiting support for string contexts
+
+| name                          | global | arity | pure  | impl    |
+|-------------------------------|--------|-------|-------|---------|
+| abort                         | true   | 1     |       |         |
+| add                           | false  | 2     | true  |         |
+| addErrorContext               | false  | ?     |       | context |
+| all                           | false  | 2     | true  |         |
+| any                           | false  | 2     | true  |         |
+| appendContext                 | false  | ?     |       |         |
+| attrNames                     | false  | 1     | true  |         |
+| attrValues                    | false  |       | true  |         |
+| baseNameOf                    | true   |       |       |         |
+| bitAnd                        | false  |       |       |         |
+| bitOr                         | false  |       |       |         |
+| bitXor                        | false  |       |       |         |
+| builtins                      | true   |       |       |         |
+| catAttrs                      | false  |       |       |         |
+| compareVersions               | false  |       |       |         |
+| concatLists                   | false  |       |       |         |
+| concatMap                     | false  |       |       |         |
+| concatStringsSep              | false  |       |       |         |
+| currentSystem                 | false  |       |       |         |
+| currentTime                   | false  |       | false |         |
+| deepSeq                       | false  |       |       |         |
+| derivation                    | true   |       |       | store   |
+| derivationStrict              | true   |       |       | store   |
+| dirOf                         | true   |       |       |         |
+| div                           | false  |       |       |         |
+| elem                          | false  |       |       |         |
+| elemAt                        | false  |       |       |         |
+| false                         | true   |       |       |         |
+| fetchGit                      | true   |       |       | store   |
+| fetchMercurial                | true   |       |       | store   |
+| fetchTarball                  | true   |       |       | store   |
+| fetchurl                      | false  |       |       | store   |
+| filter                        | false  |       |       |         |
+| filterSource                  | false  |       |       | store   |
+| findFile                      | false  |       | false | todo    |
+| foldl'                        | false  |       |       |         |
+| fromJSON                      | false  |       |       |         |
+| fromTOML                      | true   |       |       |         |
+| functionArgs                  | false  |       |       |         |
+| genList                       | false  |       |       |         |
+| genericClosure                | false  |       |       | todo    |
+| getAttr                       | false  |       |       |         |
+| getContext                    | false  |       |       |         |
+| getEnv                        | false  |       | false |         |
+| hasAttr                       | false  |       |       |         |
+| hasContext                    | false  |       |       |         |
+| hashFile                      | false  |       | false |         |
+| hashString                    | false  |       |       |         |
+| head                          | false  |       |       |         |
+| import                        | true   |       |       |         |
+| intersectAttrs                | false  |       |       |         |
+| isAttrs                       | false  |       |       |         |
+| isBool                        | false  |       |       |         |
+| isFloat                       | false  |       |       |         |
+| isFunction                    | false  |       |       |         |
+| isInt                         | false  |       |       |         |
+| isList                        | false  |       |       |         |
+| isNull                        | true   |       |       |         |
+| isPath                        | false  |       |       |         |
+| isString                      | false  |       |       |         |
+| langVersion                   | false  |       |       |         |
+| length                        | false  |       |       |         |
+| lessThan                      | false  |       |       |         |
+| listToAttrs                   | false  |       |       |         |
+| map                           | true   |       |       |         |
+| mapAttrs                      | false  |       |       |         |
+| match                         | false  |       |       |         |
+| mul                           | false  |       |       |         |
+| nixPath                       | false  |       |       | todo    |
+| nixVersion                    | false  |       |       | todo    |
+| null                          | true   |       |       |         |
+| parseDrvName                  | false  |       |       |         |
+| partition                     | false  |       |       |         |
+| path                          | false  |       | sometimes | store |
+| pathExists                    | false  |       | false |         |
+| placeholder                   | true   |       |       | context |
+| readDir                       | false  |       | false |         |
+| readFile                      | false  |       | false |         |
+| removeAttrs                   | true   |       |       |         |
+| replaceStrings                | false  |       |       |         |
+| scopedImport                  | true   |       |       |         |
+| seq                           | false  |       |       |         |
+| sort                          | false  |       |       |         |
+| split                         | false  |       |       |         |
+| splitVersion                  | false  |       |       |         |
+| storeDir                      | false  |       |       | store   |
+| storePath                     | false  |       |       | store   |
+| stringLength                  | false  |       |       |         |
+| sub                           | false  |       |       |         |
+| substring                     | false  |       |       |         |
+| tail                          | false  |       |       |         |
+| throw                         | true   |       |       |         |
+| toFile                        | false  |       |       | store   |
+| toJSON                        | false  |       |       |         |
+| toPath                        | false  |       |       |         |
+| toString                      | true   |       |       |         |
+| toXML                         | true   |       |       |         |
+| trace                         | false  |       |       |         |
+| true                          | true   |       |       |         |
+| tryEval                       | false  |       |       |         |
+| typeOf                        | false  |       |       |         |
+| unsafeDiscardOutputDependency | false  |       |       |         |
+| unsafeDiscardStringContext    | false  |       |       |         |
+| unsafeGetAttrPos              | false  |       |       | todo    |
+| valueSize                     | false  |       |       | todo    |
+
+## Added after C++ Nix 2.3 (without Flakes enabled)
+
+| name          | global | arity | pure  | impl  |
+|---------------|--------|-------|-------|-------|
+| break         | false  | 1     |       | todo  |
+| ceil          | false  | 1     | true  |       |
+| fetchTree     | true   | 1     |       | todo  |
+| floor         | false  | 1     | true  |       |
+| groupBy       | false  | 2     | true  |       |
+| traceVerbose  | false  | 2     |       | todo  |
+| zipAttrsWith  | false  | 2     | true  | todo  |
--- a/tvix/docs/src/eval/catchable-errors.md
+++ b/tvix/docs/src/eval/catchable-errors.md
@ -0,0 +1,131 @@
+# (Possible) Implementation(s) of Catchable Errors for `builtins.tryEval`
+
+## Terminology
+
+Talking about “catchable errors” in Nix in general is a bit precarious since
+there is no properly established terminology. Also, the existing terms are less
+than apt. The reason for this lies in the fact that catchable errors (or
+whatever you want to call them) don't properly _exist_ in the language: While
+Nix's `builtins.tryEval` is (originally) based on the C++ exception system,
+it specifically lacks the ability of such systems to have an exception _value_
+whilst handling it. Consequently, these errors don't have an obvious name
+as they never appear _in_ the Nix language. They just have to be named in the
+respective Nix implementation:
+
+- In C++ Nix the only term for such errors is `AssertionError` which is the
+  name of the (C++) exception used in the implementation internally. This
+  term isn't great, though, as `AssertionError`s can not only be generated
+  using `assert`, but also using `throw` and failed `NIX_PATH` resolutions.
+  Were this terminology to be used in documentation addressing Nix language
+  users, it would probably only serve confusion.
+
+- Tvix currently (as of r/7573) uses the term catchable errors. This term
+  relates to nothing in the language as such: Errors are not caught, we rather
+  try to evaluate an expression. Catching also sort of implies that a value
+  representation of the error is attainable (like in an exception system) which
+  is untrue.
+
+In light of this I (sterni) would like to suggest “tryable errors” as an
+alternative term going forward which isn't inaccurate and relates to terms
+already established by language internal naming.
+
+However, this document will continue using the term catchable error until the
+naming is adjusted in Tvix itself.
+
+## Implementation
+
+Below we discuss different implementation approaches in Tvix in order to arrive
+at a proposal for the new one. The historical discussion is intended as a basis
+for discussing the proposal: Are we committing to an old or current mistake? Are
+we solving all problems that cropped up or were solved at any given point in
+time?
+
+### Original
+
+The original implementation of `tryEval` in cl/6924 was quite straightforward:
+It would simply interrupt the propagation of a potential catchable error to the
+top level (which usually happened using the `?` operator) in the builtin and
+construct the appropriate representation of an unsuccessful evaluation if the
+error was deemed catchable. It had, however, multiple problems:
+
+- The VM was originally written without `tryEval` in mind, i.e. it largely
+  assumed that an error would always cause execution to be terminated. This
+  problem was later solved (cl/6940).
+- Thunks could not be `tryEval`-ed multiple times (b/281). This was another
+  consequence of VM architecture at the time: Thunks would be blackholed
+  before evaluation was started and the error could occur. Due to the
+  interaction of the generator-based VM code and `Value::force` the part
+  of the code altering the thunk state would never be informed about the
+  evaluation result in case of a failure, so the thunk would remain
+  blackholed leading to a crash if the same thunk was `tryEval`-ed or
+  forced again. To solve this issue, amjoseph completely overhauled
+  the implementation.
+
+One key point about this implementation is that it is based on the assumption
+that catchable errors can only be generated in thunks, i.e. expressions causing
+them are never evaluated strictly. This can be illustrated using C++ Nix:
+
+```console
+> nix-instantiate --eval -E '[ (assert false; true) (builtins.throw "") <nixpkgs> ]'
+[ <CODE> <CODE> <CODE> ]
+```
+
+If this wasn't the case, the VM could encounter the error in a situation where
+the error would not have needed to pass through the `tryEval` builtin, causing
+evaluation to abort.
+
+### Present
+
+The current system (mostly implemented in cl/9289) uses a very different
+approach: Instead of relying on the thunk boundary, catchable errors are no
+longer errors, but special values. They are created at the relevant points (e.g.
+`builtins.throw`) and propagated whenever they are encountered by VM ops or
+builtins. Finally, they either encounter `builtins.tryEval` (and are converted to
+an ordinary value again) or the top level where they become a normal error again.
+
+The problems with this mostly stem from the confusion between values and errors
+that it necessitates:
+
+- In most circumstances, catchable errors end up being errors again, as `tryEval`
+  is not used a lot. So `throw`s usually end up causing evaluation to abort.
+  Consequently, not only `Value::Catchable` is necessary, but also a corresponding
+  error variant that is _only_ created if a catchable value remains at the end of
+  evaluation. A requirement that was missed until cl/10991 (!) which illustrate
+  how strange that architecture is. A consequence of this is that catchable
+  errors have no location information at all.
+- `Value::Catchable` is similar to other internal values in Tvix, but is much
+  more problematic. Aside from thunks, internal values only exist for a brief
+  amount of time on the stack and it is very clear what parts of the VM or
+  builtins need to handle them. This means that the rest of the implementation
+  need to consider them, keeping the complexity caused by the internal value
+  low. `Value::Catchable`, on the other hand, may exist anywhere and be passed
+  to any VM op or builtin, so it needs to be correctly propagated _everywhere_.
+  This causes a lot of noise in the code as well as a big potential for bugs.
+  Essentially, catchable errors require as much attention by the Tvix developer
+  as laziness. This doesn't really correlate to the importance of the two
+  features to the Nix language.
+
+### Future?
+
+The core assumption of the original solution does offer a path forward: After
+cl/9289 we should be in a better position to introspect an error occurring from
+within the VM code, but we need a better way of storing such an error to prevent
+another b/281. If catchable errors can only be generated in thunks, we can just
+use the thunk representation for this. This would mean that `Thunk::force_`
+would need to check if evaluation was successful and (in case of failure)
+change the thunk representation
+
+- either to the original `ThunkRepr::Suspended` which would be simple, but of
+  course mean duplicated evaluation work in some expressions. In fact, this
+  would probably leave a lot of easy performance on the table for use cases we
+  would like to support, e.g. tree walkers for nixpkgs.
+- or to a new `ThunkRepr` variant that stores the kind of the error and all
+  necessary location info so stack traces can work properly. This of course
+  reintroduces some of the difficulty of having two kinds of errors, but it is
+  hopefully less problematic, as the thunk boundary (i.e. `Thunk::force`) is
+  where errors would usually occur.
+
+Besides the question whether this proposal can actually be implemented, another
+consideration is whether the underlying assumption will hold in the future, i.e.
+can we implement optimizations for thunk elimination in a way that thunks that
+generate catchable errors are never eliminated?
--- a/tvix/docs/src/eval/known-optimisation-potential.md
+++ b/tvix/docs/src/eval/known-optimisation-potential.md
@ -0,0 +1,162 @@
+Known Optimisation Potential
+============================
+
+There are several areas of the Tvix evaluator code base where
+potentially large performance gains can be achieved through
+optimisations that we are already aware of.
+
+The shape of most optimisations is that of moving more work into the
+compiler to simplify the runtime execution of Nix code. This leads, in
+some cases, to drastically higher complexity in both the compiler
+itself and in invariants that need to be guaranteed between the
+runtime and the compiler.
+
+For this reason, and because we lack the infrastructure to adequately
+track their impact (WIP), we have not yet implemented these
+optimisations, but note the most important ones here.
+
+* Use "open upvalues" [hard]
+
+  Right now, Tvix will immediately close over all upvalues that are
+  created and clone them into the `Closure::upvalues` array.
+
+  Instead of doing this, we can statically determine most locals that
+  are closed over *and escape their scope* (similar to how the
+  `compiler::scope::Scope` struct currently tracks whether locals are
+  used at all).
+
+  If we implement the machinery to track this, we can implement some
+  upvalues at runtime by simply sticking stack indices in the upvalue
+  array and only copy the values where we know that they escape.
+
+* Avoid `with` value duplication [easy]
+
+  If a `with` makes use of a local identifier in a scope that can not
+  close before the with (e.g. not across `LambdaCtx` boundaries), we
+  can avoid the allocation of the phantom value and duplication of the
+  `NixAttrs` value on the stack. In this case we simply push the stack
+  index of the known local.
+
+* Multiple attribute selection [medium]
+
+  An instruction could be introduced that avoids repeatedly pushing an
+  attribute set to/from the stack if multiple keys are being selected
+  from it. This occurs, for example, when inheriting from an attribute
+  set or when binding function formals.
+
+* Split closure/function representation [easy]
+
+  Functions have fewer fields that need to be populated at runtime and
+  can directly use the `value::function::Lambda` representation where
+  possible.
+
+* Apply `compiler::optimise_select` to other set operations [medium]
+
+  In addition to selects, statically known attribute resolution could
+  also be used for things like `?` or `with`. The latter might be a
+  little more complicated but is worth investigating.
+
+* Inline fully applied builtins with equivalent operators [medium]
+
+  Some `builtins` have equivalent operators, e.g. `builtins.sub`
+  corresponds to the `-` operator, `builtins.hasAttr` to the `?`
+  operator etc. These operators additionally compile to a primitive
+  VM opcode, so they should be just as cheap (if not cheaper) as
+  a builtin application.
+
+  In case the compiler encounters a fully applied builtin (i.e.
+  no currying is occurring) and the `builtins` global is unshadowed,
+  it could compile the equivalent operator bytecode instead: For
+  example, `builtins.sub 20 22` would be compiled as `20 - 22`.
+  This would ensure that equivalent `builtins` can also benefit
+  from special optimisations we may implement for certain operators
+  (in the absence of currying). E.g. we could optimise access
+  to the `builtins` attribute set which a call to
+  `builtins.getAttr "foo" builtins` should also profit from.
+
+* Avoid nested `VM::run` calls [hard]
+
+  Currently when encountering Nix-native callables (thunks, closures)
+  the VM's run loop will nest and return the value of the nested call
+  frame one level up. This makes the Rust call stack almost mirror the
+  Nix call stack, which is usually undesirable.
+
+  It is possible to detect situations where this is avoidable and
+  instead set up the VM in such a way that it continues and produces
+  the desired result in the same run loop, but this is kind of tricky
+  to get right - especially while other parts are still in flux.
+
+  For details consult the commit with Gerrit change ID
+  `I96828ab6a628136e0bac1bf03555faa4e6b74ece`, in which the initial
+  attempt at doing this was reverted.
+
+* Avoid thunks if only identifier closing is required [medium]
+
+  Some constructs, like `with`, mostly do not change runtime behaviour
+  if thunked. However, they are wrapped in thunks to ensure that
+  deferred identifiers are resolved correctly.
+
+  This can be avoided, as we statically analyse the scope and should
+  be able to tell whether any such logic was required.
+
+* Intern literals [easy]
+
+  Currently, the compiler emits a separate entry in the constant
+  table for each literal.  So the program `1 + 1 + 1` will have
+  three entries in its `Chunk::constants` instead of only one.
+
+* Do some list and attribute set operations in place [hard]
+
+  Algorithms that can not do a lot of work inside `builtins` like `map`,
+  `filter` or `foldl'` usually perform terribly if they use data structures like
+  lists and attribute sets.
+
+  `builtins` can do work in place on a copy of a `Value`, but naïvely expressed
+  recursive algorithms will usually use `//` and `++` to do a single change to a
+  `Value` at a time, requiring a full copy of the data structure each time.
+  It would be a big improvement if we could do some of these operations in place
+  without requiring a new copy.
+
+  There are probably two approaches: We could determine statically if a value is
+  reachable from elsewhere and emit a special in place instruction if not. An
+  easier alternative is probably to rely on reference counting at runtime: If no
+  other reference to a value exists, we can extend the list or update the
+  attribute set in place.
+
+  An **alternative** to this is using [persistent data
+  structures](https://en.wikipedia.org/wiki/Persistent_data_structure) or at the
+  very least [immutable data structures](https://docs.rs/im/latest/im/) that can
+  be copied more efficiently than the stock structures we are using at the
+  moment.
+
+* Skip finalising unfinalised thunks or non-thunks instead of crashing [easy]
+
+  Currently `OpFinalise` crashes the VM if it is called on values that don't
+  need to be finalised. This helps catching miscompilations where `OpFinalise`
+  operates on the wrong `StackIdx`. In the case of function argument patterns,
+  however, this means extra VM stack and instruction overhead for dynamically
+  determining if finalisation is necessary or not. This wouldn't be necessary
+  if `OpFinalise` would just noop on any values that don't need to be finalised
+  (anymore).
+
+* Phantom binding for from expression of inherits [easy]
+
+  The from expression of an inherit is reevaluated for each inherit. This can
+  be demonstrated using the following Nix expression which, counter-intuitively,
+  will print “plonk” twice.
+
+  ```nix
+  let
+    inherit (builtins.trace "plonk" { a = null; b = null; }) a b;
+  in
+  builtins.seq a (builtins.seq b null)
+  ```
+
+  In most Nix code, the from expression is just an identifier, so it is not
+  terribly inefficient, but in some cases a more expensive expression may
+  be used. We should create a phantom binding for the from expression that
+  is reused in the inherits, so only a single thunk is created for the from
+  expression.
+
+  Since we discovered this, C++ Nix has implemented a similar optimization:
+  <https://github.com/NixOS/nix/pull/9847>.
--- a/tvix/docs/src/eval/language-issues.md
+++ b/tvix/docs/src/eval/language-issues.md
@ -0,0 +1,46 @@
+# Nix language issues
+
+In the absence of a language standard, what Nix (the language) is, is prescribed
+by the behavior of the C++ Nix implementation. Still, there are reasons not to
+accept some behavior:
+
+* Tvix aims for nixpkgs compatibility only. This means we can ignore behavior in
+  edge cases nixpkgs doesn't trigger as well as obscure features it doesn't use
+  (e.g. `__overrides`).
+* Some behavior of the Nix evaluator seems to be unintentional or an
+  implementation detail leaking out into language behavior.
+
+Especially in the latter case, it makes sense to raise the respective issue and
+maybe to get rid of the behavior in all implementations for good. Below is an
+(incomplete) list of such issues:
+
+* [Behaviour of nested attribute sets depends on definition order][i7111]
+* [Partially constructed attribute sets are observable during dynamic attr names construction][i7012]
+* [Nix parsers merges multiple attribute set literals for the same key incorrectly depending on definition order][i7115]
+
+On the other hand, there is behavior that seems to violate one's expectation
+about the language at first, but has good enough reasons from an implementor's
+perspective to keep them:
+
+* Dynamic keys are forbidden in `let` and `inherit`. This makes sure that we
+  only need to do runtime identifier lookups for `with`. More dynamic (i.e.
+  runtime) lookups would make the scoping system even more complicated as well
+  as hurt performance.
+* Dynamic attributes of `rec` sets are not added to its scope. This makes sense
+  for the same reason.
+* Dynamic and nested attributes in attribute sets don't get merged. This is a
+  tricky one, but avoids doing runtime (recursive) merges of attribute sets.
+  Instead all necessary merging can be inferred statically, i.e. the C++ Nix
+  implementation already merges at parse time, making nested attribute keys
+  syntactic sugar effectively.
+
+Other behavior is just odd, surprising or underdocumented:
+
+* `builtins.foldl'` doesn't force the initial accumulator (but all other
+  intermediate accumulator values), differing from e.g. Haskell, see
+  the [relevant PR discussion][p7158].
+
+[i7111]: https://github.com/NixOS/nix/issues/7111
+[i7012]: https://github.com/NixOS/nix/issues/7012
+[i7115]: https://github.com/NixOS/nix/issues/7115
+[p7158]: https://github.com/NixOS/nix/pull/7158
--- a/tvix/docs/src/eval/opcodes-attrsets.md
+++ b/tvix/docs/src/eval/opcodes-attrsets.md
@ -0,0 +1,122 @@
+# attrset-opcodes
+
+The problem with attrset literals is twofold:
+
+1. The keys of attribute sets may be dynamically evaluated.
+
+   Access:
+
+   ```nix
+   let
+     k = "foo";
+     attrs = { /* etc. */ };
+   in attrs."${k}"
+   ```
+
+   Literal:
+   ```nix
+   let
+     k = "foo";
+   in {
+     "${k}" = 42;
+   }
+   ```
+
+   The problem with this is that the attribute set key is not known at
+   compile time, and needs to be dynamically evaluated by the VM as an
+   expression.
+
+   For the most part this should be pretty simple, assuming a
+   theoretical instruction set:
+
+   ```
+   0000  OP_CONSTANT(0) # key "foo"
+   0001  OP_CONSTANT(1) # value 42
+   0002  OP_ATTR_SET(1) # construct attrset from 2 stack values
+   ```
+
+   The operation pushing the key needs to be replaced with one that
+   leaves a single value (the key) on the stack, i.e. the code for the
+   expression, e.g.:
+
+   ```
+   0000..000n <operations leaving a string value on the stack>
+   000n+1     OP_CONSTANT(1) # value 42
+   000n+2     OP_ATTR_SET(1) # construct attrset from 2 stack values
+   ```
+
+   This is fairly easy to do by simply recursing in the compiler when
+   the key expression is encountered.
+
+2. The keys of attribute sets may be nested.
+
+   This is the non-trivial part of dealing with attribute set
+   literals. Specifically, the nesting can be arbitrarily deep and the
+   AST does not guarantee that related set keys are located
+   adjacently.
+
+   Furthermore, this frequently occurs in practice in Nix. We need a
+   bytecode representation that makes it possible to construct nested
+   attribute sets at runtime.
+
+   Proposal: AttrPath values
+
+   If we can leave a value representing an attribute path on the
+   stack, we can offload the construction of nested attribute sets to
+   the `OpAttrSet` operation.
+
+   Under the hood, OpAttrSet in practice constructs a `Map<NixString,
+   Value>` attribute set in most cases. This means it expects to pop
+   the value of the key of the stack, but is otherwise free to do
+   whatever it wants with the underlying map.
+
+   In a simple example, we could have code like this:
+
+   ```nix
+   {
+     a.b = 15;
+   }
+   ```
+
+   This would be compiled to a new `OpAttrPath` instruction that
+   constructs and pushes an attribute path from a given number of
+   fragments (which are popped off the stack).
+
+   For example,
+
+   ```
+   0000 OP_CONSTANT(0)  # key "a"
+   0001 OP_CONSTANT(1)  # key "b"
+   0002 OP_ATTR_PATH(2) # construct attrpath from 2 fragments
+   0003 OP_CONSTANT(2)  # value 42
+   0004 OP_ATTRS(1)     # construct attrset from one pair
+   ```
+
+   Right before `0004` the stack would be left like this:
+
+   [ AttrPath[a,b], 42 ]
+
+   Inside of the `OP_ATTRS` instruction we could then begin
+   construction of the map and insert the nested attribute sets as
+   required, as well as validate that there are no duplicate keys.
+
+3. Both of these cases can occur simultaneously, but this is not a
+   problem as the opcodes combine perfectly fine, e.g.:
+
+   ```nix
+   let
+     k = "a";
+   in {
+     "${k}".b = 42;
+   }
+   ```
+
+   results in
+
+   ```
+   0000..000n <operations leaving a string value on the stack>
+   000n+1     OP_CONSTANT(1)  # key "b"
+   000n+2     OP_ATTR_PATH(2) # construct attrpath from 2 fragments
+   000n+3     OP_CONSTANT(2)  # value 42
+   000n+4     OP_ATTR_SET(1)  # construct attrset from 2 stack values
+   ```
--- a/tvix/docs/src/eval/recursive-attrs.md
+++ b/tvix/docs/src/eval/recursive-attrs.md
@ -0,0 +1,68 @@
+Recursive attribute sets
+========================
+
+The construction behaviour of recursive attribute sets is very
+specific, and a bit peculiar.
+
+In essence, there are multiple "phases" of scoping that take place
+during attribute set construction:
+
+1. Every inherited value without an explicit source is inherited only
+   from the **outer** scope in which the attribute set is enclosed.
+
+2. A new scope is opened in which all recursive keys are evaluated.
+   This only considers **statically known keys**, attributes can
+   **not** recurse into dynamic keys in `self`!
+
+   For example, this code is invalid in C++ Nix:
+
+   ```
+   nix-repl> rec { ${"a"+""} = 2; b = a * 10; }
+   error: undefined variable 'a' at (string):1:26
+   ```
+
+3. Finally, a third scope is opened in which dynamic keys are
+   evaluated.
+
+This behaviour, while possibly a bit strange and unexpected, actually
+simplifies the implementation of recursive attribute sets in Tvix as
+well.
+
+Essentially, a recursive attribute set like this:
+
+```nix
+rec {
+  inherit a;
+  b = a * 10;
+  ${"c" + ""} = b * 2;
+}
+```
+
+Can be compiled like the following expression:
+
+```nix
+let
+  inherit a;
+in let
+  b = a * 10;
+  in {
+    inherit a b;
+    ${"c" + ""} = b * 2;
+  }
+```
+
+Completely deferring the resolution of recursive identifiers to the
+existing handling of recursive scopes (i.e. deferred access) in let
+bindings.
+
+In practice, we can further specialise this and compile each scope
+directly into the form expected by `OpAttrs` (that is, leaving
+attribute names on the stack) before each value's position.
+
+C++ Nix's Implementation
+------------------------
+
+* [`ExprAttrs`](https://github.com/NixOS/nix/blob/2097c30b08af19a9b42705fbc07463bea60dfb5b/src/libexpr/nixexpr.hh#L241-L268)
+  (AST representation of attribute sets)
+* [`ExprAttrs::eval`](https://github.com/NixOS/nix/blob/075bf6e5565aff9fba0ea02f3333c82adf4dccee/src/libexpr/eval.cc#L1333-L1414)
+* [`addAttr`](https://github.com/NixOS/nix/blob/master/src/libexpr/parser.y#L98-L156) (`ExprAttrs` construction in the parser)
--- a/tvix/docs/src/eval/vm-loop.md
+++ b/tvix/docs/src/eval/vm-loop.md
@ -0,0 +1,315 @@
+tvix-eval VM loop
+=================
+
+This document describes the new tvix-eval VM execution loop implemented in the
+chain focusing around cl/8104.
+
+## Background
+
+The VM loop implemented in Tvix prior to cl/8104 had several functions:
+
+1. Advancing the instruction pointer for a chunk of Tvix bytecode and
+   executing instructions in a loop until a result was yielded.
+
+2. Tracking Nix call frames as functions/thunks were entered/exited.
+
+3. Catching trampoline requests returned from instructions to force suspended
+   thunks without increasing stack size *where possible*.
+
+4. Handling trampolines through an inner trampoline loop, switching between a
+   code execution mode and execution of subsequent trampolines.
+
+This implementation of the trampoline logic was added on to the existing VM,
+which previously always recursed for thunk forcing. There are some cases (for
+example values that need to be forced *inside* of the execution of a builtin)
+where trampolines could not previously be used, and the VM recursed anyways.
+
+As a result of this trampoline logic being added "on top" of the existing VM
+loop the code became quite difficult to understand. This led to several bugs,
+for example: b/251, b/246, b/245, and b/238.
+
+These bugs were tricky to deal with, as we had to try and make the VM do
+things that are somewhat difficult to fit into its model. We could of course
+keep extending the trampoline logic to accommodate all sorts of concepts (such
+as finalisers), but that seems like it does not solve the root problem.
+
+## New VM loop
+
+In cl/8104, a unified new solution is implemented with which the VM is capable
+of evaluating everything without increasing the call stack size.
+
+This is done by introducing a new frame stack in the VM, on which execution
+frames are enqueued that are either:
+
+1. A bytecode frame, consisting of Tvix bytecode that evaluates compiled Nix
+   code.
+2. A generator frame, consisting of some VM logic implemented in pure Rust
+   code that can be *suspended* when it hits a point where the VM would
+   previously need to recurse.
+
+We do this by making use of the `async` *keyword* in Rust, but notably
+*without* introducing asynchronous I/O or concurrency in tvix-eval (the
+complexity of which is currently undesirable for us).
+
+Specifically, when writing a Rust function that uses the `async` keyword, such
+as:
+
+```rust
+async fn some_builtin(input: Value) -> Result<Value, ErrorKind> {
+  let mut out = NixList::new();
+
+  for element in input.to_list()? {
+    let result = do_something_that_requires_the_vm(element).await;
+    out.push(result);
+  }
+
+  Ok(out)
+}
+```
+
+The compiler actually generates a state-machine under-the-hood which allows
+the execution of that function to be *suspended* whenever it hits an `await`.
+
+We use the [`genawaiter`][] crate that gives us a data structure and simple
+interface for getting instances of these state machines that can be stored in
+a struct (in our case, a *generator frame*).
+
+The execution of the VM then becomes the execution of an *outer loop*, which
+is responsible for selecting the next generator frame to execute, and two
+*inner loops*, which drive the execution of a bytecode frame or generator
+frame forward until it either yields a value or asks to be suspended in favour
+of another frame.
+
+All "communication" between frames happens solely through values left on the
+stack: Whenever a frame of either type runs to completion, it is expected to
+leave a *single* value on the stack. It follows that the whole VM, upon
+completion of the last (or initial, depending on your perspective) frame
+yields its result as the return value.
+
+The core of the VM restructuring is cl/8104, unfortunately one of the largest
+single commit changes we've had to make yet, as it touches pretty much all
+areas of tvix-eval. The introduction of the generators and the
+message/response system we built to request something from the VM, suspend a
+generator, and wait for the return is in cl/8148.
+
+The next sections describe in detail how the three different loops work.
+
+### Outer VM loop
+
+The outer VM loop is responsible for selecting the next frame to run, and
+dispatching it correctly to inner loops, as well as determining when to shut
+down the VM and return the final result.
+
+```
+                          ╭──────────────────╮
+                 ╭────────┤ match frame kind ├──────╮
+                 │        ╰──────────────────╯      │
+                 │                                  │
+    ┏━━━━━━━━━━━━┷━━━━━┓                ╭───────────┴───────────╮
+───►┃ frame_stack.pop()┃                ▼                       ▼
+    ┗━━━━━━━━━━━━━━━━━━┛       ┏━━━━━━━━━━━━━━━━┓      ┏━━━━━━━━━━━━━━━━━┓
+                 ▲             ┃ bytecode frame ┃      ┃ generator frame ┃
+                 │             ┗━━━━━━━━┯━━━━━━━┛      ┗━━━━━━━━┯━━━━━━━━┛
+                 │[yes, cont.]          │                       │
+                 │                      ▼                       ▼
+    ┏━━━━━━━━┓   │             ╔════════════════╗      ╔═════════════════╗
+◄───┨ return ┃   │             ║ inner bytecode ║      ║ inner generator ║
+    ┗━━━━━━━━┛   │             ║      loop      ║      ║      loop       ║
+        ▲        │             ╚════════╤═══════╝      ╚════════╤════════╝
+        │   ╭────┴─────╮                │                       │
+        │   │ has next │                ╰───────────┬───────────╯
+   [no] ╰───┤  frame?  │                            │
+            ╰────┬─────╯                            ▼
+                 │                         ┏━━━━━━━━━━━━━━━━━┓
+                 │                         ┃ frame completed ┃
+                 ╰─────────────────────────┨  or suspended   ┃
+                                           ┗━━━━━━━━━━━━━━━━━┛
+```
+
+Initially, the VM always pops a frame from the frame stack and then inspects
+the type of frame it found. As a consequence the next frame to execute is
+always the frame at the top of the stack, and setting up a VM initially for
+code execution is done by leaving a bytecode frame with the code to execute on
+the stack and passing control to the outer loop.
+
+Control is dispatched to either of the inner loops (depending on the type of
+frame) and the cycle continues once they return.
+
+When an inner loop returns, it has either finished its execution (and left its
+result value on the *value stack*), or its frame has requested to be
+suspended.
+
+Frames request suspension by re-enqueueing *themselves* through VM helper
+methods, and then leaving the frame they want to run *on top* of themselves in
+the frame stack before yielding control back to the outer loop.
+
+The inner control loops inform the outer loops about whether the frame has
+been *completed* or *suspended* by returning a boolean.
+
+### Inner bytecode loop
+
+The inner bytecode loop drives the execution of some Tvix bytecode by
+continously looking at the next instruction to execute, and dispatching to the
+instruction handler.
+
+```
+   ┏━━━━━━━━━━━━━┓
+◄──┨ return true ┃
+   ┗━━━━━━━━━━━━━┛
+          ▲
+     ╔════╧═════╗
+     ║ OpReturn ║
+     ╚══════════╝
+          ▲
+          ╰──┬────────────────────────────╮
+             │                            ▼
+             │                 ╔═════════════════════╗
+    ┏━━━━━━━━┷━━━━━┓           ║ execute instruction ║
+───►┃ inspect next ┃           ╚══════════╤══════════╝
+    ┃  instruction ┃                      │
+    ┗━━━━━━━━━━━━━━┛                      │
+             ▲                      ╭─────┴─────╮
+             ╰──────────────────────┤ suspends? │
+                       [no]         ╰─────┬─────╯
+                                          │
+                                          │
+   ┏━━━━━━━━━━━━━━┓                       │
+◄──┨ return false ┃───────────────────────╯
+   ┗━━━━━━━━━━━━━━┛              [yes]
+```
+
+With this refactoring, the compiler now emits a special `OpReturn` instruction
+at the end of bytecode chunks. This is a signal to the runtime that the chunk
+has completed and that its current value should be returned, without having to
+perform instruction pointer arithmetic.
+
+When `OpReturn` is encountered, the inner bytecode loop returns control to the
+outer loop and informs it (by returning `true`) that the bytecode frame has
+completed.
+
+Any other instruction may also request a suspension of the bytecode frame (for
+example, instructions that need to force a value). In this case the inner loop
+is responsible for setting up the frame stack correctly, and returning `false`
+to inform the outer loop of the suspension
+
+### Inner generator loop
+
+The inner generator loop is responsible for driving the execution of a
+generator frame by continously calling [`Gen::resume`][] until it requests a
+suspension (as a result of which control is returned to the outer loop), or
+until the generator is done and yields a value.
+
+```
+   ┏━━━━━━━━━━━━━┓
+◄──┨ return true ┃ ◄───────────────────╮
+   ┗━━━━━━━━━━━━━┛                     │
+                                       │
+                               [Done]  │
+                    ╭──────────────────┴─────────╮
+                    │ inspect generator response │◄────────────╮
+                    ╰──────────────────┬─────────╯             │
+                            [yielded]  │              ┏━━━━━━━━┷━━━━━━━━┓
+                                       │              ┃ gen.resume(msg) ┃◄──
+                                       ▼              ┗━━━━━━━━━━━━━━━━━┛
+                                 ╭────────────╮                ▲
+                                 │ same-frame │                │
+                                 │  request?  ├────────────────╯
+                                 ╰─────┬──────╯      [yes]
+   ┏━━━━━━━━━━━━━━┓                    │
+◄──┨ return false ┃ ◄──────────────────╯
+   ┗━━━━━━━━━━━━━━┛                [no]
+```
+
+On each execution of a generator frame, `resume_with` is called with a
+[`VMResponse`][] (i.e. a message *from* the VM *to* the generator). For a newly
+created generator, the initial message is just `Empty`.
+
+A generator may then respond by signaling that it has finished execution
+(`Done`), in which case the inner generator loop returns control to the outer
+loop and informs it that this generator is done (by returning `true`).
+
+A generator may also respond by signaling that it needs some data from the VM.
+This is implemented through a request-response pattern, in which the generator
+returns a `Yielded` message containing a [`VMRequest`][]. These requests can be
+very simple ("Tell me the current store path") or more complex ("Call this Nix
+function with these values").
+
+Requests are divided into two classes: Same-frame requests (requests that can be
+responded to *without* returning control to the outer loop, i.e. without
+executing a *different* frame), and multi-frame generator requests. Based on the
+type of request, the inner generator loop will either handle it right away and
+send the response in a new `resume_with` call, or return `false` to the outer
+generator loop after setting up the frame stack.
+
+Most of this logic is implemented in cl/8148.
+
+[`Gen::resume`]: https://docs.rs/genawaiter/0.99.1/genawaiter/rc/struct.Gen.html#method.resume_with
+[`VMRequest`]: https://cs.tvl.fyi/depot@2696839770c1ccb62929ff2575a633c07f5c9593/-/blob/tvix/eval/src/vm/generators.rs?L44
+[`VMResponse`]: https://cs.tvl.fyi/depot@2696839770c1ccb62929ff2575a633c07f5c9593/-/blob/tvix/eval/src/vm/generators.rs?L169
+
+## Advantages & Disadvantages of the approach
+
+This approach has several advantages:
+
+* The execution model is much simpler than before, making it fairly
+  straightforward to build up a mental model of what the VM does.
+
+* All "out of band requests" inside the VM are handled through the same
+  abstraction (generators).
+
+* Implementation is not difficult, albeit a little verbose in some cases (we
+  can argue about whether or not to introduce macros for simplifying it).
+
+* Several parts of the VM execution are now much easier to document,
+  potentially letting us onboard tvix-eval contributors faster.
+
+* The linear VM execution itself is much easier to trace now, with for example
+  the `RuntimeObserver` (and by extension `tvixbolt`) giving much clearer
+  output now.
+
+But it also comes with some disadvantages:
+
+* Even though we "only" use the `async` keyword without a full async-I/O
+  runtime, we still encounter many of the drawbacks of the fragmented Rust
+  async ecosystem.
+
+  The biggest issue with this is that parts of the standard library become
+  unavailable to us, for example the built-in `Vec::sort_by` can no longer be
+  used for sorting in Nix because our comparators themselves are `async`.
+
+  This led us to having to implement some logic on our own, as the design of
+  `async` in Rust even makes it difficult to provide usecase-generic
+  implementations of concepts like sorting.
+
+* We need to allocate quite a few new structures on the heap in order to drive
+  generators, as generators involve storing `Future` types (with unknown
+  sizes) inside of structs.
+
+  In initial testing this seems to make no significant difference in
+  performance (our performance in an actual nixpkgs-eval is still bottlenecked
+  by I/O concerns and reference scanning), but is something to keep in mind
+  later on when we start optimising more after the low-hanging fruits have
+  been reaped.
+
+## Alternatives considered
+
+1. Tacking on more functionality onto the existing VM loop
+   implementation to accomodate problems as they show up. This is not
+   preferred as the code is already getting messy.
+
+2. Making tvix-eval a fully `async` project, pulling in something like Tokio
+   or `async-std` as a runtime. This is not preferred due to the massively
+   increased complexity of those solutions, and all the known issues of fully
+   buying in to the async ecosystem.
+
+   tvix-eval fundamentally should work for use-cases besides building Nix
+   packages (e.g. for `//tvix/serde`), and its profile should be as slim as
+   possible.
+
+3. Convincing the Rust developers that Rust needs a way to guarantee
+   constant-stack-depth tail calls through something like a `tailcall`
+   keyword.
+
+4. ... ?
+
+[`genawaiter`]: https://docs.rs/genawaiter/