Switch out the string-scanning algorithm used in the reference scanner.
The construction of aho-corasick automata made up the vast majority of
runtime when evaluating nixpkgs previously. While the actual scanning
with a constructed automaton is relatively fast, we almost never scan
for the same set of strings twice and the cost is not worth it.
An algorithm that better matches our needs is the Wu-Manber multiple
string match algorithm, which works efficiently on *long* and *random*
strings of the *same length*, which describes store paths (up to their
hash component).
This switches the refscanner crate to a Rust implementation[0][1] of
this algorithm.
This has several implications:
1. This crate does not provide a way to scan streams. I'm not sure if
this is an inherent problem with the algorithm (probably not, but
it would need buffering). Either way, related functions and
tests (which were actually unused) have been removed.
2. All strings need to be of the same length. For this reason, we
truncate the known paths after their hash part (they are still
unique, of course).
3. Passing an empty set of matches, or a match that is shorter than
the length of a store path, causes the crate to panic. We safeguard
against this by completely skipping the refscanning if there are no
known paths (i.e. when evaluating the first derivation of an eval),
and by bailing out of scanning a string that is shorter than a
store path.
On the upside, this reduces overall runtime to less than 1/5 of what it was
before when evaluating `pkgs.stdenv.drvPath`.
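The safeguards from points 2 and 3 can be sketched roughly as follows. This is
an illustration only, not the code in this change: the names `STORE_DIR`,
`HASH_LEN`, `truncate_to_hash` and `scan_references` are hypothetical, the
default `/nix/store/` prefix with a 32-character hash part is assumed, and a
naive substring search stands in for the Wu-Manber matcher.

```rust
// Assumption: default store directory and 32-character hash part.
const STORE_DIR: &str = "/nix/store/";
const HASH_LEN: usize = 32;

/// Cut a store path off after its hash part, so that all patterns
/// end up with the same length. Returns None for anything that is
/// not long enough to be a store path.
fn truncate_to_hash(path: &str) -> Option<&str> {
    let len = STORE_DIR.len() + HASH_LEN;
    if path.starts_with(STORE_DIR) {
        path.get(..len)
    } else {
        None
    }
}

/// Return the known paths that are referenced in `haystack`.
fn scan_references<'a>(known: &[&'a str], haystack: &str) -> Vec<&'a str> {
    // Guard 1: with no known paths there is nothing to build a
    // matcher from, so skip scanning entirely.
    if known.is_empty() {
        return Vec::new();
    }

    // Guard 2: a string shorter than a truncated store path cannot
    // contain one, so bail out before handing it to the matcher.
    if haystack.len() < STORE_DIR.len() + HASH_LEN {
        return Vec::new();
    }

    // Stand-in for the real matcher: naive substring search over the
    // truncated, equal-length patterns.
    known
        .iter()
        .copied()
        .filter(|p| truncate_to_hash(p).map_or(false, |t| haystack.contains(t)))
        .collect()
}
```

Both guards return an empty result instead of reaching the matcher, which is
the behaviour described above for the first derivation of an eval and for
short strings.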
[0]: Frankly, it's a random, research-grade MIT-licensed
crate that I found on GitHub:
https://github.com/jneem/wu-manber
[1]: We probably want to rewrite or at least fork the above crate, and
add things like a three-byte wide scanner. Evaluating large
portions of nixpkgs can easily lead to more than 65k derivations
being scanned for.
Change-Id: I08926778e1e5d5a87fc9ac26e0437aed8bbd9eb0
Reviewed-on: https://cl.tvl.fyi/c/depot/+/8017
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
depot
This repository is the monorepo for the community around The Virus Lounge, containing our personal tools and infrastructure. Everything in here is built using Nix.
A large portion of the software here is very self-referential, meaning that it exists to sustain the operation of the repository. This is the case because we partially see this as an experiment in tooling for monorepos.
Highlights
Services
- Source code is available primarily through Sourcegraph on cs.tvl.fyi, where it is searchable and even semantically indexed. A lower-tech view of the repository is also available via cgit-pink on code.tvl.fyi.
  The repository can be cloned using git from https://cl.tvl.fyi/depot.
- All code in the depot, with the exception of code that is checked in to individual //users folders, needs to be reviewed. We use Gerrit on cl.tvl.fyi for this.
- Issues are tracked via our own issue tracker on b.tvl.fyi. Its source code lives at //web/panettone/.
- Smaller todo-list entries which do not warrant a separate issue are listed at todo.tvl.fyi.
- We use Buildkite for CI. Recent builds are listed on tvl.fyi/builds and pipelines are configured dynamically via //ops/pipelines.
- A search service that makes TVL services available via textual shortcuts is available: atward
All services that we host are deployed on NixOS machines that we manage. Their
configuration is tracked in //ops/{modules,machines}.
Nix
- //nix/readTree contains the Nix code which automatically registers projects in our Nix attribute hierarchy based on their in-tree location
- //tools/nixery contains the source code of Nixery, a container registry that can build images ad-hoc from Nix packages
- //nix/yants contains Yet Another Nix Type System, which we use for a variety of things throughout the repository
- //nix/buildGo implements a Nix library that can build Go software in the style of Bazel's rules_go. Go programs in this repository are built using this library.
- //nix/buildLisp implements a Nix library that can build Common Lisp software. Currently only SBCL is supported. Lisp programs in this repository are built using this library.
- //web/blog and //web/atom-feed: A Nix-based static site generator which generates the web page and Atom feed for tazj.in (//users/tazjin/homepage) and tvl.fyi (//web/tvl)
- //web/bubblegum contains a CGI-based web framework written in Nix.
- //nix/nint: A shebang-compatible interpreter wrapper for Nix.
- //tvix contains initial work towards a modular architecture for Nix.
We have a variety of other tools and libraries in the //nix folder which may
be of interest.
Packages / Libraries
- //net/alcoholic_jwt contains an easy-to-use JWT-validation library for Rust
- //net/crimp contains a high-level HTTP client using cURL for Rust
- //tools/emacs-pkgs contains various useful Emacs libraries, for example:
  - dottime.el provides dottime in the Emacs modeline
  - nix-util.el provides editing utilities for Nix files
  - term-switcher.el is an ivy-function for switching between vterm buffers
  - tvl.el provides helper functions for interacting with the TVL monorepo
- //lisp/klatre provides a grab-bag utility library for Common Lisp
User packages
Contributors to the repository have user directories under
//users, which can be used for
personal or experimental code that does not require review.
Some examples:
- //users/grfn/xanthous: A (WIP) TUI RPG, written in Haskell.
- //users/tazjin/emacs: tazjin's Emacs & EXWM configuration
- //users/tazjin/finito: A persistent finite-state machine library for Rust.
Licensing
Unless otherwise stated in a subdirectory, all code is licensed under the MIT license. See LICENSE for details.
Contributing
If you'd like to contribute to any of the tools in here, please check out the contribution guidelines and our code of conduct.
IRC users can find us in #tvl on hackint, which is also
reachable via XMPP at #tvl@irc.hackint.org (sic!).
Hackint also provide a web chat.