fix(ops/pipelines): Chunk build pipeline into multiple uploads

The number of jobs in the depot pipeline is reaching the limit of what
the Buildkite backend can handle in a single pipeline upload. Based on
a conversation with their support, my understanding is that this has
to do with internal locking mechanisms at Buildkite.

To work around this, we can instead split the pipeline into several
smaller chunks that are uploaded serially.

This commit introduces logic to chunk the pipeline accordingly. The
chunk size chosen is 256 for now (a multiple of our number of agents,
which is useful if we can get builds from the first chunk to start
before the next ones are uploaded).

Note that this chunk size is significantly below even the current
number of targets (~460 as of this commit). Choosing a smaller chunk
size might alleviate the timeouts we have been seeing during pipeline
uploads.
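
As a rough illustration of the numbers above, the sketch below computes
how many chunks a given step count produces and how many steps land in
the last chunk. The helper names `chunk_count` and `last_chunk_size` are
hypothetical and not part of this change, and the sketch assumes one
pipeline step per target, which may not hold exactly.

```shell
#!/bin/sh
# Hypothetical helper: number of chunks needed for a step count and a
# chunk size, computed with integer ceiling division.
chunk_count() {
  steps=$1
  size=$2
  echo $(( (steps + size - 1) / size ))
}

# Hypothetical helper: number of steps in the final chunk.
last_chunk_size() {
  steps=$1
  size=$2
  rem=$(( steps % size ))
  if [ "$rem" -eq 0 ] && [ "$steps" -gt 0 ]; then
    # The step count divides evenly, so the last chunk is full.
    echo "$size"
  else
    echo "$rem"
  fi
}

# Assumption: roughly 460 steps and the chunk size of 256 chosen here.
chunk_count 460 256      # prints 2
last_chunk_size 460 256  # prints 204
```

Under these assumptions the pipeline is split into two uploads, one
full chunk of 256 steps and a second one of 204.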

Change-Id: I77030aaf8b874c330218b78c77d15216e13b9af7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/4332
Tested-by: BuildkiteCI
Reviewed-by: wpcarro <wpcarro@gmail.com>
Autosubmit: tazjin <mail@tazj.in>
Vincent Ambo 2021-12-15 14:28:15 +03:00 committed by clbot
parent 13f7bf06bb
commit 38ec27e834
2 changed files with 51 additions and 10 deletions

@@ -8,8 +8,13 @@ steps:
   - label: ":llama:"
     command: |
       set -ue
-      nix-build -A ops.pipelines.depot -o depot.yaml --show-trace && \
-        buildkite-agent pipeline upload depot.yaml
+      nix-build -A ops.pipelines.depot -o pipeline --show-trace
+      # Steps need to be uploaded in reverse order because pipeline
+      # upload prepends instead of appending.
+      ls pipeline/chunk-*.json | tac | while read chunk; do
+        buildkite-agent pipeline upload $$chunk
+      done
   # Wait for all previous steps to complete.
   - wait: null
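
The reverse-order upload in the hunk above can be tried locally with a
small simulation. This is a sketch only: `fake_upload` is a hypothetical
stand-in for `buildkite-agent pipeline upload` that models the
prepending behaviour described in the diff comment by writing to a
plain-text file, and the chunk files are empty placeholders. It assumes
`tac` from GNU coreutils is available, as the pipeline step itself does.

```shell
#!/bin/sh
# Hypothetical stand-in for `buildkite-agent pipeline upload`: it
# prepends the chunk name to a queue file instead of appending it.
fake_upload() {
  chunk=$1
  queue=$2
  { echo "$chunk"; cat "$queue"; } > "$queue.tmp"
  mv "$queue.tmp" "$queue"
}

# Upload all chunks in reverse listing order, mirroring the loop in
# the pipeline step.
upload_in_reverse() {
  dir=$1
  queue=$2
  : > "$queue"  # start with an empty queue
  ls "$dir"/chunk-*.json | tac | while read -r chunk; do
    fake_upload "$(basename "$chunk")" "$queue"
  done
}

# Placeholder chunk files; the real ones come from the nix-build output.
workdir=$(mktemp -d)
touch "$workdir/chunk-0.json" "$workdir/chunk-1.json" "$workdir/chunk-2.json"

upload_in_reverse "$workdir" "$workdir/queue.txt"
cat "$workdir/queue.txt"
```

Because each upload is prepended, uploading the last chunk first leaves
the queue in its natural order, with `chunk-0.json` at the top.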