refactor(ci): Consolidate Rust checks and add reusable build actions #934

Merged
tcpipuk merged 2 commits from tom/rust-checks into main 2025-08-28 19:25:04 +00:00
Owner

Merge rust-checks.yml into prek-checks.yml to create a unified CI workflow that runs formatting and clippy/tests sequentially, avoiding duplicate setup overhead while dramatically improving performance through aggressive caching.

Key improvements:

  • Add setup-rust action with sccache and timelord for >50% faster builds
  • Add setup-llvm-with-apt action with proper caching and GPG key management
  • Support for cross-compilation (dpkg-arch) and custom targets in preparation for future ARM64 and other platform builds
  • Smart caching strategy: toolchain, sccache, and build artifacts cached separately
  • Cleaner logs with collapsible output groups for verbose operations
  • Fix liburing dependency for io_uring feature compilation

The new actions are intentionally comprehensive to support upcoming cross-compilation workflows while immediately benefiting from build caching that reduces CI time from >10 minutes to <5 minutes for incremental changes.

tcpipuk self-assigned this 2025-08-17 20:52:16 +00:00
refactor(ci): Consolidate Rust checks and add reusable build actions
Some checks failed
Release Docker Image / merge (push) Blocked by required conditions
Release Docker Image / define-variables (push) Successful in 22s
Checks / Prek / Pre-commit & Formatting (push) Has been cancelled
Checks / Prek / Clippy and Cargo Tests (push) Has been cancelled
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Has been cancelled
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Has been cancelled
Documentation / Build and Deploy Documentation (pull_request) Successful in 46s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 42s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 5m24s
10bb995759
Merge rust-checks.yml into prek-checks.yml to create a unified CI workflow
that runs formatting and clippy/tests sequentially, avoiding duplicate setup
overhead while dramatically improving performance through aggressive caching.

Key improvements:
- Add setup-rust action with sccache and timelord for >50% faster builds
- Add setup-llvm-with-apt action with proper caching and GPG key management
- Support for cross-compilation (dpkg-arch) and custom targets in preparation
  for future ARM64 and other platform builds
- Smart caching strategy: toolchain, sccache, and build artifacts cached separately
- Cleaner logs with collapsible output groups for verbose operations
- Fix liburing dependency for io_uring feature compilation

The new actions are intentionally comprehensive to support upcoming cross-
compilation workflows while immediately benefiting from build caching that
reduces CI time from >10 minutes to <5 minutes for incremental changes.
tcpipuk force-pushed tom/rust-checks from 10bb995759 to 0373dd8382 2025-08-17 20:54:02 +00:00 Compare
tcpipuk force-pushed tom/rust-checks from 0373dd8382 to 067ee77963 2025-08-17 20:57:02 +00:00 Compare
Jade left a comment
Owner

So just a few (fixable) comments here. One thing to come back to is that the additional packages in the LLVM action won't be cached, and the caching in that action might be a bit fragile (not easy to solve - the action I was using in the prior workflow was more generic, but just as caveat-filled). It's something that could be worth looking into with the idea of base images.

@@ -0,0 +17,4 @@
minimum-version:
description: 'Minimum LLVM version required'
required: false
default: '20'
Owner

We want an exact version rather than a minimum version - the purpose is to closely match the llvm version we get from

NU> rustc -vV
rustc 1.87.0 (17067e9ac 2025-05-09)
binary: rustc
commit-hash: 17067e9ac6d7ecb70e50f92c1944e545188d2359
commit-date: 2025-05-09
host: aarch64-apple-darwin
release: 1.87.0
LLVM version: 20.1.1

because the llvm interfaces used for cross-language LTO aren't stable and can be updated in backwards/forwards incompatible ways.
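
For illustration, the exact-version pin could even be derived from the toolchain itself rather than hardcoded - a minimal sketch under that assumption, not what the action currently does:

```bash
# Sketch: read the LLVM major version bundled with the active rustc and pin the
# apt.llvm.org install to it, so the two stay in lockstep for cross-language LTO.
LLVM_MAJOR="$(rustc -vV | sed -n 's/^LLVM version: \([0-9]*\).*/\1/p')"
wget -qO llvm.sh https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh "${LLVM_MAJOR}"
rm llvm.sh
```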

Author
Owner

I'm specifically defaulting to the stable version, which is currently 20 (which is why I've set it as a minimum), but am happy to request a specific version instead 🙂

tcpipuk marked this conversation as resolved
@@ -0,0 +44,4 @@
sudo dpkg --add-architecture ${{ inputs.dpkg-arch }}
# Restrict default sources to amd64
sudo sed -i 's/^deb http/deb [arch=amd64] http/g' /etc/apt/sources.list
Owner

Why are we restricting to AMD64 here? (Both at all, and instead of the host arch)

Author
Owner

In the cross-compile update, I'll be adding build environments using the dpkg arch, but this is a simplified version to make the PR easier to split into two chunks.
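
For reference, a host-arch variant would only be a small change - a sketch, with TARGET_ARCH standing in for the action's dpkg-arch input:

```bash
# Sketch: restrict the default sources to the runner's own architecture instead
# of hardcoding amd64, then register the foreign architecture for cross-builds.
HOST_ARCH="$(dpkg --print-architecture)"
sudo dpkg --add-architecture "${TARGET_ARCH}"
sudo sed -i "s/^deb http/deb [arch=${HOST_ARCH}] http/g" /etc/apt/sources.list
```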

tcpipuk marked this conversation as resolved
@@ -0,0 +49,4 @@
# Add ports sources for foreign architecture
sudo tee /etc/apt/sources.list.d/${{ inputs.dpkg-arch }}.list > /dev/null <<EOF
deb [arch=${{ inputs.dpkg-arch }}] http://ports.ubuntu.com/ubuntu-ports/ jammy main restricted universe multiverse
Owner

Does debian not have the arches we need? Is using ubuntu packages OK here? (i.e. are they going to be built against a different glibc version? If so, is that OK, and does that affect the final binary we produce?) And you're pinning 22.04 here - how do we make sure this is updated?

Owner

(fwiw this is mostly important from the perspective of builds, it doesn't really matter just for tests)

Author
Owner

The runner is ubuntu-latest, so installing Debian packages over it doesn't really make sense. This is part of adding build environments from Ubuntu into the OS for cross-compile purposes.

I can rummage around the Ubuntu distro name to re-add the same version, but I assume we'll probably want to manually test/investigate that at the point that runners finally get updated?
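
A sketch of that "rummaging" - deriving the codename from the runner's os-release instead of pinning jammy, with TARGET_ARCH again standing in for the dpkg-arch input:

```bash
# Sketch: pick up the running Ubuntu codename rather than hardcoding "jammy",
# so the ports sources track the runner image when it is eventually upgraded.
. /etc/os-release
sudo tee "/etc/apt/sources.list.d/${TARGET_ARCH}.list" > /dev/null <<EOF
deb [arch=${TARGET_ARCH}] http://ports.ubuntu.com/ubuntu-ports/ ${VERSION_CODENAME} main restricted universe multiverse
EOF
```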

Owner

Ah yeah, I forgot that the runners are debian. That does kinda poke a hole in building release binaries on the host, at least until we get musl builds working, though. Not going to count it as a blocker because we still have the docker images, but something to remember

Owner

But yeah, manual upgrade is fair enough, but that means we should be pinning the runner version (either via a tag or by specifying an image)

Author
Owner

I'm not sure why Ubuntu means we can't compile for musl? I imagine we'd just use the musl-tools package or something?

Owner

Musl is the solution - the issue is that Ubuntu won't be using an old version of glibc, and people will need to run a distro whose glibc is at least as new as the builder's - so 'stable' distros like debian/RHEL/Rocky/Alma/whatever won't be able to run the binary
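
To make that concrete, the glibc floor a binary imposes can be read off its dynamic symbols - a sketch, with the binary path purely illustrative:

```bash
# Sketch: list the newest glibc symbol versions the binary links against;
# any system running it needs a glibc at least this new.
objdump -T ./target/release/your-binary | grep -o 'GLIBC_[0-9.]*' | sort -Vu | tail -n 5
```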

Author
Owner

I'm using clang for builds, so potentially we can target any library version we want? I've got no problem with an extra musl build, it's just quite a large performance price to pay if we were to use it for all builds.
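
For later reference, the extra musl build would be roughly the following (the target triple is standard; toolchain details for C dependencies are glossed over):

```bash
# Sketch: an additional musl target producing a (mostly) static binary that
# doesn't inherit the builder's glibc version.
sudo apt-get install -y musl-tools
rustup target add x86_64-unknown-linux-musl
cargo build --release --target x86_64-unknown-linux-musl
```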

Owner

Yeah, let's leave it for now, this only matters for release builds distributed on their own. Unfortunately, it's not as easy as just switching out the glibc version, because you have to rebuild all the system libraries from source.

tcpipuk marked this conversation as resolved
@@ -0,0 +70,4 @@
/usr/lib/x86_64-linux-gnu/libclang*.so*
/etc/apt/sources.list.d/archive_uri-*
/etc/apt/trusted.gpg.d/apt.llvm.org.asc
key: llvm-${{ steps.os-version.outputs.os-string }}-v${{ inputs.minimum-version }}-v2-${{ hashFiles('**/Cargo.lock', 'rust-toolchain.toml') }}
Owner

Keying this cache on the lockfile or toolchain doesn't make sense here - it's not going to change when we update a rust dependency.

Author
Owner

No, but it helps ensure it gets rebuilt regularly - I looked at putting a hard date limit in there, and do have a cronjob to delete caches 30 days after they've been created, but this seemed like a reasonable way to force a rebuild on a semi-regular basis.

I can have it just rebuild when I delete cache, I'm not really fussed?

Owner

I guess that kind of makes sense? Although we don't update either on a regular basis - lockfile can go unchanged for a month and then change four times in two days, although renovate might change that to be more regular - so it's probably better to be explicit? Or just rely on cache expiry

Author
Owner

I'd personally say it might be safer to watch and adjust based on what we see, but I don't mind forcing a fixed refresh interval if you'd prefer it on a schedule - either works for me!

In most of these cases, the difference between cached and uncached is under a minute (each, mind), so a regular rebuild would not be painful - it's just about ensuring a rebuild definitely happens before too long while the average build stays quick.
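
If a fixed refresh interval is preferred, one hedged option is a time-bucketed key component - the step id and key layout below are illustrative, not the actual action:

```bash
# Sketch: emit a month-granularity stamp for use in the cache key, so the cache
# is rebuilt from scratch roughly monthly regardless of lockfile churn.
echo "stamp=$(date +%Y-%m)" >> "$GITHUB_OUTPUT"
# Then reference it from the cache step, e.g.
#   key: llvm-${{ steps.os-version.outputs.os-string }}-v${{ inputs.version }}-${{ steps.stamp.outputs.stamp }}
```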

tcpipuk marked this conversation as resolved
@@ -0,0 +128,4 @@
shell: bash
run: |
echo "::group::🔧 Installing LLVM (minimum version: ${{ inputs.minimum-version }})"
bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
Owner

Yeah, again, pinned, not minimum, please! :)

Owner
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    curl https://apt.llvm.org/llvm.sh > llvm.sh && \
    chmod +x llvm.sh && \
    ./llvm.sh ${LLVM_VERSION} && \
    rm llvm.sh

So you can basically just pass the version to the install script to get the version you need

Author
Owner

I know how to do it, just thought you'd probably want the current stable version. I'm happy to update it.

Owner

Yeah, we want a specific version for the reason I mentioned in the other comment (and in a small comment in the Dockerfile)

tcpipuk marked this conversation as resolved
@@ -0,0 +51,4 @@
uses: https://github.com/actions/cache@v4
with:
path: |
~/.cache/sccache
Owner

Rather than downloading the whole sccache cache, is using the direct GHA cache integration faster? That will only download what is needed as it is needed.
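
For context, the direct integration being suggested looks roughly like this - a sketch assuming sccache's GitHub Actions cache backend, with the required credentials normally exported by mozilla-actions/sccache-action:

```bash
# Sketch: let sccache talk to the Actions cache itself and fetch objects on
# demand, instead of restoring the whole ~/.cache/sccache directory up front.
echo "SCCACHE_GHA_ENABLED=true" >> "$GITHUB_ENV"
echo "RUSTC_WRAPPER=sccache" >> "$GITHUB_ENV"
# The runner's ACTIONS_CACHE_URL and ACTIONS_RUNTIME_TOKEN typically also need
# to be visible to the build step for this backend to work.
```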

Author
Owner

In my experience, it adds more latency than it's worth. Have you experienced otherwise?

Owner

Fair enough. I wouldn't have expected latency to be the issue when the cache is local to the runner, though!

Author
Owner

I've managed to accumulate over 320GB in actcache over the last few days, and the jobs run in a separate container from the runner and talk to it over TCP, so it depends a bit on how you measure... I can certainly try it the other way and see how big of a hit it is?

Owner

Something to come back to, maybe. Latency from firefox to an image running in Docker in a VM is ~3ms, and host to host is in μs, but I could see other overheads causing a slowdown

tcpipuk marked this conversation as resolved
@@ -0,0 +52,4 @@
with:
path: |
~/.cache/sccache
/timelord/
Owner

Also just vibes, but feels like these cache items shouldn't be combined.

Author
Owner

I had them separate, but the timelord cache is so small that it didn't really feel efficient to have on its own - both caches are re-saved every time any job runs, so I think it's (marginally by a fraction of a second) faster this way.

It's up to you, I don't mind splitting them out, it was just a slight efficiency saving and a few lines of YAML.

Owner

Fair enough, I'll take that. Can always be changed later.

tcpipuk marked this conversation as resolved
@@ -0,0 +115,4 @@
echo "✅ sccache already available"
else
echo "📦 Installing sccache"
cargo install sccache --locked
Owner

Cargo install will always install from source here, which isn't fast? Does this get avoided somehow, or cached?

Owner

I see - ~/.cargo/bin is being cached. That still leaves an unusually slow run on cache miss - cargo binstall could be used here, and it'd also be cached in the same location. And at that point, there'd be three things, which feels like a pattern that should be either localised or made less verbose 😅

Author
Owner

It doesn't install if the binary is already available after being restored from cache, so technically it won't be rebuilt until the tool cache is deleted.

Owner

Yeah, I saw that part - any comment on using binstall?

Author
Owner

Yes, I couldn't get it to work for timelord-cli without a GitHub token, and part of the purpose of this design is to move away from needing to authenticate with GitHub just to run a fork.

Many of the actions are still pulled from GitHub too, so I'd like to fix that as well, but again, it's incremental updates - if we can cut the CI duration down today, the next step would likely involve a dedicated build image, and we can squash out any remaining external GHA repos around the same time.

Owner

Ah. It's grabbing timelord from quickinstall, which is a GitHub releases collection - probably hitting rate limits. It should fall back to just building from source, though - if it doesn't, that's a bug.

Owner

For a later iteration, maybe

Author
Owner

Yeah, it does fall back, but only after a 3-minute retry cycle. It seemed faster to cache the binary and skip the build entirely when we already have it than to build binstall and potentially end up building from source (after an extraordinarily long timeout) anyway.

Most of these issues will be entirely mitigated when we have a build image; this is just a good way to build quickly until that day comes.

Owner

The solution to that is using --maximum-resolution-timeout to move on after ~5s or --disable-strategies to always skip it entirely
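
A hedged example of both options, assuming the strategy is named quick-install in cargo-binstall's terms:

```bash
# Sketch: cap the quickinstall/GitHub resolution at ~5 seconds...
cargo binstall timelord-cli --no-confirm --maximum-resolution-timeout 5
# ...or skip that strategy entirely so rate limits never stall the job.
cargo binstall timelord-cli --no-confirm --disable-strategies quick-install
```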

tcpipuk marked this conversation as resolved
@@ -0,0 +121,4 @@
- name: Configure sccache
shell: bash
run: |
echo "RUSTC_WRAPPER=sccache" >> $GITHUB_ENV
Owner

https://forgejo.ellis.link/continuwuation/continuwuity/src/branch/main/.forgejo/actions/sccache/action.yml#L25-L29

We at least use the C and CXX ones; that's just a block of everything it supports, so we might as well have it.
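
For readers without access to that link, the block amounts to something like the following - variable names here are common sccache conventions, not copied from the linked action:

```bash
# Sketch: route the C/C++ compilers (and CMake) through sccache as well as rustc.
echo "RUSTC_WRAPPER=sccache" >> "$GITHUB_ENV"
echo "CC=sccache clang" >> "$GITHUB_ENV"
echo "CXX=sccache clang++" >> "$GITHUB_ENV"
echo "CMAKE_C_COMPILER_LAUNCHER=sccache" >> "$GITHUB_ENV"
echo "CMAKE_CXX_COMPILER_LAUNCHER=sccache" >> "$GITHUB_ENV"
```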

tcpipuk marked this conversation as resolved
@@ -0,0 +146,4 @@
echo "✅ timelord already available"
else
echo "📦 Installing timelord"
cargo install timelord-cli --locked
Owner

Same comment as the sccache install

Author
Owner

Same reply as the sccache install 🙂

tcpipuk marked this conversation as resolved
Author
Owner

I agree base images would make the whole thing faster, but this is more of an incremental move - my expectation was that we'd at least halve build time today, then continue to optimise over time.

Owner

Yeah, that's why I'm saying it's something to come back to!

tcpipuk force-pushed tom/rust-checks from 067ee77963 to f8ef29c6ad 2025-08-18 19:49:45 +00:00 Compare
tcpipuk force-pushed tom/rust-checks from f8ef29c6ad to 803043539c 2025-08-18 19:53:49 +00:00 Compare
tcpipuk force-pushed tom/rust-checks from 803043539c to df78dab696 2025-08-18 20:15:06 +00:00 Compare
tcpipuk force-pushed tom/rust-checks from df78dab696 to adccd46c85 2025-08-18 20:25:43 +00:00 Compare
Author
Owner

Had a head-scratcher there trying to figure out why sccache was failing here when it worked on my fork, but it's because my instance automatically resolves mozilla-actions/sccache-action@v0.0.9 to the full GitHub URL, whereas yours (understandably) follows the Forgejo default.

The additional packages issue in the LLVM action is handled - I'm using awalsh128/cache-apt-pkgs-action again, which seems to do the job as long as you make sure to apt update before it runs 🙂

We've now got the sccache action properly working (using a very restricted GH key) plus cargo-binstall for prek and timelord, and we're seeing empty-cache runs complete in <8 mins and normal builds in 2-3 minutes.

I've got a bit more testing to do on the cross-compilation builds and Docker images, but I'll aim to get that done this week, so I can then PR my update to the main build (faster, extra haswell tag, extra debug tag, etc.) and then I might finally get around to Complement 😆

requested review from Jade 2025-08-28 18:15:45 +00:00
tcpipuk force-pushed tom/rust-checks from adccd46c85 to dd22325ea2 2025-08-28 18:21:22 +00:00 Compare
chore: Add reasons for test skips
Some checks failed
Documentation / Build and Deploy Documentation (pull_request) Successful in 58s
Release Docker Image / define-variables (push) Successful in 4s
Documentation / Build and Deploy Documentation (push) Successful in 55s
Checks / Prek / Pre-commit & Formatting (push) Successful in 1m23s
Checks / Prek / Clippy and Cargo Tests (push) Failing after 4m50s
Release Docker Image / merge (push) Has been cancelled
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Successful in 14m28s
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Has been cancelled
37248a4f68
Jade scheduled this pull request to auto merge when all checks succeed 2025-08-28 19:11:23 +00:00
Jade approved these changes 2025-08-28 19:11:30 +00:00
tcpipuk scheduled this pull request to auto merge when all checks succeed 2025-08-28 19:12:38 +00:00
tcpipuk deleted branch tom/rust-checks 2025-08-28 19:25:05 +00:00