refactor(ci): Consolidate Rust checks and add reusable build actions #934
Merge `rust-checks.yml` into `prek-checks.yml` to create a unified CI workflow that runs formatting and clippy/tests sequentially, avoiding duplicate setup overhead while dramatically improving performance through aggressive caching.
The new actions are intentionally comprehensive to support upcoming cross-compilation workflows while immediately benefiting from build caching that reduces CI time from >10 minutes to <5 minutes for incremental changes.
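For orientation, the consolidated workflow's shape is roughly the following - a sketch only, with illustrative job and step names rather than the PR's actual YAML:

```yaml
name: Checks
on: [push, pull_request]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: https://github.com/actions/checkout@v4
      # Formatting runs first because it's the cheapest check and fails fast;
      # clippy and tests then reuse the same toolchain and cache setup,
      # which is the duplicated overhead the merge removes.
      - name: Rustfmt
        run: cargo fmt --all -- --check
      - name: Clippy
        run: cargo clippy --workspace --all-targets -- -D warnings
      - name: Tests
        run: cargo test --workspace
```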
10bb995759 to 0373dd8382
0373dd8382 to 067ee77963
So just a few (fixable) comments here. One thing to come back to is that the additional packages in the LLVM action won't be cached, and the caching in that action might be a bit fragile (not easy to solve, the action I was using in the prior workflow was more generic but just as caveat-filled). It's something that could be worth looking into with the idea of base images.
@ -0,0 +17,4 @@
minimum-version:
  description: 'Minimum LLVM version required'
  required: false
  default: '20'
We want an exact version rather than a minimum version - the purpose is to closely match the LLVM version we get from rustc, because the LLVM interfaces used for cross-language LTO aren't stable and can be updated in backwards/forwards-incompatible ways.

I'm specifically defaulting to the stable version, which is currently 20 (which is why I've set it as a minimum), but am happy to request a specific version instead 🙂
@ -0,0 +44,4 @@
sudo dpkg --add-architecture ${{ inputs.dpkg-arch }}
# Restrict default sources to amd64
sudo sed -i 's/^deb http/deb [arch=amd64] http/g' /etc/apt/sources.list
Why are we restricting to AMD64 here? (Both at all, and instead of the host arch)
In the cross-compile update, I'll be adding build environments using the dpkg arch, but this is a simplified version to make the PR easier to split into two chunks.
@ -0,0 +49,4 @@
# Add ports sources for foreign architecture
sudo tee /etc/apt/sources.list.d/${{ inputs.dpkg-arch }}.list > /dev/null <<EOF
deb [arch=${{ inputs.dpkg-arch }}] http://ports.ubuntu.com/ubuntu-ports/ jammy main restricted universe multiverse
Does Debian not have the arches we need? Is using Ubuntu packages OK here? (i.e. are they going to be built against a different glibc version; if so, is that OK, and does that affect the final binary we produce?) And you're pinning 22.04 here - how do we make sure this is updated?
(fwiw this is mostly important from the perspective of builds, it doesn't really matter just for tests)
The runner is `ubuntu-latest`, so installing Debian packages over it doesn't really make sense. This is part of adding build environments from Ubuntu into the OS for cross-compile purposes. I can rummage around the Ubuntu distro name to re-add the same version, but I assume we'll probably want to manually test/investigate that at the point that runners finally get updated?
Ah yeah, I forgot that the runners are Ubuntu. That does kinda poke a hole in building release binaries on the host, at least until we get musl builds working, though. Not going to count it as a blocker because we still have the Docker images, but something to remember.
But yeah, manual upgrade is fair enough, but that means we should be pinning the runner version (either via a tag or by specifying an image).
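For illustration, pinning via a tag could be as small as this (the exact label depends on what the runner advertises, so treat it as an assumption):

```yaml
jobs:
  checks:
    # Track a specific image generation rather than whatever
    # ubuntu-latest currently points at.
    runs-on: ubuntu-22.04
```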
I'm not sure why Ubuntu means we can't compile for musl? I imagine we'd just use the `musl-tools` package or something?

Musl is the solution - the issue is that Ubuntu won't be using an old version of glibc, and people would need to run a distro whose glibc is >= the builder's - so 'stable' distros like Debian/RHEL/Rocky/Alma/whatever won't be able to run the binary.
I'm using clang for builds, so potentially we can target any library version we want? I've got no problem with an extra musl build, it's just quite a large performance price to pay if we were to use it for all builds.
Yeah, let's leave it for now, this only matters for release builds distributed on their own. Unfortunately, it's not as easy as just switching out the glibc version, because you have to rebuild all the system libraries from source.
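For reference, the basic mechanics of the musl route look like this (a minimal sketch; as noted above, a project with C/C++ dependencies also needs those libraries built for musl, so this alone isn't enough):

```yaml
- name: Build a static musl binary
  shell: bash
  run: |
    # Statically linking against musl removes the dependency on the
    # host's glibc version entirely.
    sudo apt-get update && sudo apt-get install -y musl-tools
    rustup target add x86_64-unknown-linux-musl
    cargo build --release --target x86_64-unknown-linux-musl
```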
@ -0,0 +70,4 @@
/usr/lib/x86_64-linux-gnu/libclang*.so*
/etc/apt/sources.list.d/archive_uri-*
/etc/apt/trusted.gpg.d/apt.llvm.org.asc
key: llvm-${{ steps.os-version.outputs.os-string }}-v${{ inputs.minimum-version }}-v2-${{ hashFiles('**/Cargo.lock', 'rust-toolchain.toml') }}
Splitting this by the lockfile or toolchain doesn't make sense here - the LLVM install isn't going to change when we update a Rust dependency.
No, but it helps ensure it gets rebuilt regularly - I looked at putting a hard date limit in there, and do have a cronjob to delete caches 30 days after they've been created, but this seemed like a reasonable way to force a rebuild on a semi-regular basis.
I can have it just rebuild when I delete cache, I'm not really fussed?
I guess that kind of makes sense? Although we don't update either on a regular basis - lockfile can go unchanged for a month and then change four times in two days, although renovate might change that to be more regular - so it's probably better to be explicit? Or just rely on cache expiry
I'd personally say it might be safer to watch and adjust based on what we see, but I don't mind forcing a fixed refresh interval if you'd prefer it on a schedule - either works for me!
In most of these cases, the difference between cached and uncached is under a minute (each, mind) so a regular rebuild would not be painful, it's just about ensuring a rebuild definitely does happen before too long but that the average build happens quickly.
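If a fixed refresh interval were wanted, one explicit option (a sketch, not what the PR does) is baking a rolling date into the key so a rebuild happens on schedule regardless of lockfile churn:

```yaml
- name: Compute cache epoch
  id: epoch
  shell: bash
  # ISO year-week: the key (and therefore the cache) rolls over weekly.
  run: echo "week=$(date +%G-%V)" >> "$GITHUB_OUTPUT"

- uses: https://github.com/actions/cache@v4
  with:
    # Paths as in the real action; the key swaps hashFiles() for the epoch.
    path: /usr/lib/llvm-*
    key: llvm-${{ steps.os-version.outputs.os-string }}-v2-${{ steps.epoch.outputs.week }}
```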
@ -0,0 +128,4 @@
shell: bash
run: |
  echo "::group::🔧 Installing LLVM (minimum version: ${{ inputs.minimum-version }})"
  bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
Yeah, again, pinned, not minimum, please! :)
So you can basically just pass the version to the install script to get the version you need
I know how to do it, just thought you'd probably want the current stable version. I'm happy to update it.
Yeah, we want a specific version for the reason I mentioned in the other comment (and in a small comment in the Dockerfile).
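For reference, `llvm.sh` takes the major version as its first argument, so a pinned install would look roughly like this (`inputs.version` is a hypothetical renamed input):

```yaml
- name: Install pinned LLVM
  shell: bash
  run: |
    wget -O /tmp/llvm.sh https://apt.llvm.org/llvm.sh
    chmod +x /tmp/llvm.sh
    # Pass the exact major version instead of accepting the script's default.
    sudo /tmp/llvm.sh ${{ inputs.version }}
```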
@ -0,0 +51,4 @@
uses: https://github.com/actions/cache@v4
with:
  path: |
    ~/.cache/sccache
Rather than downloading the whole sccache cache, is using the direct GHA cache integration faster? That will only download what is needed as it is needed.
In my experience, it adds more latency than it's worth. Have you experienced otherwise?
Fair enough. I wouldn't have expected latency to be the issue when the cache is local to the runner, though!
I've managed to accumulate over 320GB in actcache over the last few days, and the jobs run in a separate container to the runner then talk over TCP, so it depends a bit on how you measure... I can certainly try it the other way and see how big of a hit it is?
Something to come back to, maybe. Latency from Firefox to an image running in Docker in a VM is ~3ms, and host-to-host is in μs, but I could see other overheads causing a slowdown.
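For the record, the direct integration in question is sccache's GHA cache backend, which fetches objects on demand instead of restoring one tarball up front. Enabling it is roughly this (assuming the Forgejo runner's cache API is compatible):

```yaml
- name: Enable sccache's GHA cache backend
  shell: bash
  run: |
    # With SCCACHE_GHA_ENABLED set, sccache talks to the Actions cache
    # service directly and no separate actions/cache step is needed.
    echo "SCCACHE_GHA_ENABLED=true" >> "$GITHUB_ENV"
    echo "RUSTC_WRAPPER=sccache" >> "$GITHUB_ENV"
```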
@ -0,0 +52,4 @@
with:
  path: |
    ~/.cache/sccache
    /timelord/
Also just vibes, but feels like these cache items shouldn't be combined.
I had them separate, but the timelord cache is so small that it didn't really feel efficient to have on its own - both caches are re-saved every time any job runs, so I think it's (marginally by a fraction of a second) faster this way.
It's up to you, I don't mind splitting them out, it was just a slight efficiency saving and a few lines of YAML.
Fair enough, I'll take that. Can always be changed later.
@ -0,0 +115,4 @@
echo "✅ sccache already available"
else
echo "📦 Installing sccache"
cargo install sccache --locked
Cargo install will always install from source here, which isn't fast? Does this get avoided somehow, or cached?
I see - `~/.cargo/bin` is being cached. That still leaves an unusually slow run on cache miss - cargo-binstall could be used here, and it'd also be cached in the same location. And at that point, there'd be three things, which feels like a pattern that should be either localised or made less verbose 😅

It doesn't install if the binary is already available after being restored from cache, so technically it won't be rebuilt until the tool cache is deleted.
Yeah, I saw that part - any comment on using binstall?
Yes, I couldn't get it to work for timelord-cli without a GitHub token, and part of the purpose of this design is to move away from needing to authenticate with GitHub just to run a fork.
The actions are often GHA, so I'd like to fix that, but again, it's incremental updates - if we can halve the CI duration today, the next step would likely involve a dedicated build image, and we can squash out any remaining external GHA repos around the same time.
Ah. It's grabbing timelord from quickinstall, which is a GitHub releases collection - probably hitting rate limits. It should fall back to just building from source, though - if it doesn't, that's a bug.
For a later iteration, maybe
Yeah, it does it after a 3 minute retry cycle. It seemed faster to cache the binary and not build if we already have it, than to build binstall and potentially end up building from source (after an extraordinarily long timeout) anyway.
Most of these issues will be entirely mitigated when we have a build image, this is just a good way to build quickly until that day comes.
The solution to that is using `--maximum-resolution-timeout` to move on after ~5s, or `--disable-strategies` to always skip it entirely.
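As a sketch of that suggestion (the 5s value is illustrative):

```yaml
- name: Install timelord via cargo-binstall
  shell: bash
  run: |
    # Give the quickinstall/GitHub resolution ~5s per strategy, then fall
    # back to building from source instead of a multi-minute retry cycle.
    cargo binstall timelord-cli --locked --no-confirm \
      --maximum-resolution-timeout 5
```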
@ -0,0 +121,4 @@
- name: Configure sccache
  shell: bash
  run: |
    echo "RUSTC_WRAPPER=sccache" >> $GITHUB_ENV
https://forgejo.ellis.link/continuwuation/continuwuity/src/branch/main/.forgejo/actions/sccache/action.yml#L25-L29
We at least use the C and CXX ones; that's just a block of everything it supports, so might as well have it.
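Presumably the relevant part of that linked block boils down to something like this (an assumed sketch - the authoritative list is in the linked action):

```yaml
- name: Route C/C++ compiles through sccache too
  shell: bash
  run: |
    # Wrapping CC/CXX means cc-rs builds (e.g. RocksDB's C++ code)
    # hit the cache as well, not just rustc invocations.
    echo "CC=sccache clang" >> "$GITHUB_ENV"
    echo "CXX=sccache clang++" >> "$GITHUB_ENV"
```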
@ -0,0 +146,4 @@
echo "✅ timelord already available"
else
echo "📦 Installing timelord"
cargo install timelord-cli --locked
Same comment as the sccache install
Same reply as the sccache install 🙂
I agree base images would make the whole thing faster, but this is more of an incremental move - my expectation was that we'd at least halve build time today, then continue to optimise over time.
Yeah, that's why I'm saying it's something to come back to!
067ee77963 to f8ef29c6ad
f8ef29c6ad to 803043539c
803043539c to df78dab696
df78dab696 to adccd46c85
Had a head-scratcher there trying to figure out why sccache was failing here when it worked on my fork, but it's because my instance automatically resolves `mozilla-actions/sccache-action@v0.0.9` to the full GitHub URL, whereas yours (understandably) follows the Forgejo default.

The additional packages issue in the LLVM action is handled - I'm using `awalsh128/cache-apt-pkgs-action` again, which seems to do the job as long as you make sure to `apt update` before it runs 🙂

We've now got the sccache action properly working (using a very restricted GH key) plus cargo-binstall for prek and timelord, and we're seeing empty-cache runs complete in <8 mins and normal builds in 2-3 minutes.
I've got a bit more testing to do on the cross-compilation builds and Docker images, but will aim to get that done this week, so I can then PR my update to the main build (faster, extra haswell tag, extra debug tag, etc.) and then I might finally get around to Complement 😆
adccd46c85 to dd22325ea2