This Development-cycle in Cargo: 1.84
This is a summary of what has been happening around Cargo development for the last 6 weeks, which is approximately the merge window for Rust 1.84.
Plugin of the cycle
Cargo can't be everything to everyone, if for no other reason than the compatibility guarantees it must uphold. Plugins play an important part of the Cargo ecosystem and we want to celebrate them.
Our plugin for this cycle is cargo-hack, which makes it easy to verify that different feature combinations work together and that you can build for all supported Rust versions.
Thanks to epage for the suggestion!
Please submit your suggestions for the next post.
Implementation
Simple English in documentation
trot approached the Cargo team on zulip about making the Cargo book more approachable for people for whom English is a second language. After some discussion, we decided to start with simplifying the language used in the Cargo book.
KGrewal1 took the lead on this and posted #14825. They also made the language more consistent in #14829.
Build Script API
Update from 1.83
With the Cargo team approving owning build-rs, epage worked with CAD97 and pietroalbini to transfer publish rights for build-rs to the Rust Project. CAD97 then did a first-pass review and update to build-rs, and epage merged it into cargo (#14786). epage then did a pass to update build-rs in #14817.
On zulip, Turbo87 raised concerns about build-rs (and Cargo-maintained crates more generally) being in the Cargo repo and tied to the Cargo release process. This means that there is a 6-12 week delay between a bug fix being merged and being released, projects that need access to unstable functionality must use a git dependency, the MSRV is infectious which puts pressure on the Cargo team to bump it regularly, and the issues are mixed together. On the other hand, Cargo support, documentation, and APIs are able to be developed hand-in-hand. It would be great if we could improve the release process within the Cargo repo (e.g. #14538), but keeping that in sync with 3 parallel release channels (stable, beta, nightly), including leaving space to patch an arbitrary number of crate releases for each release channel, makes this difficult.
Replacing mtimes with checksums
Update from 1.83
With unstable support for using checksums, rather than mtime, to determine when a build is dirty, the Cargo team discussed the path for stabilization.
One hole in the current design is that Cargo doesn't checksum inputs to build scripts. If Cargo did the checksum on the user's behalf, then it might see a different version of the file than the build script. However, requiring build scripts to checksum the files and report that to Cargo adds significant complexity to build scripts, including coordinating with Cargo on what hash algorithm is used. This would also require build scripts to pull in a hasher dependency, increasing their build times. However, it is unclear if checksumming for build scripts is a requirement. Also, if we could develop a plan to reduce the need for build scripts, we would reduce the scope of the problem.
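For context, build scripts today declare their inputs by printing rerun-if-changed directives, and Cargo tracks those declared paths by mtime; a checksum scheme would need to hash these same inputs. A minimal sketch (the watched paths are illustrative, not from any real project):

```rust
// build.rs: how inputs are declared today. The paths are illustrative;
// Cargo currently tracks the declared paths by mtime.
fn watched_inputs() -> [&'static str; 2] {
    ["migrations/", "build.rs"]
}

fn main() {
    for path in watched_inputs() {
        // Cargo parses these directives from the build script's stdout.
        println!("cargo::rerun-if-changed={path}");
    }
}
```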
Another concern is with performance. The overhead of checksumming will be most noticeable on builds without any changes as otherwise compile times will dominate. We are further tracking this in #14722.
There is also the question of what the transition will look like. Do we exclusively switch to checksums or give people the choice? At minimum, we'd need to give people a temporary escape hatch as we transition Cargo to checksums, in case they are somehow relying on the mtime behavior. Whether Cargo would need a permanent config field is unclear.
We reached out to some build system owners on zulip to do a call-for-testing. So far, we've only heard from the Rust Project itself where this made build time testing more difficult because touching files to force rebuilds is a much easier option than trying to carefully make repeated edits to files.
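For those who want to help test, the unstable behavior can be enabled on nightly; a sketch of opting in through config, assuming the current `checksum-freshness` flag name:

```toml
# .cargo/config.toml -- nightly-only opt-in to checksum-based freshness
[unstable]
checksum-freshness = true
```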
Rustflags and caching
Cargo's build fingerprinting has to satisfy several needs, including
- Detecting when the existing build cache has to be thrown away and re-built, called the fingerprint
- Segregating the build cache so it gets preserved across unrelated commands (e.g. alternating cargo check and cargo test without rebuilding dependencies), called -Cextra-filename
- Making symbol names unique so you can't unintentionally use a type in an ABI-incompatible context, called -Cmetadata
RUSTFLAGS is a way to bypass Cargo's abstractions and directly control the behavior of rustc. Cargo includes RUSTFLAGS in the fingerprint hash but not in the -Cextra-filename hash, causing a full rebuild when they change. This can be especially problematic when RUSTFLAGS differs between the user and their editor running cargo. For example, some users report they set --cfg test in their editor so all #[cfg(test)]s are enabled in rust-analyzer.
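As an illustration, with --cfg test in the editor's RUSTFLAGS, rust-analyzer analyzes the gated module below, while a plain cargo check on the command line compiles it out; since RUSTFLAGS is in the fingerprint hash, the two invocations invalidate each other's cache:

```rust
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

#[cfg(test)]
mod tests {
    use super::add;

    // Only compiled when `test` is set, e.g. by `cargo test` or by an
    // editor passing `--cfg test` through RUSTFLAGS.
    #[test]
    fn adds() {
        assert_eq!(add(2, 2), 4);
    }
}

fn main() {
    // Without `--cfg test`, the `tests` module above is compiled out.
    println!("{}", add(20, 22));
}
```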
A previous attempt was made to segregate the cache for RUSTFLAGS in #6503. However, Cargo uses the same hash for -Cextra-filename and -Cmetadata, so by segregating the cache, the symbol names also become unique. In theory, this is a good thing as RUSTFLAGS can affect the ABI. However, not all RUSTFLAGS affect the ABI. Take --remap-path-prefix, which is supposed to make builds of binaries more reproducible by stripping information specific to a particular build. By including this in -Cmetadata, the binary changes (#6914). A special case for this was added in #6966.
Another case we ran into was PGO. With PGO, you create a build with -Cprofile-generate and then run it against a benchmark. You then feed this back into the build with -Cprofile-use to improve the optimizations the compiler performs. At this point, we reverted #6503 in #7417.
In #8716, ehuss proposed that Cargo track -Cextra-filename and -Cmetadata separately and only include RUSTFLAGS in -Cextra-filename. After some refactoring (#14826) and test improvements (#14848, #14846, #14859) by epage and weihanglo, epage posted #14830.
However, weihanglo found there are still problems with --remap-path-prefix: even when using profile.dev.split-debuginfo="packed", the binaries are different because the binary includes DW_AT_GNU_dwo_name, which points to the debug file which exists per-rlib with -Cextra-filename included. Merging of #14830 is blocked until the problem with --remap-path-prefix is resolved.
Snapshot testing
Update from 1.82
epage finalized the work for moving off of Cargo's custom assertions. In removing the core of the custom assertions, we were relying on dead_code warnings to find assertions that were no longer used. However, we missed removing an assertion, and epage removed it in #14759.
#14781 and #14785 saw us migrate the last of our "unordered lines" assertion tests. #14785 took some investigation to figure out the best way to migrate. Cargo's custom assertions redacted fewer values and allowed a test author to ignore a value redaction by using the raw value in the expected result. snapbox applies redactions earlier in the process, requiring them to always be used. This meant Cargo would lose test coverage in switching to snapbox, as we wouldn't be verifying as much of cargo's output. However, in consulting with the test author, we found that coverage of those redacted values was not intended, bypassing this problem for now. This still left "contains", "does not contain", and "contains x but not y" assertions. Rather than trying to design how these should fit into snapbox, epage left them in place after switching to snapbox's redactions in #14790.
At this point, epage documented lessons learned through this effort in #14793 and we now consider this migration complete, closing out #14039.
JSON schema files
In #12883, we got a request for JSON schema support for .cargo/config.toml files. Since we already have to duplicate the schema between the source and the documentation, we didn't want to also duplicate it in a hand-maintained JSON schema representation. Thankfully, there is schemars to generate JSON schemas from serde types. To experiment with JSON schema generation, dacianpascu06 added support for JSON schema generation for Cargo.toml in #14683; see manifest.schema.json. Generating a JSON schema for .cargo/config.toml will take a bit more investigation.
Cargo.toml has a single top-level definition with specific extension points within the schema. .cargo/config.toml does not have a single top-level definition; instead, the schema is defined per table or field. This is because config layering operates on the specific path that is looked up. The types for the schema are scattered throughout the Cargo code base, and it will take work to collect them all together to create a top-level definition solely for JSON schema generation.
Design discussions
Improving the built-in profiles
Hand-in-hand with benchmarking is profiling, yet the bench profile does not include the relevant debug information for profiling, requiring users to tweak their profiles in every repo (or in their home directory). CraftSpider proposed in #14032 that we update the bench profile to make this easier. However, benchmarks also need fidelity with release builds to ensure your numbers match what users will see. We decided we should keep the bench profile matching release, though we recognize there is room to explore improving user workflows for profiling.
foxtran restarted the conversation on changing the defaults for release to improve runtime performance in #11298. Potential changes include
- Enabling LTO, whether thin or fat
- Reducing the codegen-units
- Increasing the opt-level
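While these remain non-default, a project can opt into them today; a sketch of the relevant profile settings:

```toml
# Cargo.toml -- manually opting into heavier release optimizations
[profile.release]
lto = "thin"      # or `true` for fat LTO
codegen-units = 1 # fewer units: better optimization, slower compiles
```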
While release builds aren't focused on fast compile-times, there is still a point of diminishing returns in trading off compile-time for runtime performance. While release is generally geared towards production builds, there are cases where dev is too slow for development. weihanglo ran the numbers on LTO and codegen-units for Cargo in #14719. From those numbers, it seems like thin LTO is an easy win. One option is for a release-fast or release-heavy profile to be made. Adding new profiles may be a breaking change though, and we'd have to approach doing so carefully. We also already have discoverability problems with release, and it has a dedicated flag (--release).
Without some kind of built-in integration, it seems like these policies would be best left for users.
Whatever profile is used, one problem with LTO is that there are miscompilations which might prevent it from being a safe default (e.g. #115344).
On the other end of the spectrum is the dev profile. This profile serves two roles:
- Fast iteration time
- Running code through a debugger
It turns out that these can be at odds with each other. When running through a debugger, you often want the binary to behave like the source code, and optimizations can get in the way. However, optimizations can reduce the amount of IR being processed, speeding up codegen. They can then also speed up proc macros, build scripts, and test runs. Maybe we can even design an optimization level focused on improving compile times at the cost of the debugger experience. Similarly, how much debug information you carry around in your binary can affect your build times.
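As an example of this trade-off, a project can already trim debug information and add mild optimization in its dev profile:

```toml
# Cargo.toml -- trading debugger fidelity for faster iteration
[profile.dev]
debug = "line-tables-only" # enough for backtraces, cheaper than full debug info
opt-level = 1              # mild optimization without fully defeating the debugger
```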
Looking at the Rust 2023 survey results, improving compilation times and improving the debugging experience are neck and neck. The question is which debugging experience respondents are referring to. Those on the call mostly used "printf"-style debugging and would get benefit out of improving compilation time. Even if we surveyed people and found this representative of the Rust community (which davidlattimore did for a subset of the community on Fediverse), how much of this is survivorship bias from the quality of the debugger experience? How much would existing community members' behavior change with an improved debugger experience?
However, this may not be all-or-nothing. We could split the dev profile into separate iteration-time and debugger profiles so there is a low-friction way of accessing the non-default workflow. There would still be friction. If iteration-time were the default and enough people use debuggers through their IDEs and those IDEs are pre-configured, then working with IDE vendors to change their defaults would reduce a lot of the friction. This would likely require a long transition period. We could split one of the two workflows out into a whole new profile, which runs into the same problems as release-fast and release-heavy.
One idea for addressing the potential breakage is that we move the built-in profiles into a cargo:: namespace and make them immutable. We would switch the reserved profiles to just inheriting a namespaced profile by default. There are open questions on whether this would be a breaking change, and more analysis would be needed.
Instead of reserving a new profile name, what if Cargo used the reserved debug name? debug is already a reserved profile name, and in several user-facing locations the dev profile is referred to as debug (--debug, target/debug). We could make dev (--dev) focused on iteration time and debug (--debug) focused on debuggers. There is the question of target/debug, as changing users to target/dev might be too disruptive.
It will take work to finish a plan and figure out if it's too disruptive. If we can move forward with it, it will likely require a long transition time and support across multiple projects. Is this change worth it?
joshtriplett ran a survey on Internals on the effect of just CARGO_PROFILE_DEV_DEBUG=line-tables-only on compilation time, with some follow-up conversation on zulip.
Another angle for improving iteration time for dev is to make it easier to speed up dependencies in the hot path. Cargo allows you to set different optimization levels for different dependencies, and some projects encourage this, like sqlx:
```toml
[profile.dev.package.sqlx-macros]
opt-level = 3
```
What if packages could provide a package override for when they are used as a dependency?
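No such syntax exists today; purely as a hypothetical sketch, a package might suggest a default for its dependents along these lines:

```toml
# Hypothetical, unimplemented syntax: a proc-macro crate suggesting how
# it should be built when it appears as a dependency. The table name
# `as-dependency` is invented for illustration.
[profile.dev.as-dependency]
opt-level = 3
```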
Another potential use case for dependency-specified profile overrides is for mir-only rlibs.
Cargo performs codegen for each rlib in your dependency tree and relies on the linker to remove anything unused.
Mir-only rlibs would defer all codegen to the very end, allowing less codegen to be performed, potentially speeding up builds.
This has the potential to replace the need for [features]
for a large number of use cases.
One problem is if there is a lot of shared mir between test binaries as that will lead to redundant codegen, slowing down builds.
One way to experiment with this is to allow enabling mir-only rlibs on a per-package basis through profiles.
With dependency-specified profile overrides, large packages like windows-sys could opt in to being a mir-only rlib. Dependency-specified profile overrides would be a hidden interaction that would need careful consideration.
Avoid building production code when changing tests
milianw posted on zulip about their library and all of its dependents rebuilding when changing a unit test. When a #[test] inside of a library changes, the timestamp for the file changes and Cargo rebuilds the file. One way to avoid that is by moving tests to dedicated files. The rust repo does this, with a tool to enforce the practice.
epage proposed a clippy lint for this in rust-clippy#13589.
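A sketch of the layout: the library file keeps only a module declaration (`#[cfg(test)] mod tests;` resolving to src/tests.rs), so editing a test touches the dedicated file rather than lib.rs. The module body is shown inline below only to keep the example self-contained:

```rust
// In the dedicated-file layout, src/lib.rs holds just
// `#[cfg(test)] mod tests;` and the body below lives in src/tests.rs,
// so test edits don't change lib.rs's timestamp.
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

#[cfg(test)]
mod tests {
    use super::add;

    #[test]
    fn adds() {
        assert_eq!(add(2, 2), 4);
    }
}

fn main() {
    println!("{}", add(1, 2));
}
```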
When a library changes, Cargo always rebuilds dependents. Previously, Osiewicz proposed on zulip that rustc hash the API of a crate, allowing Cargo to only rebuild dependents when the API hash changes. This is being tracked in #14604.
Misc
- Daily reports by Eh2406 on the progress of the Rust implementation of the PubGrub version solving algorithm
- Building on epage's work in #14750, linyihai cleaned up diagnostics with extraneous details in #14497.
- Rustin170506 updated how config files are loaded for cargo script in #14749
- epage updated frontmatter parsing for cargo script in #14792 and got manifest-editing commands updated to support cargo script in #14857 and #14864
- arlosi wrapped up work on CARGO_BUILD_WARNINGS=deny in #14388 (update from 1.81)
Focus areas without progress
These are areas of interest for Cargo team members with no reportable progress for this development-cycle.
Ready-to-develop:
- Config control over feature unification
- Open namespaces
- Split CARGO_TARGET_DIR
- Auto-generate completions
Needs design and/or experimentation:
Planning:
- Disabling of default features
- RFC #3416: features metadata
- RFC #3487: visibility
- RFC #3486: deprecation
- Unstable features
- OS-native config/cache directories (ie XDG support)
- Pre-RFC: Global, mutually exclusive features
- RFC #3553: Cargo SBOM Fragment
How you can help
If you have ideas for improving cargo, we recommend first checking our backlog and then exploring the idea on Internals.
If there is a particular issue that you are wanting resolved that wasn't discussed here, some steps you can take to help move it along include:
- Summarizing the existing conversation (examples: Better support for docker layer caching, Change in Cargo.lock policy, MSRV-aware resolver)
- Document prior art from other ecosystems so we can build on the work others have done and make something familiar to users, where it makes sense
- Document related problems and solutions within Cargo so we see if we are solving at the right layer of abstraction
- Building on those posts, propose a solution that takes into account the above information and cargo's compatibility requirements (example)
We are available to help mentor people for S-accepted issues on zulip and you can talk to us in real-time during Contributor Office Hours. If you are looking to help with one of the bigger projects mentioned here and are just starting out, fixing some issues will help familiarize yourself with the process and expectations, making things go more smoothly. If you'd like to tackle something without a mentor, the expectations will be higher on what you'll need to do on your own.