The justbuild project
The justbuild generic build tool is the result of
my being asked in 2020
by my then employer to design a build system from scratch and to lead
its development by a small team. The project was open-sourced in
late 2022, with the first stable release in December of that same year.
I led the project at the technical level until release
1.6.0
in mid 2025.
My main considerations during the design and development were the following.
-
In a setup (like the one in a company) where many
developers work on the same project while sharing common
resources (in the same trust realm), remote execution
will be the dominant form of building, as otherwise the
same code will be built over and over again (once per
developer per change, instead of only a single time per
change). So let's make local builds (which every build
system has to support) mimic remote execution: for
every step, create a fresh directory, hard link in the
inputs, run the command, hard link the outputs out to a
content-addressable store (CAS), and dispose of the action
directory.
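To make this concrete, here is a minimal sketch of that execution model in Python (helper names and the CAS location are made up; this is a sketch of the idea, not the actual implementation):

    import hashlib, os, shutil, subprocess, tempfile

    CAS = "/tmp/example-cas"  # hypothetical location of the local CAS
    os.makedirs(CAS, exist_ok=True)

    def cas_path(blob_hash):
        return os.path.join(CAS, blob_hash)

    def run_action(inputs, command, outputs):
        # inputs: map from a path inside the action directory to the hash of a
        # blob already in the CAS; outputs: paths the command is expected to
        # produce, relative to the action directory.
        action_dir = tempfile.mkdtemp(prefix="action-")
        for rel_path, blob_hash in inputs.items():
            dst = os.path.join(action_dir, rel_path)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            os.link(cas_path(blob_hash), dst)          # hard link in the inputs
        subprocess.run(command, cwd=action_dir, check=True)
        result = {}
        for rel_path in outputs:                       # hard link out the outputs
            src = os.path.join(action_dir, rel_path)
            with open(src, "rb") as f:
                blob_hash = hashlib.sha256(f.read()).hexdigest()
            if not os.path.exists(cas_path(blob_hash)):
                os.link(src, cas_path(blob_hash))
            result[rel_path] = blob_hash
        shutil.rmtree(action_dir)                      # dispose of the action directory
        return result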
-
This model has the advantage that artifacts are just entries
in a CAS; there is no path attached to them. This has several
benefits, even if remote execution is never used.
-
When defining an action (i.e., a build step),
we can freely choose where the inputs should
be staged: shorter paths and command lines, enough
room for toolchains, etc., if needed (see the
sketch after this list of benefits).
-
It is not a problem if different actions have
outputs with the same name: no overlapping-outputs
check, no "output must be in the same
module" restriction to make the overlap checks feasible.
Also simpler output names, as we do not have to
worry about making them unique.
-
A particular consequence of the just-mentioned
flexibility is that there is no need for path
mangling when a repository is pulled in as an
external dependency of another repository. The
actions defined by a self-contained repository
are always the same, no matter whether it is
considered the main repository, pulled in under
the name foo, or pulled in under the
name bar.
-
No path mangling for configuration transitions either.
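To illustrate these points with a purely made-up action description (the field names are not justbuild's actual format): two unrelated actions can stage their inputs at the same convenient short paths and both call their output out.o, since each runs in its own directory and the results are only CAS entries.

    # Purely illustrative field names; not justbuild's actual action format.
    compile_foo = {
        "inputs": {"src.c": "<hash of foo.c>", "include": "<tree hash of headers>"},
        "cmd": ["cc", "-c", "src.c", "-Iinclude", "-o", "out.o"],
        "outputs": ["out.o"],
    }
    compile_bar = {
        # same staging layout, same output name, different source: no conflict
        "inputs": {"src.c": "<hash of bar.c>", "include": "<tree hash of headers>"},
        "cmd": ["cc", "-c", "src.c", "-Iinclude", "-o", "out.o"],
        "outputs": ["out.o"],
    }
    # Each out.o simply becomes another entry in the CAS, identified by its
    # content, not by any path.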
-
Without things being tied to paths, let's identify
actions by how they are defined, rather than
by where they are defined. This is not only
the more natural notion of equality (at least for a
mathematical logician), but also allows us to be more
relaxed, e.g., when it comes to configuration transitions:
if a certain part of the code base does not depend on
the variables of a transition, it will not add additional
actions to the action graph (remember that we don't do
path mangling for transitions).
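One way to make "identified by how they are defined" concrete is to key every action by a hash of its definition; the following is only a sketch of the idea, not the actual internal representation.

    import hashlib, json

    def action_id(inputs, cmd, env, outputs):
        # Two actions with identical definitions get the same identifier, no
        # matter in which module, repository, or configuration they arose.
        definition = {"inputs": inputs, "cmd": cmd, "env": env, "outputs": outputs}
        serialized = json.dumps(definition, sort_keys=True)
        return hashlib.sha256(serialized.encode()).hexdigest()

    # A configuration transition that does not touch any variable this part of
    # the code base depends on yields the very same definitions, hence the same
    # identifiers, and therefore no additional nodes in the action graph.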
-
Targets, too, are not bound to a place of
definition; hence we can use any identifier for them.
This gives a nice solution for things like
protobuf, where we want to define the dependency structure of
proto files without having to know for which programming
languages users will later need language-specific
generated code: our proto rules define abstract graph
nodes (with uninterpreted strings as rule names), and
language-specific code (which knows which rules can build
a proto library for that language) can depend on an anonymous
target named by the pair of an abstract graph
node and a rule binding for the abstract rule names. Now,
if different targets use the same proto for the same
language, they actually depend on the same target (named
by the same pair).
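As a sketch of the pairing (all names invented for illustration):

    import hashlib, json

    def name_of(description):
        # content-based name for a description, so equal descriptions coincide
        return hashlib.sha256(json.dumps(description, sort_keys=True).encode()).hexdigest()

    # Abstract graph node defined by the proto rules: just the dependency
    # structure of the proto files, with an uninterpreted string as node type.
    person_proto = {"node_type": "library", "srcs": ["person.proto"], "deps": []}

    # Binding of the abstract rule names, provided by the C++ rules: they know
    # which concrete rule turns such a node into a C++ proto library.
    cpp_binding = {"library": "compile proto library for C++"}

    # The anonymous target is named by the pair; every consumer that combines
    # the same node with the same binding depends on the same target, so the
    # generated code is analysed and built only once.
    anonymous_target = (name_of(person_proto), name_of(cpp_binding))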
-
Since I mentioned multiple repositories: we have to support
them (some companies have more repositories than others,
and in the open-source world it is natural to keep
repositories separate, at least per separate upstream). However,
agreeing on global naming is impossible, as experience
has shown. So let every repository choose what it would
like to call its dependencies, and we stitch things together
by binding these open names in a top-level graph. An
immediate advantage is that we can update from one
version of a library to another just by changing
the binding in the top-level graph, without renaming
labels everywhere within the repository using that
library, even if our project has to use several
versions of that library simultaneously.
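Conceptually, the top-level graph looks like this (shown as a Python dict mirroring a JSON description; this illustrates the idea, not the exact format of justbuild's multi-repository configuration):

    repo_graph = {
        "main": "app",
        "repositories": {
            "app": {
                # "app" picks its own names for its dependencies ...
                "bindings": {"rules": "rules-cc", "json": "json-v3"},
            },
            "tool": {
                # ... and another repository may use the same open name "json",
                # bound to a different version of that library.
                "bindings": {"rules": "rules-cc", "json": "json-v2"},
            },
            "rules-cc": {}, "json-v2": {}, "json-v3": {},
        },
    }
    # Upgrading "app" to a newer json is a change of one binding in this graph;
    # no label inside "app" is renamed, even though both versions coexist.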
-
This explicit binding also gives us a better overview of
the code structure (at the repository level). In particular,
we know all repositories a given repository transitively
depends upon. As we refrain from repository-name-based
path mangling, the value of a target only depends on
the target itself, the supported configuration parameters, and its
dependencies, but not on the consumer (as it should be!).
So if the transitively reachable repositories have not
changed, the target value is still the same. Hence we can cache
at that level; a sketch of such a cache key is given below. Moreover,
the analysis of a target
might depend on things being equal (ensuring there is
no staging conflict), but not on things being different;
so we actually cache the extensional projection, i.e.,
the result of the build.
-
A neat side effect is that in this way, at
least for subsequent builds, we keep the
target-graph small. Hence we can afford to redo
the analysis at every invocation; no daemons
in the background.
-
These repositories are logical ones; a single
git repository can still be split
into many such logical repositories. The tree
identifier of a subdirectory identifies the
content and can still be obtained quickly.
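Here is a sketch of what enters such a target-level cache key, under the simplifications just described (the real key derivation is more involved):

    import hashlib, json

    def target_cache_key(repos, target, config):
        # repos: the transitively reachable logical repositories, each described
        #        by the git tree identifier of its content
        # target: (repository, module, target name)
        # config: the effective configuration for that target
        # Nothing about the consumer enters the key, so the cached value, the
        # extensional result of the build (the output artifacts), can be reused
        # by every consumer.
        key = {"repos": repos, "target": list(target), "config": config}
        return hashlib.sha256(json.dumps(key, sort_keys=True).encode()).hexdigest()

    target_cache_key(
        repos={"json-v3": "<tree id>", "rules-cc": "<tree id>"},
        target=("json-v3", "", "json"),
        config={"ARCH": "x86_64", "DEBUG": False},
    )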
-
Now, if we use remote execution, then we download
our dependencies just to compute and walk the action
graph of the dependency a single time (because afterwards
things end up in the target-level cache). So let's make target-level
caching a service. That way, we don't even have
to download the dependencies at all; we can just ask
that service (using a small key, as repositories are
essentially described by their tree identifiers) to
give us an output description with references to the
artifacts that are in the CAS of the remote execution.
Other considerations I took into account.
-
People like to bikeshed about names; so let's
make the names of the target-description file, the
rules-description file, etc., configurable, of
course on a per-repository basis.
-
Since we only care about how things are defined, but
not where, the source files and target descriptions (as
well as the rules and the expressions) can come from
different source roots.
-
Target descriptions should be declarative. In particular,
we want to be able to read off the definition of a particular target
by simply parsing the file, without file-global
evaluation (without the need for output-conflict checks,
there is also no need for the tool to look at other
targets defined in the same file). Also, use a syntax
that does not look like a programming language. (In
the end, I chose JSON because every programming language can read and
write it.) To avoid magic symbols and magic names,
use structured target references and allow arbitrary
strings for the non-path parts (i.e., everything except
the module name, which is a path relative to the
repository root).
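For illustration, a target-description file in this spirit might look roughly as follows (written as a Python literal for the JSON it stands for; the exact field names depend on the rules used):

    targets_file = {
        "hello": {
            "type": ["@", "rules", "CC", "binary"],  # structured reference to a rule
            "name": ["hello"],
            "srcs": ["main.cpp"],
            "deps": ["greeting lib"],                # plain string: target in this module
        },
        "greeting lib": {                            # arbitrary strings as target names
            "type": ["@", "rules", "CC", "library"],
            "srcs": ["greeting.cpp"],
        },
    }
    # Each entry can be read off by parsing the file alone; no file-global
    # evaluation, and no other target in the file has to be inspected.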
-
If people really need computations to obtain target
descriptions (or rule definitions, or any other
root), then make sure that this is cached properly. Computed
roots also allow people to bring their own syntax
for everything, as long as they can write a parser
for their favourite syntax (again, any programming
language can write JSON). For example, projects
with a strict source-code layout can restrict their
build description to short dependency hints and compute
the full build description from essentially the directory
structure.
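For example, a generator for such a computed root could be as simple as the following sketch (made-up rule name, made-up layout convention):

    import json, os

    def generate_targets(src_root):
        # One library per directory, depending on the libraries of its
        # immediate subdirectories; the layout convention is invented here.
        targets = {}
        for directory, subdirs, files in os.walk(src_root):
            module = os.path.relpath(directory, src_root)
            targets[module] = {
                "type": "library",
                "srcs": sorted(f for f in files if f.endswith(".cpp")),
                "deps": sorted(
                    os.path.relpath(os.path.join(directory, d), src_root)
                    for d in subdirs
                ),
            }
        return targets

    # The generator emits plain JSON, i.e., exactly the kind of declarative
    # target description discussed above; it runs (and is cached) as part of
    # the build.
    print(json.dumps(generate_targets("src"), indent=2))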
-
People voluntarily use git. This gives us a
quick way to obtain a suitable identifier for an artifact
or tree, without actually having to read the object.
So let's use, as the default protocol, remote execution based
on git identifiers; however, we also support the plain
remote-execution protocol using the hashing it has had
from the beginning.
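As a reminder of how cheap such identifiers are: the git identifier of a file is just a hash of a short header followed by the content, and for content already tracked by git the identifier can be taken from the tree objects without reading the file at all.

    import hashlib

    def git_blob_id(content):
        # git hashes a blob as sha1("blob <size>\0" + content)
        header = b"blob " + str(len(content)).encode() + b"\0"
        return hashlib.sha1(header + content).hexdigest()

    git_blob_id(b"hello world\n")  # the same id git itself would assign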
I maintain a local
source mirror
that I update (fast-forward only) regularly, as long as I'm involved in
the project and can accept its technical development. The project, so far,
consists of the following repositories.
-
The main repository
justbuild.
Contains the sources of the build tool (just),
the repository fetching and setup tool (just-mr),
auxiliary tools for maintaining multi-repository descriptions, and
the documentation (man pages, description of concepts, tutorial).
-
Language-specific rules for
-
A demonstration of how to use
justbuild on nix
for local builds, making good use of the fact that
nix makes it easy to set up well-defined
dependencies and a well-defined environment.
-
A bootstrappable description of a
C/C++ toolchain
consisting of compilers with appropriate libraries and linting tools, as well
as some auxiliary programs (busybox, python,
make, cmake).
-
A justbuild description to build
static binaries
of just and just-mr, together with the hashes that
these binaries will have.
The project is also
packaged
in various distributions.
Selected talks.
PS: Naming is hard, and I know that the name for this project is not
well chosen. But that's what you get when a "fun" name
is used internally as a code name, knowing that a committee (of which
I was not a member) will decide on an official name before
the open-sourcing, and that committee then decides on the
temporary name.