Discussion:
Building with many cores without OOM
Helmut Grohne
2024-11-28 10:00:02 UTC
Hi Guillem and other developers,

I am one of those who builds a lot of different packages with different
requirements and found that picking a good parallel=... value in
DEB_BUILD_OPTIONS is hard. Go too low and your build takes very long. Go
too high and you swap until the OOM killer terminates your build. (Usage
of choom recommended in any case.)

Common offenders are rustc and llvm-toolchain, but there are a few more.
These packages already offer some assistance.

llvm-toolchain-19 parses /proc/meminfo and reduces parallelism:
https://sources.debian.org/src/llvm-toolchain-19/1%3A19.1.4-1/debian/rules/#L92

A number of packages use debhelper's --max-parallel:
https://codesearch.debian.net/search?q=--max-parallel%3D%5B%5E1%5D+path%3Adebian%2Frules&literal=0

polymake limits parallelism to one job per 4GB of RAM:
https://sources.debian.org/src/polymake/4.12-3/debian/rules/?hl=15#L15

Another implementation is in openscad:
https://sources.debian.org/src/openscad/2021.01-8/debian/rules/?hl=33#L33

ns3 turns off parallelism if the RAM is too limited:
https://sources.debian.org/src/ns3/3.42-2/debian/rules/?hl=26#L26
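The implementations differ in detail, but the underlying pattern is mostly
the same. Purely as an illustration (not taken from any of the packages
above; the 2 GiB-per-job figure is an arbitrary example and the fragment is
Linux-specific), such a debian/rules snippet boils down to:

  # Illustrative sketch only: assume ~2 GiB of RAM per parallel job and
  # never exceed the parallelism requested via DEB_BUILD_OPTIONS.
  include /usr/share/dpkg/buildopts.mk
  DEB_BUILD_OPTION_PARALLEL ?= 1

  MEM_KB := $(shell awk '/^MemAvailable:/ {print $$2}' /proc/meminfo)
  MEM_JOBS := $(shell echo $$(( $(MEM_KB) / (2 * 1024 * 1024) )))
  JOBS := $(shell echo $$(( $(MEM_JOBS) < 1 ? 1 : ($(MEM_JOBS) < $(DEB_BUILD_OPTION_PARALLEL) ? $(MEM_JOBS) : $(DEB_BUILD_OPTION_PARALLEL)) )))

  override_dh_auto_build:
          dh_auto_build --max-parallel=$(JOBS)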

I think this demonstrates that we probably have something between 10 and
50 packages in unstable that would benefit from a generic parallelism
limit based on available RAM. Do others agree that this is a problem
worth solving in a more general way?

For one thing, I propose extending debhelper to provide
--min-ram-per-parallel-core as that seems to be the most common way to
do it. I've proposed
https://salsa.debian.org/debian/debhelper/-/merge_requests/128
to this end.
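For illustration, assuming the MR gets merged roughly as proposed, a rules
file could then express the limit like this (the option name is taken from
the MR; it is not part of released debhelper, and the value shown is only
an example):

  %:
          dh $@

  override_dh_auto_build:
          # Hypothetical: proposed in debhelper MR 128, not yet available.
          dh_auto_build --min-ram-per-parallel-core=2g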

Unfortunately, the affected packages tend to not just be big, but also
so special that they cannot use dh_auto_*. As a result, I also looked at
another layer to support this and found /usr/share/dpkg/buildopts.mk,
which sets DEB_BUILD_OPTION_PARALLEL by parsing DEB_BUILD_OPTIONS. How
about extending this file with a mechanism to reduce parallelism? I am
attaching a possible extension to this mail to see what you think.
Guillem, is that something you consider including in dpkg?
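(The attachment is not reproduced in this archive. Just to illustrate the
shape of what is being proposed, a hypothetical fragment along these lines,
with made-up variable names, could look like:)

  # Hypothetical sketch of a buildopts.mk extension (not the actual attachment):
  # the package sets DEB_RAM_PER_CORE_MB before including the file and gets
  # back a DEB_BUILD_OPTION_PARALLEL clamped to what available memory permits.
  ifdef DEB_RAM_PER_CORE_MB
  DEB_BUILD_OPTION_PARALLEL := $(shell \
      avail_kb=$$(awk '/^MemAvailable:/ {print $$2}' /proc/meminfo); \
      jobs=$$(( avail_kb / ($(DEB_RAM_PER_CORE_MB) * 1024) )); \
      [ $$jobs -ge 1 ] || jobs=1; \
      req=$(or $(DEB_BUILD_OPTION_PARALLEL),1); \
      [ $$jobs -le $$req ] || jobs=$$req; \
      echo $$jobs)
  endif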

Are there other layers that could reasonably be used to implement a more
general form of parallelism limiting based on system RAM? Ideally, we'd
consolidate these implementations into fewer places.

As I am operating build daemons (outside Debian), I note that I have to
limit their cores below what is actually available to avoid OOM
kills, and even that is insufficient in some cases. In adopting such a
mechanism, we could generally raise the core count per buildd and
consider OOM a problem of the package to be fixed by applying a sensible
parallelism limit.

Helmut
Chris Hofstaedtler
2024-11-28 12:10:01 UTC
Post by Helmut Grohne
I think this demonstrates that we probably have something between 10 and
50 packages in unstable that would benefit from a generic parallelism
limit based on available RAM. Do others agree that this is a problem
worth solving in a more general way?
Yes. Looking at hardware trends, machines will be more
RAM-constrained per CPU core than ever.

IMO it would be good to support dealing with this earlier than
later.

Chris
Paul Gevers
2024-11-28 13:40:01 UTC
Hi Helmut,
Post by Chris Hofstaedtler
IMO it would be good to support dealing with this earlier than
later.
And doing it in a way that can be reused by how autopkgtests are run
would maybe be good too.

Paul
Helmut Grohne
2024-11-29 07:10:01 UTC
Hi Paul,
Post by Chris Hofstaedtler
IMO it would be good to support dealing with this earlier than
later.
And doing it in a way that can be reused by how autopkgtests are run would
maybe be good too.
Can you clarify what you mean here? There is autopkgtest
--build-parallel and my understanding is that as packages lower the
requested parallelism by themselves, this aspect of autopkgtest would be
implicitly covered by the proposals at hand. Do you refer to test
parallelism here? Is there a setting or flag to configure that, which I
may have missed?

Helmut
Paul Gevers
2024-11-29 20:30:01 UTC
Hi Helmut,
Post by Helmut Grohne
And doing it in a way that can be reused by how autopkgtests are run would
maybe be good too.
Can you clarify what you mean here? There is autopkgtest
--build-parallel and my understanding is that as packages lower the
requested parallelity by themselves, this aspect of autopkgtest would be
implicitly covered by the proposals at hand. Do you refer to test
parallelity here? Is there any setting or flag to configure that that I
may have missed?
I'm not talking about how /usr/bin/autopkgtest is called (and thus (?)
not about a package build if that's needed), but I'm talking about how
tests themselves should be dealing with parallelism. I recall some tests
running out of memory (https://ci.debian.net/status/reject_list/
mentions at least two currently).

Paul
Holger Levsen
2024-11-28 12:20:02 UTC
Post by Helmut Grohne
I think this demonstrates that we probably have something between 10 and
50 packages in unstable that would benefit from a generic parallelism
limit based on available RAM. Do others agree that this is a problem
worth solving in a more general way?
yes.
--
cheers,
Holger

⢀⣎⠟⠻⢶⣊⠀
⣟⠁⢠⠒⠀⣿⡁ holger@(debian|reproducible-builds|layer-acht).org
⢿⡄⠘⠷⠚⠋⠀ OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
⠈⠳⣄

Historians have a word for Germans who joined the Nazi party, not because they
hated Jews, but out of hope for restored patriotism, or a sense of economic
anxiety, or a hope to preserve their religious values, or dislike of their
opponents, or raw political opportunism, or convenience, or ignorance, or
greed.
That word is "Nazi". Nobody cares about their motives anymore.
Niels Thykier
2024-11-29 14:10:02 UTC
Post by Helmut Grohne
Hi Guillem and other developers,
I am one of those who builds a lot of different packages with different
requirements and found that picking a good parallel=... value in
DEB_BUILD_OPTIONS is hard. Go too low and your build takes very long. Go
too high and you swap until the OOM killer terminates your build. (Usage
of choom recommended in any case.)
[...]
I think this demonstrates that we probably have something between 10 and
50 packages in unstable that would benefit from a generic parallelism
limit based on available RAM. Do others agree that this is a problem
worth solving in a more general way?
For one thing, I propose extending debhelper to provide
--min-ram-per-parallel-core as that seems to be the most common way to
do it. I've proposed
https://salsa.debian.org/debian/debhelper/-/merge_requests/128
to this end.
Unfortunately, the affected packages tend to not just be big, but also
so special that they cannot use dh_auto_*. As a result, I also looked at
another layer to support this and found /usr/share/dpkg/buildopts.mk,
which sets DEB_BUILD_OPTION_PARALLEL by parsing DEB_BUILD_OPTIONS. How
about extending this file with a mechanism to reduce parallelity? I am
attaching a possible extension to it to this mail to see what you think.
Guillem, is that something you consider including in dpkg?
My suggestion would be to have `dpkg` provide a "RAM-restrained"
parallelization limit next to the default one. This is similar to how
the `debhelper` one works (the option only applies to the dh_auto_
steps). This might be what you are proposing here (I was unsure).

Generally, the RAM limit only applies to the upstream build side of
things and this can be relevant for some packages. As an example,
`firefox-esr` builds 20+ packages, so there is value in having "post
processing" (dh_install ... dh_builddeb) happening with a higher degree
of parallelization than the upstream build part to keep the build time
as low as possible.

I think 2 degrees of parallelization limits would be sufficient for most
cases (at least from a cost/benefit PoV).
Post by Helmut Grohne
Are there other layers that could reasonably be used to implement a more
general form of parallelism limiting based on system RAM? Ideally, we'd
consolidate these implementations into fewer places.
[...]
Helmut
We do have some custom implementations of debhelper build systems around
in the archive[1] that in theory could make use of this (though they are
probably not worth hunting down - more a case of "update if they become a
problem"). Then there is `debputy`, but I can have a look at that later
after I have reviewed the patch for `debhelper`.

Best regards,
Niels

[1] Any `dh-sequence-X` sequences that replace `dh_auto_*` commands
would likely fall into this category.
Guillem Jover
2024-12-04 13:10:01 UTC
Hi!
Post by Helmut Grohne
I am one of those who builds a lot of different packages with different
requirements and found that picking a good parallel=... value in
DEB_BUILD_OPTIONS is hard. Go too low and your build takes very long. Go
too high and you swap until the OOM killer terminates your build. (Usage
of choom recommended in any case.)
I think this demonstrates that we probably have something between 10 and
50 packages in unstable that would benefit from a generic parallelism
limit based on available RAM. Do others agree that this is a problem
worth solving in a more general way?
I think the general idea makes sense, yes.
Post by Helmut Grohne
For one thing, I propose extending debhelper to provide
--min-ram-per-parallel-core as that seems to be the most common way to
do it. I've proposed
https://salsa.debian.org/debian/debhelper/-/merge_requests/128
to this end.
To me this looks too high in the stack (and too Linux-specific :).
Post by Helmut Grohne
Unfortunately, the affected packages tend to not just be big, but also
so special that they cannot use dh_auto_*. As a result, I also looked at
another layer to support this and found /usr/share/dpkg/buildopts.mk,
which sets DEB_BUILD_OPTION_PARALLEL by parsing DEB_BUILD_OPTIONS. How
about extending this file with a mechanism to reduce parallelity? I am
attaching a possible extension to it to this mail to see what you think.
Guillem, is that something you consider including in dpkg?
I'm not a huge fan of the make fragment files, as make programming is
rather brittle, and it easily causes lots of processes to spawn if you
look at it the wrong way (ideally I'd really like to be able to get
rid of them once we can rely on something else!). I think we could
consider adding it there, but as a last resort option, if there's no
other better place.
Post by Helmut Grohne
Are there other layers that could reasonably be used to implement a more
general form of parallelism limiting based on system RAM? Ideally, we'd
consolidate these implementations into fewer places.
I think adding this in dpkg-buildpackage itself would make most sense
to me, where it is already deciding what amount of parallelism to use
when specifying «auto» for example.

Given that this would be an outside-in interface, I think this would
imply declaring these parameters as debian/control fields for example,
or in some other file to be parsed from the source tree.

My main concerns would be:

* Portability.
* Whether this is a local property of the package (so that the
  maintainer has the needed information to decide on a value), or
  whether this depends on the builder's setup, or perhaps both.
* We might need a way to percolate these parameters to children of
  the build/test system (as Paul has mentioned), where sometimes
  you cannot specify this directly in the parent. Setting some
  standardized environment variables would seem sufficient I think,
  but while all this seems kind of optional, this goes a bit into
  reliance on dpkg-buildpackage being the only supported build
  entry point. :)
Post by Helmut Grohne
As I am operating build daemons (outside Debian), I note that I have to
limit their cores below what is actually is available to avoid OOM
kills and even that is insufficient in some cases. In adopting such a
mechanism, we could generally raise the core count per buildd and
consider OOM a problem of the package to be fixed by applying a sensible
parallelism limit.
See above, on whether this is really package or setup dependent.

Thanks,
Guillem
Guillem Jover
2024-12-04 13:30:01 UTC
Hi!
Post by Guillem Jover
Post by Helmut Grohne
Are there other layers that could reasonably be used to implement a more
general form of parallelism limiting based on system RAM? Ideally, we'd
consolidate these implementations into fewer places.
I think adding this in dpkg-buildpackage itself would make most sense
to me, where it is already deciding what amount of parallelism to use
when specifying «auto» for example.
Given that this would be an outside-in interface, I think this would
imply declaring these parameters say as debian/control fields for example,
or some other file to be parsed from the source tree.
* Portability.
* Whether this is a local property of the package (so that the
maintainer has the needed information to decide on a value, or
whether this depends on the builder's setup, or perhaps both).
* We might need a way to percolate these parameters to children of
the build/test system (as Paul has mentioned), where some times
you cannot specify this directly in the parent. Setting some
standardize environment variables would seem sufficient I think,
but while all this seems kind of optional, this goes a bit into
reliance on dpkg-buildpackage being the only supported build
entry point. :)
Ah, and I forgot to mention that, for example, dpkg-deb (via libdpkg)
already implements this kind of parallelism limiter based on system
memory when compressing to xz. But in that case we are assisted by
liblzma telling us the amount of memory expected to be used, which
makes it easier to clamp the parallelism based on that. Unfortunately
I'm not sure, in general, we have this kind of information available,
and my assumption is that in many cases we might end up deciding on
clamping factors from observations of current implementation details,
which might need manual tracking and adjustment going forward.

Thanks,
Guillem
Stefano Rivera
2024-12-04 14:40:01 UTC
Hi Guillem (2024.12.04_13:03:29_+0000)
Post by Guillem Jover
Post by Helmut Grohne
Are there other layers that could reasonably be used to implement a more
general form of parallelism limiting based on system RAM? Ideally, we'd
consolidate these implementations into fewer places.
I think adding this in dpkg-buildpackage itself would make most sense
to me, where it is already deciding what amount of parallelism to use
when specifying «auto» for example.
Given that this would be an outside-in interface, I think this would
imply declaring these parameters say as debian/control fields for example,
or some other file to be parsed from the source tree.
I don't think this can be entirely outside-in; the package needs to say
how much RAM it needs per core to be able to calculate the appropriate
degree of parallelism. So, we have to declare a value that then gets
calculated against the proposed parallelism.

Stefano
--
Stefano Rivera
http://tumbleweed.org.za/
+1 415
Guillem Jover
2024-12-05 02:50:01 UTC
Hi!
Post by Stefano Rivera
Hi Guillem (2024.12.04_13:03:29_+0000)
Post by Guillem Jover
Post by Helmut Grohne
Are there other layers that could reasonably be used to implement a more
general form of parallelism limiting based on system RAM? Ideally, we'd
consolidate these implementations into fewer places.
I think adding this in dpkg-buildpackage itself would make most sense
to me, where it is already deciding what amount of parallelism to use
when specifying «auto» for example.
Given that this would be an outside-in interface, I think this would
imply declaring these parameters say as debian/control fields for example,
or some other file to be parsed from the source tree.
I don't think this can be entirely outside-in, the package needs to say
how much ram it needs per-core, to be able to calculate the appropriate
degree of parallelism. So, we have to declare a value that then gets
calculated against the proposed parallelism.
I _think_ we are saying the same, and there might just be a mismatch
in nomenclature (most probably stemming from me being non-native and
using/reusing terms incorrectly)? So let me clarify what I meant,
otherwise I might be misunderstanding your comment, and I'd appreciate
a clarification. :)

When dealing with dpkg packaging build interfaces, in my mind there are
two main models:

* outside-in: where the build driver (dpkg-buildpackage in this case)
can reach for all needed information and then do stuff based on that,
or pass that information down into debian/rules process hierarchy,
or to tools it invokes itself (say dpkg-genchanges); another such
interface could be R³ where trying to change the default from
debian/rules is already too late, as that's managed by the
build driver.

* inside-out: where debian/rules, files sourced from it, or tools
invoked from it, fully control the outcome of the operation, and
then dpkg-buildpackage might not be able to tell beforehand
exactly what will happen and will need to pick up the results after
the fact, for example that would include dpkg-deb or dpkg-distaddfile
being currently fully delegated to debian/rules, and then
dpkg-buildpackage, et al. picking that up through debian/files;
debhelper would be a similar paradigm.

(With some exceptions, I consider that the bulk of our build interfaces
are unfortunately mostly inside-out.)

For this particular case, I'd envision the options could look something
like:

* outside-in:

- We add a new field, say (with this not very good name that would
need more thought) Build-Parallel-Mem-Limit-Per-Core for the
debian/control source stanza, then dpkg-buildpackage would be able
to check the current system memory, and clamp the number of
computed parallel jobs based on the number of system cores, the
number of specified parallel jobs and the limit from the above
field. This then would be passed down as the usual parallel=
DEB_BUILD_OPTIONS.

- If we needed the package to provide a dynamic value depending on
other external factors outside its control, although there's no
current precedent that I'm aware, and it seems a bit ugly, I guess
we could envision some kind of new entry point and a way to
let the build drivers know it needs to call it, for example a
debian/rules target that gets called and generates some file or
a program to call under debian/ that prints some value, which
dpkg-buildpackage could use in a similar way as the above point.

* inside-out:

For this, there could be multiple variants, where a build driver
like dpkg-buildpackage is completely out of the picture, and where
we might end up with parallel settings that are out-of-sync
between DEB_BUILD_OPTIONS parallel= and the inner one, for example:

- One could be the initially proposed buildopts.mk extension,

- Add a new dpkg-something helper or a new command to an existing
tool, that could compute the value that debian/rules would use
or pass further down,

- debhelper/debputy/etc does it, but that leaves out non-helper
using packages, which was one of the initial concerns from
Helmut.

Hope this clarifies.

Thanks,
Guillem
Simon Richter
2024-12-05 07:20:01 UTC
Hi,
Post by Stefano Rivera
I don't think this can be entirely outside-in, the package needs to say
how much ram it needs per-core, to be able to calculate the appropriate
degree of parallelism. So, we have to declare a value that then gets
calculated against the proposed parallelism.
This.

Also, the Ninja build system provides resource pools -- typical packages
use only the "CPU" pool, and reserve one CPU per task started, but it is
possible to also define a "memory" pool and declare tasks with memory
reservations, which reduces parallel execution only while
memory-intensive tasks are running.
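For reference, a hand-written ninja file using such a pool looks roughly
like this (a sketch; in practice meson or CMake would generate it, and the
depth of 2 is an arbitrary example):

  # Allow at most two link jobs at a time, independent of the global -j value.
  pool link_pool
    depth = 2

  rule cc
    command = c++ -O2 -c $in -o $out

  rule link
    command = c++ -o $out $in
    pool = link_pool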

For LLVM specifically, GettingStarted.md documents

- -DLLVM_PARALLEL_{COMPILE,LINK,TABLEGEN}_JOBS=N — Limit the number
of compile/link/tablegen jobs running in parallel at the same
time. This is especially important for linking since linking can
use lots of memory. If you run into memory issues building LLVM,
try setting this to limit the maximum number of compile/link/
tablegen jobs running at the same time.

How we arrive at N is left as an exercise for the reader though.
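One way to arrive at N (a sketch along the same lines as the /proc/meminfo
parsing that llvm-toolchain's debian/rules already does; the 4 GiB-per-link
figure is just an example) would be:

  # Sketch: allow one link job per ~4 GiB of available RAM, but at least one.
  LINK_JOBS=$(awk '/^MemAvailable:/ {print int($2 / (4 * 1024 * 1024))}' /proc/meminfo)
  [ "$LINK_JOBS" -ge 1 ] || LINK_JOBS=1
  cmake -G Ninja -DLLVM_PARALLEL_LINK_JOBS="$LINK_JOBS" ../llvm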

Simon
Helmut Grohne
2024-12-05 12:10:01 UTC
Hi Guillem and others,

Thanks for your extensive reply and the followup clarifying the
inside-out and outside-in distinction.
Post by Guillem Jover
Post by Helmut Grohne
I think this demonstrates that we probably have something between 10 and
50 packages in unstable that would benefit from a generic parallelism
limit based on available RAM. Do others agree that this is a problem
worth solving in a more general way?
I think the general idea makes sense, yes.
Given the other replies on this thread, I conclude that we have rough
consensus on this being a problem worth solving (expending effort and
code and later maintenance cost on).
Post by Guillem Jover
Post by Helmut Grohne
For one thing, I propose extending debhelper to provide
--min-ram-per-parallel-core as that seems to be the most common way to
do it. I've proposed
https://salsa.debian.org/debian/debhelper/-/merge_requests/128
to this end.
To me this looks too high in the stack (and too Linux-specific :).
Let me take the opportunity to characterize this proposal inside-out
given your distinction.

I don't think being Linux-specific is necessarily bad here and note that
the /proc interface is also supported by Hurd (I actually checked on a
porter box). The problem we are solving here is a practical one and the
solution we pick now will probably no longer be relevant in twenty years.
That's about the time frame for which I expect Linux to be the preferred
kernel used by Debian (could be longer, but unlikely shorter).
Post by Guillem Jover
I think adding this in dpkg-buildpackage itself would make most sense
to me, where it is already deciding what amount of parallelism to use
when specifying «auto» for example.
Given that this would be an outside-in interface, I think this would
imply declaring these parameters say as debian/control fields for example,
or some other file to be parsed from the source tree.
I find the outside-in vs inside-out distinction quite useful, but I
actually prefer an inside-out approach. You detail that picking a
sensible RAM-per-core value is environment-specific. Others gave
examples of how build systems address this by specifying linker
pools with reduced parallelism, and you go into detail on how the
compression parallelism is already limited based on system RAM. Given
all of these, I am no longer convinced that reducing the package-global
parallelism is the desired solution. Rather, each individual step may
benefit from its own limiting, and that's what is already happening in
the archive. It is that inside-out approach that we see in debian/rules
in some packages. What I now find missing is better tooling to support
this inside-out approach.
Post by Guillem Jover
* Portability.
I am not concerned. The parallelism limit is a mechanism to increase
efficiency of builder deployments and not much more. The portable
solution is to stuff in more RAM or supply a lower parallel value
outside-in. A 90% solution is more than good enough here.
Post by Guillem Jover
* Whether this is a local property of the package (so that the
maintainer has the needed information to decide on a value, or
whether this depends on the builder's setup, or perhaps both).
All of what I wrote in this thread thus far assumed that this was a
local property. That definitely is an oversimplification of the matter
as an upgraded clang, gcc, ghc or rustc has historically yielded
increased RAM consumption. The packages affected tend to be sensitive to
changes in these packages in other ways, so they generally know quite
closely what version of dependencies will be in use and can tailor their
guesses. So while this is a non-local property in principle, my
expectation is that treating it as if it was local is good enough for a
90% solution.
Post by Guillem Jover
* We might need a way to percolate these parameters to children of
the build/test system (as Paul has mentioned), where some times
you cannot specify this directly in the parent. Setting some
standardize environment variables would seem sufficient I think,
but while all this seems kind of optional, this goes a bit into
reliance on dpkg-buildpackage being the only supported build
entry point. :)
To me, this reads as an argument for using an inside-out approach.

Given all of the other replies (on-list and off-list), my vision of how
I'd like to see this approached has changed. I see more and more value
in leaving this in close control of the package maintainer (i.e.
inside-out) to the point where different parts of the build may use
different limits.

How about instead we try to extend coreutils' nproc? How about adding
more options to it?

--assume-units=N
--max-units=N
--min-ram-per-unit=Z

Then, we could continue to use buildopts.mk and other mechanisms to
extract the passed parallel value from DEB_BUILD_OPTIONS as before and
run it through an nproc invocation before passing it down to a build
system in the specific ways that the build system requires. More options
could be added to nproc as needed.
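As a usage sketch (note these options do not exist in today's nproc; the
names are only the proposal above and the 2G figure is arbitrary):

  # Hypothetical: clamp the requested parallelism to one job per 2 GiB of RAM.
  include /usr/share/dpkg/buildopts.mk
  JOBS := $(shell nproc --assume-units=$(or $(DEB_BUILD_OPTION_PARALLEL),1) --min-ram-per-unit=2G)

  override_dh_auto_build:
          dh_auto_build --max-parallel=$(JOBS)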

Helmut
Guillem Jover
2024-12-09 13:50:01 UTC
Hi!
Post by Helmut Grohne
Post by Guillem Jover
Post by Helmut Grohne
For one thing, I propose extending debhelper to provide
--min-ram-per-parallel-core as that seems to be the most common way to
do it. I've proposed
https://salsa.debian.org/debian/debhelper/-/merge_requests/128
to this end.
To me this looks too high in the stack (and too Linux-specific :).
I don't think being Linux-specific is necessarily bad here and note that
the /proc interface is also supported by Hurd (I actually checked on a
porter box). The problem we are solving here is a practical one and the
solution we pick now probably is no longer relevant in twenty years.
That's about the time frame I am expect Linux to be the preferred kernel
used by Debian (could be longer, but unlikely shorter).
See below for the portability part.
Post by Helmut Grohne
Post by Guillem Jover
I think adding this in dpkg-buildpackage itself would make most sense
to me, where it is already deciding what amount of parallelism to use
when specifying «auto» for example.
Given that this would be an outside-in interface, I think this would
imply declaring these parameters say as debian/control fields for example,
or some other file to be parsed from the source tree.
I find that outside-in vs inside-out distinction quite useful, but I
actually prefer an inside-out approach. You detail that picking a
sensible ram-per-core value is environment-specific. Others gave
examples of how build-systems address this in ways of specifying linker
groups with reduced parallelism and you go into detail of how the
compression parallelism is limited based on system ram already. Given
all of these, I no longer am convinced that reducing the package-global
parallelism is the desired solution. Rather, each individual step may
benefit from its own limiting and that's what is already happening in
the archive. It is that inside-out approach that we see in debian/rules
in some packages. What I now find missing is better tooling to support
this inside-out approach.
Not all outside-in interfaces are created equal. As I hinted in that
other mail, some are (let's call them) permeable, where the build
driver performs some default setup or data gathering that it does
not necessarily use itself, and which can be easily overridden by
the inner packaging files.

I don't have a strong opinion on this case though; my initial reaction
was that because dpkg-buildpackage is already trying to provide a good
default for the number of parallel jobs to use, it seemed like a good
global place to potentially improve that number to influence all users,
if say the only thing needed is a declarative hint from the packaging
itself. This being a permeable interface also means the inner processes
could still ignore or tune that value further or whatever (except for
the --jobs-force option, which is harder to revert from inside).

But I guess it depends on whether we can have a better general heuristic
in the outer parallel job number computation. Or whether for the cases
that we need to tune, any such general heuristic would serve no actual
purpose and all/most of them would need to be overridden anyway.
Post by Helmut Grohne
Post by Guillem Jover
* Portability.
I am not concerned. The parallelism limit is a mechanism to increase
efficiency of builder deployments and not much more. The portable
solution is to stuff in more RAM or supply a lower parallel value
outside-in. A 90% solution is more than good enough here.
Right, I agree with the above, because this should be considered an
opportunistic quality-of-life improvement, where the user can always
manually override it if the tool does not get it right. My concern
was about the above debhelper MR failing hard on several conditions
where it should just simply disable the improved clamping. See for
example the parallel auto handling in dpkg-buildpackage (after the
«run_hook('preinit')»), or lib/dpkg/compress.c:filter_xz_get_memlimit()
and lib/dpkg/meminfo.c:meminfo_get_available_from_file() for the
dpkg-deb one, where these should gracefully fall back to less accurate
methods if they cannot gather the needed information.

(Now that you mention it, I should probably enable Hurd for the
/proc/meminfo codepath. :)
Post by Helmut Grohne
Post by Guillem Jover
* Whether this is a local property of the package (so that the
maintainer has the needed information to decide on a value, or
whether this depends on the builder's setup, or perhaps both).
All of what I wrote in this thread thus far assumed that this was a
local property. That definitely is an oversimplification of the matter
as an upgraded clang, gcc, ghc or rustc has historically yielded
increased RAM consumption. The packages affected tend to be sensitive to
changes in these packages in other ways, so they generally know quite
closely what version of dependencies will be in use and can tailor their
guesses. So while this is a non-local property in principle, my
expectation is that treating it as if it was local is good enough for a
90% solution.
My thinking here was also about the general case too, say a system
that has many cores relative to its available memory, where each core
would get what we'd consider not enough memory per core (assuming for
example a baseline for what dpkg-deb might require, plus build helpers
and their interpreters, and what a compiler with say an empty C, C++
or similar file might need, etc).
Post by Helmut Grohne
Post by Guillem Jover
* We might need a way to percolate these parameters to children of
the build/test system (as Paul has mentioned), where some times
you cannot specify this directly in the parent. Setting some
standardize environment variables would seem sufficient I think,
but while all this seems kind of optional, this goes a bit into
reliance on dpkg-buildpackage being the only supported build
entry point. :)
To me, this reads as an argument for using an inside-out approach.
Given all of the other replies (on-list and off-list), my vision of how
I'd like to see this approached has changed. I see more and more value
in leaving this in close control of the package maintainer (i.e.
inside-out) to the point where different parts of the build may use
different limits.
I think this would be fine too. The above point was more an option to
implement a hybrid permeable outside-in approach, where dpkg-buildpackage
could then try to gather, say, the amount of available memory, number of
cores, and any other potentially relevant additional data, then perhaps
try to use a declarative hint from the packaging to tune either the
default parallel jobs value or provide a new variable, and then let
the inner processes decide based on those variables, which would avoid
having to duplicate much of the data gathering and potential
portability issues.

This could also imply alternatively or in addition, providing a tool
or adding some querying logic in an existing tools (in the dpkg toolset)
to gather that information which the packaging could use, or…
Post by Helmut Grohne
How about instead we try to extend coreutils' nproc? How about adding
more options to it?
--assume-units=N
--max-units=N
--min-ram-per-unit=Z
…as you mention, another build-essential package's tools, so that again
we do not need to duplicate much of the logic. :)
Post by Helmut Grohne
Then, we could continue to use buildopts.mk and other mechanism to
extract the passed parallel value from DEB_BUILD_OPTIONS as before and
run it through an nproc invocation for passing it down to a build system
in the specific ways that the build system requires. More options could
be added to nproc as-needed.
So, yeah that would also work.

Thanks,
Guillem
s***@free.fr
2024-12-09 20:50:01 UTC
Hi,

Let me plop right into this discussion with no general solution and
more things to think about. For context, I'm packaging Java things,
and Java has historically been notoriously bad at guessing how much
memory it could actually use on a given system. I'm not sure things are
much better these days. This is just a reminder that the issue is nowhere
near as easy as it looks and that many attempts to generalize an
approach that works well in some cases have failed.
Post by Guillem Jover
My thinking here was also about the general case too, say a system
that has many cores relative to its available memory, where each core
would get what we'd consider not enough memory per core
This is actually a common situation on most systems, except for a few
privileged developer configurations. This is especially true in cloud-like,
VM/containerized environments where it is much easier (i.e. with fewer
immediate consequences) to overcommit CPU cores than RAM. Just look at
the price list of any cloud computing provider to get an idea of the
ratios you could start with. And then the provider may well lie about
the actual availability of the cores they will readily bill you for, and
you will only notice that when your application will grind to a halt at
the worst possible time (e.g. on a Black Friday if your business is to
sell stuff), but at least it won't get OOM-killed.

There are a few packages that are worrying me about how I'm going to
make them build and run their test suites on Salsa without either timing
out at one end of the slider or getting immediately OOM-killed at the
other. One of them wants to allocate 17 GiB of RAM per test worker,
and wants at least 3 of them. Another (Gradle) needs approximately 4 GiB
of RAM (JVM processes alone; adding OS cache + overhead to that probably
makes the total around 6-7 GiB) per additional worker for its build, and
I don't know yet how much is needed for its test suites as my current
setup lacks the storage space necessary to run them. On my current
low-end laptop (4 threads, 16 GiB RAM) dpkg's guesses [1] are wrong; I
can only run a single worker if I want to keep an IDE and a web browser
running on the side. Two if I close the IDE and kill all browser tabs
and other memory hogs. I would expect FTBFS bug reports if a
run-of-the-mill dpkg-buildpackage command failed to build the package on
such a system.
Post by Guillem Jover
(assuming for
example a baseline for what dpkg-deb might require, plus build helpers
and their interpreters, and what a compiler with say an empty C, C++
or similar file might need, etc).
+1 for taking a baseline into consideration, as the first worker is
usually significantly more expensive than additional workers. In my
experience with Java build processes the first worker penalty is in the
vicinity of +35% and can be much higher for lighter build processes (but
then they are lighter and less likely to hit a limit except in very
constrained environments).

Another thing I would like to add is that the requirements may change
depending on the phase of the build, especially between building and
testing. For larger projects, building usually requires more memory but
less parallelism than testing. You could always throw more workers at
building, but at some point additional workers will just sit mostly idle
consuming RAM and resources as there is a limited number of tasks that
the critical path will allow at any given point. Testing, especially
with larger test suites, usually allows for (and sometimes needs) much
more parallelism.

Also worth noting, on some projects the time spent testing can be orders
of magnitude greater than the time spent building.
Post by Guillem Jover
This could also imply alternatively or in addition, providing a tool
or adding some querying logic in an existing tools (in the dpkg toolset)
to gather that information which the packaging could use, or…
Additional tooling may help a bit, but I think what would really help at
that point would be to write and publish guidelines relevant to the
technology being packaged, based on empirical evidence collected while
fine tuning the build or packaging, and kept reasonably up-to-date (i.e.
never more than 2-3 years old) with the current state of technologies
and projects. Salsa (or other CI) pipelines could be instrumented to
provide some data and once the guidelines cover a majority of packages
you will have a better insight of what, if anything, needs to be done
with the tooling.


[1]:
https://salsa.debian.org/jpd/gradle/-/blob/upgrade-to-8.11.1-wip/debian/rules#L49
--
Julien Plissonneau Duquène
Helmut Grohne
2024-12-25 11:40:01 UTC
Package: coreutils
Version: 9.5-1
Severity: wishlist
Tags: patch upstream
X-Debbugs-Cc: debian-***@lists.debian.org, debian-***@lists.debian.org

Hi Michael,

we recently considered ways of exercising more CPU cores during package
builds on d-***@l.d.o. The discussion starts at
https://lists.debian.org/debian-devel/2024/11/msg00498.html. There, we
considered extending debhelper and dpkg. Neither of those options looked
really attractive, because they were limiting the parallelism of the
complete build. However, different phases of a build tend to require
different amounts of memory. Typically linking requires more memory than
compiling, and test suites may have different requirements. The ninja
build tool partially accommodates this by providing different "pools"
for different processes, e.g. limiting linker concurrency. Generally,
individual packages have the best knowledge of their individual memory
requirements, but turning that into parallelism is presently
inconvenient. This is where I see nproc helping.
Post by Helmut Grohne
How about instead we try to extend coreutils' nproc? How about adding
more options to it?
I propose adding new options to the nproc utility to support these use
cases. For one thing, I suggest adding --assume to override initial
detection. This allows passing the parallel=N value from
DEB_BUILD_OPTIONS as the initial value to nproc. The added value arises
from a second option --require-mem that reduces the amount of parallelism
based on available system RAM and user-provided requirements.

Let me sketch some expected uses:

* Typically build daemons now limit the number of system processors
  by downsizing VMs to avoid builds failing with OOM. Instead, they
  could supply an adjusted DEB_BUILD_OPTIONS=parallel=$(nproc
  --require-mem=2G).

* Individual packages already reduce parallelism based on available
  memory. A number of them attempt to parse /proc/meminfo. Instead they
  could include /usr/share/dpkg/buildopts.mk and compute parallelism as
  NPROC=$(shell nproc --assume=$(or $(DEB_BUILD_OPTION_PARALLEL),1)
  --require-mem=3G).

* When using meson with ninja, the linker parallelism can be selected
  separately using backend_max_links, and a different value, computed
  with a different --require-mem argument, can be passed (see the sketch
  below).
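Concretely, such a rules snippet might look like this (a sketch only; the
nproc options are the ones proposed in this report and the memory figures
are arbitrary):

  # Hypothetical sketch: the overall build gets ~2G per job, links ~4G each.
  include /usr/share/dpkg/buildopts.mk
  JOBS := $(shell nproc --assume=$(or $(DEB_BUILD_OPTION_PARALLEL),1) --require-mem=2G)
  LINK_JOBS := $(shell nproc --assume=$(JOBS) --require-mem=4G)

  override_dh_auto_configure:
          dh_auto_configure -- -Dbackend_max_links=$(LINK_JOBS)

  override_dh_auto_build:
          dh_auto_build --max-parallel=$(JOBS)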

I expect these options to reduce complexity in debian/rules files and
hope that in providing better tooling we can require packages to be
buildable with higher default parallelism. Finding a good place to share
this tooling is difficult, but nproc seems like a sensible spot. The
nproc binary grows by 4kb as a result and this impacts the essential
base system.

I'm attaching a patch and note that a significant part of the diff is a
gnulib update of the physmem module. The option naming can surely be
improved.

What do you think about this approach?

Helmut
Helmut Grohne
2024-12-26 08:10:01 UTC
Hi Michael and Pádraig,
Post by Michael Stone
There's zero chance I'll carry this as a debian-specific fork of nproc.
(Because I don't want to carry any new forks of the core utilities as
doing so inevitably causes long-term pain.) If you can convince
upstream, that's fine. My personal feeling is that nproc isn't the right
place for it but I'll defer to upstream[1]. (Among other things I don't
understand why this is better than extending the debian tooling in this
area, and there's nothing but an assertion that doing so is bad; I'd
suggest expanding on that.)
[1] My gut says that what you really need is a different tool to show
the memory available to the current process (which may not be the same
as the amount of free memory on the system, e.g., in the presence of
control groups). You could then divide that number by the expected
memory per process and set your parallelization factor to the lesser of
that value or nprocs. Conflating "nproc" and "nmem" seems wrong.
I note that I intentionally left cgroups out of the picture for now.
Supporting cgroups does not further affect the command line flags (as it
merely adds precision) and my earlier proposal for debhelper did
consider cgroups. It is something I can see myself adding in later
iterations iff we agree on moving forward with this.
Post by Pádraig Brady
With my upstream hat on, coupling memory details in nproc
seems like not a good fit and better done outside that tool.
Thank you both for your quick feedback on the proposed solution. I did
not expect my patch to be taken as is, but the presence of a patch
helped clarify what I intended here. If you are reasonably sure that
my patch will be rejected upstream, please tag this bug wontfix and
close it. Otherwise, I may try going there.

Before we move on, please allow me to ask what problem the nproc tool is
supposed to solve. Of course it tells you the number of processors, but
that is not a use case on its own. If you search for uses, "make
-j$(nproc)" (with varying build systems) is the immediate hit. This use
is now broken by hardware trends (see below). It can also be used to
partition a system and to compute partition sizes. This use case
continues to work. Are there other use cases? Unless I am missing
something, it seems like nproc no longer solves its most common use case
(and this bug is asking to fix it).

Thus far, I am effectively turned down by:
* coreutils
* debhelper
* dpkg

What other place would be suitable for including this functionality? We
have a pattern of packages coming up with code chewing /proc/meminfo
using various means (refer to my initial mail referenced from the bug
submission) and reducing parallelism based on it. You may ask why there
is a need for change here. I see a hardware trend wherein the supply of
CPU cores grows faster than the supply of RAM at a time when the demand
for RAM (e.g. in linker processes) grows. As a result, I expect more and
more packages to face this need and want to supply convenient tooling
such that we can pass failure to build a package in an environment with
limited RAM onto the package rather than the build environment. This
will rightfully evoke resistance from maintainers unless there is
sensible tooling, which is what I am trying to solve in this discussion
and bug report.

Do you see the computation of allocatable RAM as something we can
accommodate in coreutils? Michael suggested adding "nmem" between the
lines. Did you mean that in an ironic way or are you open to adding such
a tool? It would solve a quite platform-dependent part of the problem
and significantly reduce the boilerplate in real use cases.

So my search for a suitable spot to solve this continues. Help
appreciated.

Helmut
Michael Stone
2024-12-26 20:10:01 UTC
Post by Helmut Grohne
What other place would be suitable for including this functionality?
As I suggested: you need two tools or one new tool because what you're
looking for is the min of ncpus and (available_mem / process_size). The
result of that calculation is not the "number of cpus", it is the number
of processes you want to run.
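In shell terms that calculation is roughly the following (an illustrative
sketch; MemAvailable is only one possible reading of "available", per the
caveats below, and the 2 GiB per-process figure is made up):

  # processes = min(number of CPUs, available memory / per-process size)
  per_proc_kb=$((2 * 1024 * 1024))      # assume ~2 GiB per process
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  mem_jobs=$(( avail_kb / per_proc_kb ))
  cpu_jobs=$(nproc)
  jobs=$(( mem_jobs < cpu_jobs ? mem_jobs : cpu_jobs ))
  [ "$jobs" -ge 1 ] || jobs=1
  echo "$jobs"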
Post by Helmut Grohne
have a pattern of packages coming up with code chewing /proc/meminfo
using various means (refer to my initial mail referenced from the bug
submission) and reducing parallelism based on it
Yes, I think that's basically what you need to do.
Post by Helmut Grohne
Do you see the computation of allocatable RAM as something we can
accommodate in coreutils? Michael suggested adding "nmem" between the
lines. Did you mean that in an ironic way or are you open to adding such
a tool? It would solve a quite platform-dependent part of the problem
and significantly reduce the boiler plate in real use cases.
Here's the problem: the definition of "available memory" is very vague.
`free -hwv` output from a random machine:

               total        used        free      shared     buffers       cache   available
Mem:            30Gi       6.7Gi       2.4Gi       560Mi       594Mi        21Gi        23Gi
Swap:           11Gi       2.5Mi        11Gi
Comm:           27Gi        22Gi       4.3Gi

Is the amount of available memory 2.4Gi, 23Gi, maybe 23+11Gi? Or 4.3Gi?
IMO, there is no good answer to that question. It's going to vary based
on how/whether virtual memory is implemented, the purpose of the system
(e.g., is it dedicated to building this one thing or does it have other
roles that shouldn't be impacted), the particulars of the build process
(is reducing disk cache better or worse than reducing ||ism?), etc.--and
we haven't even gotten to cgroups or other esoteric factors yet. Long
before asking where nmem should go, you'd need to figure out how nmem
would work. You're implicitly looking for this tool to be portable (or
else, what's wrong with using /proc/meminfo directly?) but I don't have
any idea how that would work. You'd need to somehow get people to define
policies, what would that look like? I'd suggest starting by writing a
proof of concept and shopping it around to get buy-in and/or see if it's
useful. The answers you get from someone doing HPC on linux may be
different from the administrator of an openbsd server or a developer on
an OS/X laptop or windows desktop. I'm personally skeptical that this is
a problem that can be solved, but maybe you'll be able to demonstrate
otherwise. At any rate, looking for a project to host & distribute the
tool would seem to be just about the last step. Actually naming the
thing won't be easy either, but showing how it works is probably a
better place to start.
Julien Plissonneau Duquène
2024-12-27 09:20:01 UTC
Hi,
Post by Michael Stone
As I suggested: you need two tools or one new tool because what you're
looking for is the min of ncpus and (available_mem / process_size). The
result of that calculation is not the "number of cpus", it is the
number of processes you want to run.
This is definitely true. "nproc" could potentially be repurposed to mean
"number of processes" though.
Post by Michael Stone
Here's the problem: the definition of "available memory" is very vague.
               total        used        free      shared     buffers       cache   available
Mem:            30Gi       6.7Gi       2.4Gi       560Mi       594Mi        21Gi        23Gi
Swap:           11Gi       2.5Mi        11Gi
Comm:           27Gi        22Gi       4.3Gi
Is the amount of available memory 2.4Gi, 23Gi, maybe 23+11Gi? Or 4.3Gi?
IMO, there is no good answer to that question.
I would rather argue that there is no perfect answer to that question,
but that the 23 GiB in the "available" column is good enough for most
use cases, including building stuff, IF (and only if) you take into
account that you can't have all of it committed by processes, as you
still need a decent amount of cache and buffers (how much? very good
question, thank you) for that build to run smoothly and efficiently. Swap
should be ignored for all practical purposes here.
Post by Michael Stone
(or else, what's wrong with using /proc/meminfo directly?)
I haven't looked at how packages currently try to compute potential
parallelism using data from /proc/meminfo, but my own experience with
Java stuff and otherwise perfectly competent, highly qualified engineers
getting available RAM computation wrong makes me not too optimistic
about the overall accuracy of these guesses.

E.g. a few hours ago
Post by Michael Stone
I fear your rebuild is ooming workers (...) it seems that some package
is reducing is parallelism to two c++ compilers and that still exceeds
20G
Providing a simple tool that standardizes the calculation and
documenting examples and guidelines is certainly going to help here. It
will also move the logic to collect, parse and compute the result to a
single place, reducing logic duplication and maintenance burden across
packages.
Post by Michael Stone
You'd need to somehow get people to define policies, what would that
look like?
I would suggest making it possible to input the overall marginal RAM
requirement per parallelized process, that is, the amount of additional
"available RAM" needed for every additional process. As that value is
very probably going to be larger for the first processes, and as this
fact matters more in constrained environments (e.g. containers, busy CI
runners etc.), making it possible to sort of define a curve (e.g. 8 GiB -
5 GiB - 2 GiB - 2 GiB ... => 7 workers with 23 GiB available RAM) will
allow a closer match to the constraints of these environments.
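As a sketch of that calculation (illustrative only; the 8-5-2 curve and the
23 GiB figure are the example above, and the last curve value is reused for
every further worker):

  # Walk the marginal-RAM curve (in GiB) until available RAM is exhausted.
  avail_gib=23
  curve="8 5 2"
  workers=0
  remaining=$avail_gib
  fits_all=1
  for step in $curve; do
      last=$step
      if [ "$fits_all" -eq 1 ] && [ $(( remaining - step )) -ge 0 ]; then
          remaining=$(( remaining - step ))
          workers=$(( workers + 1 ))
      else
          fits_all=0
      fi
  done
  # Every additional worker costs as much as the last step of the curve.
  while [ "$fits_all" -eq 1 ] && [ $(( remaining - last )) -ge 0 ]; do
      remaining=$(( remaining - last ))
      workers=$(( workers + 1 ))
  done
  [ "$workers" -ge 1 ] || workers=1    # always allow at least one worker
  echo "$workers"                      # prints 7 for the example above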

In addition, providing an option to limit the computed result to the
number of actual available CPU cores (not vCPUs/threads), and another one
to place an arbitrary upper limit on processes beyond which no gains are
expected, would be nice.

Cheers,
--
Julien Plissonneau Duquène
Helmut Grohne
2024-12-27 12:00:01 UTC
Control: tags -1 + wontfix
Control: close -1

Hi Michael,
Post by Michael Stone
Post by Helmut Grohne
What other place would be suitable for including this functionality?
As I suggested: you need two tools or one new tool because what you're
looking for is the min of ncpus and (available_mem / process_size). The
result of that calculation is not the "number of cpus", it is the number of
processes you want to run.
This reinforces the question asked in my previous mail about what use
case nproc solves. There I have been arguing that changing circumstances
render a significant fraction of what I see as its use cases broken.
Post by Michael Stone
Here's the problem: the definition of "available memory" is very vague.
There is no question about that. You are looking at it from a different
angle than I am though. Perfection is not the goal here. The goal is
guessing better than we currently do. There are two kinds of errors we
may make here.

We may guess a higher concurrency than actually works. This is the
status quo and it causes failing builds. As a result we have been
limiting the number of processors available to build machines and thus
reduce efficiency. So whatever we do here can hardly be worse than the
status quo.

We may guess a lower concurrency than actually works. In this case, we
slow down builds. To a certain extent, this will happen. In return, we
get fewer failing builds and a higher available concurrency for the
majority of builds that do not require huge amounts of RAM. We are not
optimizing build latency here, but build throughput, as well as reducing
spurious build failures. Accepting this error is part of the proposed
strategy.
Post by Michael Stone
IMO, there is no good answer to that question. It's going to vary based on
how/whether virtual memory is implemented, the purpose of the system (e.g.,
is it dedicated to building this one thing or does it have other roles that
shouldn't be impacted), the particulars of the build process (is reducing
disk cache better or worse than reducing ||ism?), etc.--and we havent even
gotten to cgroups or other esoteric factors yet. Long before asking where
nmem should go, you'd need to figure out how nmem would work. You're
This is exactly why I supplied a patch, right? I am past figuring
out how it should work, as I have now translated the proposed
implementation into a third programming language. As far as I can see,
it works for the typical build machine that does little beyond compiling
software.
Post by Michael Stone
implicitly looking for this tool to be portable (or else, what's wrong with
using /proc/meminfo directly?) but I don't have any idea how that would
work. You'd need to somehow get people to define policies, what would that
look like? I'd suggest starting by writing a proof of concept and shopping
it around to get buy-in and/or see if it's useful. The answers you get from
someone doing HPC on linux may be different from the administrator of an
openbsd server or a developer on an OS/X laptop or windows desktop. I'm
personally skeptical that this is a problem that can be solved, but maybe
you'll be able to demonstrate otherwise. At any rate, looking for a project
to host & distribute the tool would seem to be just about the last step.
Actually naming the thing won't be easy either, but showing how it works is
probably a better place to start.
Your resistance is constructive. Both of us agree that the proposed
heuristic falls short in a number of situations and will need
improvements to cover more situations. Iterating this via repeated
coreutils updates likely is a disservice to users as it causes long
iteration times and renders coreutils (or part of it)
unreliable/unstable. As a result, you suggest self-hosting it at least
for a while. I was initially disregarding this option as it looked like
such a simple feature, but your reasoning more and more convinces me
that it is not as simple as originally anticipated. Doing it as a new
upstream project actually has some merit as the number of expected users
is fairly low.

Thanks for engaging in this discussion and clarifying your views as that
moved the discussion forward. You made me agree that coreutils is not a
good place (at least not now). Especially your and Guillem's earlier
feedback significantly changed the way I look at this.

Helmut
Otto Kekäläinen
2024-12-27 17:00:01 UTC
Hi,
Post by Helmut Grohne
Before we move on, please allow me to ask what problem the nproc tool is
supposed to solve. Of course it tells you the number of processors, but
that is not a use case on its own. If you search for uses, "make
-j$(nproc)" (with varying build systems) is the immediate hit. This use
is now broken by hardware trends (see below). It can also be used to
partition a system and computing partition sizes. This use case
continues to work. Are there other use cases? Unless I am missing
something, it seems like nproc no longer solves its most common use case
(and this bug is asking to fix it).
* coreutils
* debhelper
* dpkg
What other place would be suitable for including this functionality? We
Thanks Helmut for trying to solve this. I rely on nproc in all my
packages to instruct Make on how many parallel build processes to run,
and I have repeatedly run into the issue that on large hardware with
16+ cores available, builds tend to crash on memory issues unless
there is also at least 2GB of physical RAM on the system per core. The
man page of nproc says its purpose is to "print the number of
processing units available" so I too would have assumed that adding a
memory cap would fit nproc best. The nproc man page also already has
the concept of "offline cpu" for other reasons. Creating a new tool
just for this seems like a lot of overhead.
Julien Plissonneau Duquène
2025-01-13 18:10:01 UTC
Hi,
Post by Michael Stone
showing how it works is probably a better place to start.
Let's start with this then. I implemented a PoC prototype [1] as a shell
script that is currently fairly Linux-specific and doesn't account for
cgroup limits (yet?). Feedback is welcome (everything is open for
discussion there, including the name) and if there is enough interest I
may end up packaging it or submitting it to an existing collection (I am
thinking about devscripts).

I'm planning to give it a try on Gradle and Kotlin builds.

Cheers,


[1]: https://salsa.debian.org/jpd/parallimit
--
Julien Plissonneau Duquène
Helmut Grohne
2025-01-16 10:00:01 UTC
Hi Julien,
Post by Julien Plissonneau Duquène
Let's start with this then. I implemented a PoC prototype [1] as a shell
script that is currently fairly linux-specific and doesn't account for
cgroup limits (yet?). Feedback is welcome (everything is open for discussion
there, including the name) and if there is enough interest I may end up
packaging it or submitting it to an existing collection (I am thinking about
devscripts).
I'm sorry for not having come back to this earlier and thereby causing
duplication of work. I had started a Python-based implementation last
year and then dropped the ball over other stuff. It also implements the
--require-mem flag in the way you suggested. It parses DEB_BUILD_OPTIONS,
RPM_BUILD_NCPUS and CMAKE_BUILD_PARALLEL_LEVEL and also considers cgroup
memory limits. I hope this captures all of the feedback I got during
discussions and research.
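
For the curious, the gist of the environment handling is roughly this (a
simplified sketch with hypothetical helper names, not the attached code;
the exact order of preference is discussed further down the thread):

import os
import re

def parallel_from_environment() -> int | None:
    # DEB_BUILD_OPTIONS may contain e.g. "nocheck parallel=8".
    match = re.search(r"(?:^|\s)parallel=(\d+)",
                      os.environ.get("DEB_BUILD_OPTIONS", ""))
    if match:
        return int(match.group(1))
    for var in ("RPM_BUILD_NCPUS", "CMAKE_BUILD_PARALLEL_LEVEL"):
        value = os.environ.get(var, "")
        if value.isdigit():
            return int(value)
    return None

def cgroup_memory_limit() -> int | None:
    # cgroup v2 only and simplified: read our own cgroup's memory.max,
    # where "max" means unlimited; the real code has to be more careful
    # (cgroup v1, limits set on parent cgroups, ...).
    try:
        with open("/proc/self/cgroup") as f:
            cgroup = f.readline().strip().split(":", 2)[2]
        with open(f"/sys/fs/cgroup{cgroup}/memory.max") as f:
            value = f.read().strip()
    except OSError:
        return None
    return None if value == "max" else int(value)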

I'm attaching my proof of concept. Would you join forces and turn either
of these PoCs into a proper Debian package that could be used during
package builds? Once accepted, we may send patches to individual Debian
packages making use of it and call OOM FTBFS a packaging bug eventually.

Helmut
Julien Plissonneau Duquène
2025-01-16 17:30:02 UTC
Reply
Permalink
(trimming the Cc: list a bit now that the announcements are done, last
Cc: to #1091394, followup on debian-devel)

Hi Helmut,
Post by Helmut Grohne
I'm attaching my proof of concept. Would you join forces and turn either
of these PoCs into a proper Debian package that could be used during
package builds? Once accepted, we may send patches to individual Debian
packages making use of it and call OOM FTBFS a packaging bug
eventually.
I'm fine with that idea, and both PoCs use mostly the same logic (yours
is a bit more advanced). Besides my personal distaste for Python ;) I
implemented mine as a shell script so it can just be copied into
packages for testing without introducing additional dependencies.

Before introducing a whole new package for either of the candidates I
would try to submit it to devscripts, though; rereading the thread, it
seems that you didn't try that yet, and a Python implementation is
probably a better fit there (larger pool of potential contributors,
easier to implement testing, etc.). Maybe you could open a draft merge
request there and see how it goes?

Some comments about the PoC itself:
- for at least the Java builds, I would like to have a way to ignore x86
"threads" in the number of processing units: once all actual cores are
saturated with worker processes, no further throughput gains are
expected from additional processes, while there is a risk of negative
gains and a high memory cost; these Java processes are often heavily
multithreaded anyway and already benefit from SMT where it is available
(see the sketch below)
- exposing nproc options and processing further options from environment
variables may make it easier to replace nproc invocations with the new
script in existing Makefiles and build scripts.
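
For the first point, something along these lines could work on Linux (a
rough sketch with a hypothetical helper name, counting unique
(package, core) pairs in sysfs instead of SMT threads):

import glob
import os

def physical_core_count() -> int:
    # Count unique (physical package, core) pairs among the CPUs that
    # expose topology information; fall back to os.cpu_count() if sysfs
    # is unusable.
    cores = set()
    for topo in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/topology"):
        try:
            with open(os.path.join(topo, "physical_package_id")) as f:
                package = f.read().strip()
            with open(os.path.join(topo, "core_id")) as f:
                core = f.read().strip()
        except OSError:
            continue
        cores.add((package, core))
    return len(cores) or os.cpu_count() or 1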

Cheers,
--
Julien Plissonneau Duquène
Julien Plissonneau Duquène
2025-01-16 18:10:01 UTC
Reply
Permalink
Hi,
Putting the scripts into `devscripts` package would imply that
`devscripts` becomes part of the `bootstrap essential` set of packages.
I didn't think about that and it effectively rules out devscripts for
that purpose. Is there any existing "bootstrap essential" package that
wasn't yet approached by Helmut and that could be interested in hosting
that new tool?

Cheers,
--
Julien Plissonneau Duquène
Chris Hofstaedtler
2025-01-17 14:00:02 UTC
Reply
Permalink
Putting the scripts into `devscripts` package would imply that
`devscripts` becomes part of the `bootstrap essential` set of packages.
I didn't think about that and it effectively rules out devscripts for that
purpose. Is there any existing "bootstrap essential" package that wasn't yet
approached by Helmut and that could be interested in hosting that new tool?
Please just put it into a new package.

Chris
Helmut Grohne
2025-01-29 11:50:01 UTC
Reply
Permalink
Hi Niels,
Putting the scripts into `devscripts` package would imply that `devscripts`
becomes part of the `bootstrap essential` set of packages. I am not sure the
`devscripts` maintainers are interested in that, because it implies you
cannot arbitrarily add new Dependencies. As an example, if `devscripts`
depends on even a single `arch:any` perl package, then the next `perl`
transition could have `debhelper` become uninstallable, which is not going
to be fun for anyone around at that time.
I think we can skip this discussion.

You cannot make reasonable use of guess_concurrency.py without adding
some code (most commonly, that would be adding an option indicating how
much RAM you need per core). As a result, you very much know when you
are using it and you may explicitly depend on whatever package contains
it.

Keep in mind that the number of source packages that practically benefit
from this is around 100 (maybe half, maybe double, but far from 50%).

If we end up integrating with debhelper, I see this as an optional
integration point. Earlier, I proposed an MR to debhelper embedding the
functionality. If redoing it, we'd still add an option, but in the
absence of that option, no concurrency guessing happens. If the option
is passed, we may call out to an external binary and error out when
missing. As such, I do not see debhelper gaining a new dependency in any
way.

What I dislike about devscripts as a build dependency is that it is
quite big and comes with a number of dependencies not relevant to us.
However, think about the use case. It's only relevant to huge packages
in the first place. Those extra dependencies will not pose a noticeable
extra cost. We already have around 2000 source packages requiring
devscripts (mostly as it is a dependency of gem2deb). So while I was
originally favouring a new binary package, I lack arguments against
devscripts now.

I note that devscripts conveniently depends on python3:any already and
nproc from coreutils happens to be essential.

The question of which package we stuff it into feels more like a
bikeshed-colouring one than one relevant to debhelper, beyond naming the
right package in the error message.

Helmut
Antonio Terceiro
2025-01-29 15:30:02 UTC
Reply
Permalink
We already have around 2000 source packages requiring devscripts
(mostly as it is a dependency of gem2deb). So while I was originally
favouring a new binary package, I lack arguments against devscripts
now.
It turns out gem2deb doesn't really need to depend on devscripts, and
recommending is enough for what is expected. devscripts is needed for
creating source packages with gem2deb, but not to actually build those
packages. I am demoting it to Recommends: in the next upload of gem2deb,
and this should shave a few seconds off from package builds that depend
on gem2deb.

I have no opinion on where to put the nproc-on-steroids script, though.
Ángel
2025-01-17 00:20:02 UTC
Reply
Permalink
Post by Helmut Grohne
Hi Julien,
Post by Julien Plissonneau Duquène
Let's start with this then. I implemented a PoC prototype [1] as a shell
script that is currently fairly linux-specific and doesn't account for
cgroup limits (yet?). Feedback is welcome (everything is open for discussion
there, including the name) and if there is enough interest I may end up
packaging it or submitting it to an existing collection (I am thinking about
devscripts).
I'm sorry for not having come back earlier and thus caused duplicaton of
work. I had started a Python-based implementation last year and then
dropped the ball over other stuff. It also implements the --require-
mem
flag in the way you suggested. It parses DEB_BUILD_OPTIONS,
RPM_BUILD_NCPUS and CMAKE_BUILD_PARALLEL_LEVEL and also considers cgroup
memory limits. I hope this captures all of the feedback I got during
discussions and research.
I'm attaching my proof of concept. Would you join forces and turn either
of these PoCs into a proper Debian package that could be used during
package builds? Once accepted, we may send patches to individual Debian
packages making use of it and call OOM FTBFS a packaging bug
eventually.
Helmut
The script looks good and is easy to read. It wouldn't be hard to
translate it to another language if needed to drop the Python
dependency (but that would increase the line count).

I find this behavior a bit surprising:

$ python3 guess_concurrency.py --min 10 --max 2
10

If there is a minimum limit, it is returned, even if that violates the
max. It makes some sense to pick something, but I was actually expecting
an error for the above.
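
Something along these lines would catch it (a sketch, assuming the PoC's
--min/--max options keep their current meaning; the helper name is
hypothetical):

def apply_limits(guess: int, minimum: int | None, maximum: int | None) -> int:
    # Reject contradictory limits instead of silently preferring --min.
    if minimum is not None and maximum is not None and minimum > maximum:
        raise SystemExit("error: --min must not be larger than --max")
    if maximum is not None:
        guess = min(guess, maximum)
    if minimum is not None:
        guess = max(guess, minimum)
    return guess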

The order in which the CPU count sources are processed is a bit awkward
as well.

The order it uses is CMAKE_BUILD_PARALLEL_LEVEL, DEB_BUILD_OPTIONS,
RPM_BUILD_NCPUS, --detect <n>, nproc/os.cpu_count().

But the order in the code is 4, 5, 3, 2, 1.
Not straightforward.
Also, it performs actions such as running the external program nproc even
if the result is going to be discarded later. (nproc is in an essential
package, I know, but still.)

Also, why would the user want to manually choose between nproc and
os.cpu_count()?

I would unconditionally call nproc, with a fallback to os.cpu_count()
if that fails (I'm assuming nproc may be smarter than os.cpu_count();
otherwise one could use cpu_count() always).

I suggest doing:

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--count",
        action="store",
        default=None,
        metavar="NUMBER",
        help="supply a processor count",
    )

    (...)

    args = parser.parse_args()
    guess = None
    try:
        if args.count:
            guess = positive_integer(args.count)
    except ValueError:
        parser.error("invalid argument to --count")
    guess = guess or guess_from_environment("CMAKE_BUILD_PARALLEL_LEVEL")
    guess = guess or guess_deb_build_parallel()
    guess = guess or guess_from_environment("RPM_BUILD_NCPUS")
    if not guess:
        try:
            guess = guess_nproc()
        finally:
            guess = guess or guess_python()



Additionally, the --ignore argument of nproc(1) might be of use for
this script as well.


Best regards

Ángel
Helmut Grohne
2025-01-29 11:50:02 UTC
Reply
Permalink
Hi Ángel,
Post by Ángel
The script looks good, and easy to read. It wouldn't be hard to
translate it to another language if needed to drop the python
dependency (but that would increase the nloc)
thank you for the detailed review. I also picked up Julien's request for
detecting cores rather than threads and am attaching an updated version.
I suppose moving this to some git repository would be good.

https://salsa.debian.org/helmutg/guess_concurrency
To get things going, I created it in my personal namespace, but I'm
happy to move it to debian or elsewhere.
Post by Ángel
$ python3 guess_concurrency.py --min 10 --max 2
10
If there is a minimum limit, it is returned, even if that violates the
max. It makes some sense to pick something but I as actually expecting
an error to the above.
How could one disagree with that? Fixed.
Post by Ángel
The order of processing the cpus is a bit awkward as well.
The order it uses is CMAKE_BUILD_PARALLEL_LEVEL, DEB_BUILD_OPTIONS,
RPM_BUILD_NCPUS, --detect <n>, nproc/os.cpu_count()
But the order in the code is 4, 5, 3, 2, 1
Not straightforward.
Again, how would I disagree? I have changed the order of invocation to
match the order of preference and in that process also changed
--detect=N to take precedence over all environment variables while
still preferring environment variables over other detectors. Hope that
makes sense.
Post by Ángel
Also, it is doing actions such as running external program nproc even
it if's going to be discarded later. (nproc is in an essential package,
I know, but still)
This is fixed as a result of your earlier remark.
Post by Ángel
Also, why would the user want to manually specify between nptoc and
os.cpu_count()?
While nproc is universally available on Debian, I anticipate that the
tool could be useful on non-Linux platforms, where os.cpu_count() may
be more portable. I mainly added it due to the low effort and to
demonstrate the ability to use different detection methods. The values I
see users pass to --detect are actual numbers or "cores".
Post by Ángel
I would unconditionally call nproc, with a fallback to os.cpu_count()
if that fails (I'm assuming nproc may be smarter than os.cpu_count(),
otherwise one could use cpu_count() always)
Indeed, nproc is somewhat smarter, as it honours some environment
variables such as OMP_NUM_THREADS and OMP_THREAD_LIMIT. I'm not sure we
need that fallback.
Post by Ángel
parser = argparse.ArgumentParser()
parser.add_argument(
"--count",
action="store",
default=None,
metavar="NUMBER",
help="supply a processor count",
)
Is there a reason to rename it from --detect to --count? The advantage
I see in reusing --detect is that you cannot reasonably combine those
two options, and --detect=9 intuitively makes sufficient sense to me.
Post by Ángel
args = parser.parse_args()
guess = None
guess = positive_integer(args.count)
parser.error("invalid argument to --count")
guess = guess or guess_from_environment("CMAKE_BUILD_PARALLEL_LEVEL")
guess = guess or guess_deb_build_parallel()
guess = guess or guess_from_environment("RPM_BUILD_NCPUS")
guess = guess_nproc()
guess = guess or guess_python()
I see three main proposals in this code:
* Order of preference reflects order of code. (implemented)
* A given count overrides all other detection mechanisms. (implemented)
* Fallback from nproc to os.cpucount(). (not convinced)
Did I miss anything?

I'm more inclined to drop --detect=python entirely, as you say its use
is fairly limited.
Post by Ángel
Additionally, the --ignore argument of nproc(1) might be of use for
this script as well.
Can you give a practical use case for this?

Helmut