Discussion:
Dropping awk?
Russ Allbery
2025-04-17 18:30:01 UTC
I noticed that Fedora 42 was released and their docker images lack a
'awk' tool. Debian trixie images ship with 'mawk' pre-installed right
now. While I'm not convinced the removal game is necessarily a good
one, I can see that it does have some advantages. Is it possible to
drop 'mawk' from the set of default tools in trixie? If not, what are
the blockers? What is the method to find out what the blockers are?
awk is in the essential set in Debian, so this would be a very substantial
amount of work. See the Pre-Depends in base-files, which is there to make
some awk implementation essential while still allowing the user to switch
between implementations.
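
One way to see this mechanism on an installed system is to read the Pre-Depends field of base-files. The following is only a minimal sketch: the helper name `control_field` is hypothetical, it handles single-line fields only, and the usage line after it assumes a Debian system where dpkg-query is available.

```shell
#!/bin/sh
# control_field FIELD: print the value of each "FIELD: value" line found in
# control-style text on stdin (hypothetical helper; single-line fields only).
control_field() {
    field=$1
    while IFS= read -r line; do
        case $line in
            "$field: "*)
                value=${line#"$field: "}   # strip the "FIELD: " prefix
                printf '%s\n' "$value"
                ;;
        esac
    done
}
```

Possible usage: `dpkg-query -s base-files | control_field Pre-Depends`. If base-files carries the Pre-Depends described above, the awk dependency should show up in the output.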
--
Russ Allbery (***@debian.org) <https://www.eyrie.org/~eagle/>
Santiago Vila
2025-04-17 18:50:01 UTC
Post by Russ Allbery
I noticed that Fedora 42 was released and their docker images lack a
'awk' tool. Debian trixie images ship with 'mawk' pre-installed right
now. While I'm not convinced the removal game is necessarily a good
one, I can see that it does have some advantages. Is it possible to
drop 'mawk' from the set of default tools in trixie? If not, what are
the blockers? What is the method to find out what the blockers are?
awk is in the essential set in Debian, so this would be a very substantial
amount of work. See the Pre-Depends in base-files, which is there to make
some awk implementation essential while still allowing the user to switch
between implementations.
Indeed. As James Troup once said, we made perl to be part of the essential set,
and not doing the same with good old awk would be criminal.

Installed size of mawk is 263 MB which is really small for today's standards.

Thanks.
Colin Watson
2025-04-17 19:10:01 UTC
Post by Santiago Vila
Installed size of mawk is 263 MB which is really small for today's standards.
KB rather than MB, thankfully!
--
Colin Watson (he/him) [***@debian.org]
Santiago Vila
2025-04-17 19:50:01 UTC
Post by Colin Watson
Post by Santiago Vila
Installed size of mawk is 263 MB which is really small for today's standards.
KB rather than MB, thankfully!
Big oops! I wonder how small they want images to be to
consider 263 KB unbearable.

Thanks.
Samuel Henrique
2025-04-17 19:10:01 UTC
Post by Santiago Vila
Installed size of mawk is 263 MB which is really small for today's standards.
Isn't that bad for the Debian minimal images for containers?

I'm not too familiar with how we generate our container images but I can see
mawk there and Debian is used on most official container images of other
projects.

Cheers,
--
Samuel Henrique <samueloph>
Paul Tagliamonte
2025-04-17 20:00:02 UTC
Post by Russ Allbery
awk is in the essential set in Debian, so this would be a very substantial
amount of work.
{Docker image comaintainer hat on}

This is right. More specifically -- the Debian docker images are
(intentionally) -- "just" `debootstrap --variant=minbase`.

Changing what packages are "pre-installed" with the Docker image is not
a negotiation that we wanted to have in isolation as the people who keep
the image current. Our goal was to have an image that wasn't unique (or
surprising) to a Debian project member -- rather, IMVHO, the package(s)
should be added or removed from the minbase set via our usual
conventions.

This has come up from time to time (in the form of some people asking to
'please install X', or 'why did Y go away') -- but the result and push
to sync these two ecosystems (debootstrap and the image) is something I
believe to be correct, and don't have any real intention of changing as
of right now.

If we want to drop something from the Docker image -- that's great! I'd
love that. It's just something we'd have to work through the usual
process of changing priority, deps, or what have you. Which -- I will
note -- benefits the whole operating system on all platforms, not just
one container image (this is the way).

paultag
--
⢀⣎⠟⠻⢶⣊⠀ Paul Tagliamonte <paultag>
⣟⠁⢠⠒⠀⣿⡁ https://people.debian.org/~paultag | https://pault.ag/
⢿⡄⠘⠷⠚⠋ Debian, the universal operating system.
⠈⠳⣄⠀⠀ 4096R / FEF2 EB20 16E6 A856 B98C E820 2DCD 6B5D E858 ADF3
Tianon Gravi
2025-04-17 22:10:01 UTC
Post by Paul Tagliamonte
Post by Russ Allbery
awk is in the essential set in Debian, so this would be a very substantial
amount of work.
{Docker image comaintainer hat on}
This is right. More specifically -- the Debian docker images are
(intentionally) -- "just" `debootstrap --variant=minbase`.
Changing what packages are "pre-installed" with the Docker image is not a
negotiation that we wanted to have in isolation as the people who keep
the image current. Our goal was to have an image that wasn't unique (or
surprising) to a Debian project member -- rather, IMVHO, the package(s)
should be added or removed from the minbase set via our usual conventions.
This has come up from time to time (in the form of some people asking to
'please install X', or 'why did Y go away') -- but the result and push to
sync these two ecosystems (debootstrap and the image) is something I believe
to be correct, and don't have any real intention of changing as of right
now.
If we want to drop something from the Docker image -- that's great! I'd love
that. It's just something we'd have to work through the usual process of
changing priority, deps, or what have you. Which -- I will note -- benefits
the whole operating system on all platforms, not just one container image
(this is the way).
Co-sign all of that, with the addition of the following interesting
links:

- https://salsa.debian.org/debian/grow-your-ideas/-/issues/20
(where shrinking is discussed before / still)

- https://github.com/debuerreotype/docker-debian-artifacts/blob/949bf2c69b0888b62fe78dd45d02b74a7ddf64e2/trixie/rootfs.manifest
(the current set of packages in the "debian:trixie" image)

♥,
- Tianon
4096R / B42F 6819 007F 00F8 8E36 4FD4 036A 9C25 BF35 7DD4

(please keep me CC'd in any replies; I don't subscribe to -devel)
Sean Whitton
2025-04-18 00:50:01 UTC
Hello,
Our goal was to have an image that wasn't unique (or surprising) to a
Debian project member -- rather, IMVHO, the package(s) should be added
or removed from the minbase set via our usual conventions.
This makes sense.

In this vein, I wish that our minimal install was POSIX sh-compatible.
It currently isn't, because m4 isn't included (and maybe some other
things). In fact, even on a regular Debian desktop install, m4 isn't
there unless you explicitly install it.

m4 is the only way POSIX.1-2017 defines to safely create a temporary
file (outside of writing and compiling a C program). I think the newer
POSIX standard has not improved on this particular point.
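
For readers who have not seen the m4 approach: POSIX m4 specifies a mkstemp macro that creates a file from a template and expands to the new file's name. The sketch below assumes m4 may or may not be installed and falls back to the non-POSIX mktemp(1) when it is missing. The function name `make_temp` and the template prefix are hypothetical, and no attempt is made to work around implementation-specific m4 behaviour.

```shell
#!/bin/sh
# make_temp: print the path of a newly created temporary file.
# Prefers the POSIX m4 mkstemp macro; falls back to mktemp(1), which is
# widely available but not specified by POSIX.1-2017.
# Assumes the template contains no characters or words special to m4.
make_temp() {
    template="${TMPDIR:-/tmp}/example.XXXXXX"   # hypothetical prefix
    if command -v m4 >/dev/null 2>&1; then
        printf 'mkstemp(%s)\n' "$template" | m4
    else
        mktemp "$template"
    fi
}
```

Possible usage: `tmp=$(make_temp) || exit 1`, followed by a `trap 'rm -f "$tmp"' EXIT` so the file is cleaned up when the script ends.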

Removing awk from default installs would move us further away from that.

I don't know how common my view is, but I think making sure that all
POSIX sh-compatible scripts could be guaranteed to run without
compatibility hacks on any *BSD, macOS or Debian-based system would be
fantastically useful (at least FreeBSD and macOS already have the full
POSIX sh suite out-of-the-box; I would assume the other BSDs do too).

I think the implementation of this does not need to be "Essential: yes",
btw. Making it possible to be even smaller is fine too. I'm talking
about defaults -- and I think that includes default/official container
images.
--
Sean Whitton
Sean Whitton
2025-04-18 07:00:01 UTC
Hello,
So, personally, I think getting mktemp(1) added to POSIX would be
better for portability in the long run anyway.
Eventually. POSIX.1-2017 is going to be the thing to target for a long
time, I think.

GNU m4 doesn't follow POSIX strictly, unfortunately.

See these workarounds for both the potential lack of m4 and the lack of
GNU m4 behaving POSIXly:

https://sources.debian.org/src/consfigurator/1.2.3-1/src/connection.lisp/#L305
--
Sean Whitton
Michael Stone
2025-04-18 12:20:01 UTC
Post by Sean Whitton
So, personally, I think getting mktemp(1) added to POSIX would be
better for portability in the long run anyway.
Eventually. POSIX.1-2017 is going to be the thing to target for a long
time, I think.
I think POSIX is mostly a relic, and not worth worrying about except as
one of many inputs. Too many mistakes were made too early on, and it's
just too late to get everyone to agree on a common standard because real
world implementations diverged in too many ways. If someone wants to
make a program that works reliably across platforms sh isn't the right
tool in 2025. (And I say that as someone who quotes POSIX regularly: it
has value for things like choosing amongst a set of possible
implementations, but not for making assumptions about what will work in
the real world.)
Post by Sean Whitton
GNU m4 doesn't follow POSIX strictly, unfortunately.
Very few things do. POSIX itself has been trying harder to reflect
reality in areas where nobody wanted to follow the standard, but then
you're left with the problem that there's no straightforward way to
discover which version of the standard a particular tool is using.
Post by Sean Whitton
See these workarounds for both the potential lack of m4 and the lack of
I'm curious what modern platform doesn't have mktemp; is this more than
an academic question?
Sean Whitton
2025-04-19 12:10:01 UTC
Hello,
Post by Sean Whitton
So, personally, I think getting mktemp(1) added to POSIX would be
better for portability in the long run anyway.
Eventually. POSIX.1-2017 is going to be the thing to target for a long
time, I think.
I think POSIX is mostly a relic, and not worth worrying about except as one of
many inputs. Too many mistakes were made too early on, and it's just too late
to get everyone to agree on a common standard because real world
implementations diverged in too many ways. If someone wants to make a program
that works reliably across platforms sh isn't the right tool in 2025. (And I
say that as someone who quotes POSIX regularly: it has value for things like
choosing amongst a set of possible implementations, but not for making
assumptions about what will work in the real world.)
I have interpreted scripts that I want to run on any FreeBSD and Debian
machine, because they are part of my OS bootstrapping. What else is
there than POSIX sh for this? Therefore, it's still relevant.
I'm curious what modern platform doesn't have mktemp; is this more than an
academic question?
I don't know. There are other things that you want awk for if you are
doing pure POSIX sh scripting; mkstemp is just an example.
--
Sean Whitton
Simon Josefsson
2025-04-19 13:40:01 UTC
Post by Sean Whitton
Hello,
Post by Sean Whitton
So, personally, I think getting mktemp(1) added to POSIX would be
better for portability in the long run anyway.
Eventually. POSIX.1-2017 is going to be the thing to target for a long
time, I think.
I think POSIX is mostly a relic, and not worth worrying about except as one of
many inputs. Too many mistakes were made too early on, and it's just too late
to get everyone to agree on a common standard because real world
implementations diverged in too many ways. If someone wants to make a program
that works reliably across platforms sh isn't the right tool in 2025. (And I
say that as someone who quotes POSIX regularly: it has value for things like
choosing amongst a set of possible implementations, but not for making
assumptions about what will work in the real world.)
I have interpreted scripts that I want to run on any FreeBSD and Debian
machine, because they are part of my OS bootstrapping. What else is
there than POSIX sh for this? Therefore, it's still relevant.
I think some reasonable subset of POSIX sh is all you can/should assume
these days, everything else needs to be documented and installed as
dependencies. Even (what used to be) common tools like awk, cmp, diff,
join have disappeared from various distributions. Guix proved that a
/bin/sh-only approach is possible and usable.

I have mixed feelings about this minimization pattern -- it is often
combined with replacing copyleft software with non-copyleft
implementations (GPL -> LGPL/MIT) -- but I can't deny that I find
minimal containers really useful.

/Simon
Michael Stone
2025-04-19 13:50:01 UTC
Post by Sean Whitton
I have interpreted scripts that I want to run on any FreeBSD and Debian
machine, because they are part of my OS bootstrapping. What else is
there than POSIX sh for this? Therefore, it's still relevant.
With that requirement, what you really want to know is how to write a
script that works on FreeBSD and Debian--which POSIX can't tell you.
(Neither of those is POSIX certified or fully compliant.) POSIX might be
a starting point, but you'll have to read man pages and figure out the
discrepancies. If you're stuck doing that anyway, I seriously question
the value of artificially limiting yourself to what unix tools did 30 or
40 years ago--newer tools or options often let you accomplish tasks much
more efficiently. Maybe it would be worth avoiding those if POSIX really
did let you write once and run anywhere...but it doesn't.
Sean Whitton
2025-04-19 15:00:02 UTC
Hello,
Post by Sean Whitton
I have interpreted scripts that I want to run on any FreeBSD and Debian
machine, because they are part of my OS bootstrapping. What else is
there than POSIX sh for this? Therefore, it's still relevant.
With that requirement, what you really want to know is how to write a script
that works on FreeBSD and Debian--which POSIX can't tell you. (Neither of
those is POSIX certified or fully compliant.) POSIX might be a starting point,
but you'll have to read man pages and figure out the discrepancies. If you're
stuck doing that anyway, I seriously question the value of artificially
limiting yourself to what unix tools did 30 or 40 years ago--newer tools or
options often let you accomplish tasks much more efficiently. Maybe it would
be worth avoiding those if POSIX really did let you write once and run
anywhere...but it doesn't.
This just hasn't been my experience. You don't need perfect
compatibility (or certification). By restricting myself to the POSIX
specifications of sh, awk, find, grep and sed, I've profitably written
several non-trivial programs that work correctly on any FreeBSD install
and any Debian install that wasn't specifically engineered to be
minimal.
--
Sean Whitton
Michael Stone
2025-04-19 15:50:01 UTC
Post by Sean Whitton
This just hasn't been my experience. You don't need perfect
compatibility (or certification). By restricting myself to the POSIX
specifications of sh, awk, find, grep and sed, I've profitably written
several non-trivial programs that work correctly on any FreeBSD install
and any Debian install that wasn't specifically engineered to be
minimal.
You happened to pick two of the most compatible OSs--it's not hard to be
portable between linux & freebsd *by accident* as there's a long history
of cross-pollination between them. (E.g., coreutils routinely looks to
see what parameters freebsd used when implementing a new feature.)
Expand the problem set to include running on SunOS and AIX and OSX and
QNX and ... and the problem becomes much harder. But if you don't care
about all those oddballs, why limit yourself to POSIX--whose point was
to try to enable that degree of cross-platform interoperability? Stick
to the intersection between linux + freebsd and you instantly get access
to all kinds of wonderful modern things like mktemp without having to
wait for POSIX to tell you it's ok. Conversely, if you expect POSIX from
debian you're going to be disappointed now and then. E.g., POSIX gave up
on trying to unify all the incompatible versions of tar/cpio and created
a new standard archive utility named pax. Which works fine on many
non-certified but POSIX-curious OSs like FreeBSD, OSX, OpenIndiana, etc
etc, but you won't find it on a standard debian install. It's just one
of those things where regardless of what standard you are writing to,
you still need to check to see how reality matches the standard.
Sean Whitton
2025-04-23 01:50:01 UTC
Hello,
Post by Michael Stone
You happened to pick two of the most compatible OSs--it's not hard to
be portable between linux & freebsd *by accident* as there's a long
history of cross-pollination between them. (E.g., coreutils routinely
looks to see what parameters freebsd used when implementing a new
feature.)
Fair point, though, I also use these programs on some NetBSD systems,
and my experience was that until I started paying attention to POSIX I
was continually using various features that weren't present over there.
Post by Michael Stone
Expand the problem set to include running on SunOS and AIX and OSX and
QNX and ... and the problem becomes much harder. But if you don't care
about all those oddballs, why limit yourself to POSIX--whose point was
to try to enable that degree of cross-platform interoperability?
[...]
It's just one of those things where regardless of what standard you
are writing to, you still need to check to see how reality matches the
standard.
The nature of these particular programs is that I might want to be able
to run them on those platforms in the future.
If I've already stuck to POSIX when writing them, then porting them to
those more difficult platforms should be much easier.

I'm a big fan of not putting up barriers to making programs portable in
the future, even if you haven't gone to the effort of really making sure
they're portable yet.

Also, I have to admit, I found it a lot of fun trying to figure out how
to make these programs performant enough with only POSIX facilities.
--
Sean Whitton
Josh Triplett
2025-04-17 20:50:01 UTC
Debian trixie images ship with 'mawk' pre-installed right now. While
I'm not convinced the removal game is necessarily a good one, I can
see that it does have some advantages. Is it possible to drop 'mawk'
from the set of default tools in trixie? If not, what are the
blockers? What is the method to find out what the blockers are?
I would *love* to see the Essential set reduced. But I think this is
combining a couple of steps, and we'd do better to separate those steps.

One is "should we make dependencies on awk explicit, rather than having
them be implicit and undocumented because awk is Essential".
The other is "should we reduce dependencies on awk".

The latter may or may not happen in any individual case, but I think the
former would have a lot of value independently. And with the former
done, we'd have the opportunity to *consider* the latter on a
case-by-case basis, with rationales like "if packages A and B didn't use
awk, then we'd simplify bootstrapping", or "if packages B and C didn't
use awk, it'd be possible for XYZ useful class of minimal
systems/containers/VMs to not need it installed".

Given some amount of agreement that it had value, and that the downsides
were low, we could consider *starting* to list dependencies on awk (by
way of virtual packages to allow selecting implementation) explicitly,
rather than leaving them implicit via Essential. A quick look at
/var/lib/dpkg/info suggests that not *that* many maintainer scripts use
it even on a full desktop system, and a look at /usr/bin and /usr/sbin
suggests that while there are various things using it, they tend to come
in a few broad categories (e.g. developer-oriented scripts) and *mostly*
be higher in the stack (e.g. mostly not essential things themselves). (A
notable exception is tzselect, which makes extensive use of awk, but
while that's in the essential package libc-bin, it does not itself seem
like an essential tool and could potentially be in a higher-level
package itself.)
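
The "quick look" described above can be approximated with grep. This is a rough sketch, assuming maintainer scripts sit in a directory laid out like /var/lib/dpkg/info. The function name is hypothetical, only POSIX grep options are used, and a plain word match also catches awk mentioned only in comments, so the result is an upper bound rather than an exact list.

```shell
#!/bin/sh
# scripts_mentioning_awk DIR: list maintainer scripts under DIR that
# contain awk, gawk, mawk or nawk as a whole word.
scripts_mentioning_awk() {
    dir=$1
    for script in "$dir"/*.preinst "$dir"/*.postinst \
                  "$dir"/*.prerm "$dir"/*.postrm "$dir"/*.config; do
        [ -f "$script" ] || continue      # skip unmatched glob patterns
        # Word boundaries spelled out by hand, since grep -w is not in POSIX.
        if grep -qE '(^|[^[:alnum:]_])(g|m|n)?awk([^[:alnum:]_]|$)' "$script"; then
            printf '%s\n' "$script"
        fi
    done
}
```

Possible usage: `scripts_mentioning_awk /var/lib/dpkg/info | wc -l` gives a count to compare against the total number of maintainer scripts on the system.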

Based on that, it seems like it would not be *especially* hard to
*declare dependencies* on awk, which would not in any way be a commitment
towards *systematically eliminating* those dependencies. Having those
dependencies declared would then make it feasible to consider avoiding
it in *some* especially valuable cases, and conversely would allow folks
building container/VM/embedded images to know when they actually need
it.

In general, I think this is roughly the right approach for any proposed
work on the Essential set, with the first step being to declare
dependencies explicitly.

- Josh Triplett
Adrian Bunk
2025-04-20 12:20:01 UTC
Post by Josh Triplett
Debian trixie images ship with 'mawk' pre-installed right now. While
I'm not convinced the removal game is necessarily a good one, I can
see that it does have some advantages. Is it possible to drop 'mawk'
from the set of default tools in trixie? If not, what are the
blockers? What is the method to find out what the blockers are?
I would *love* to see the Essential set reduced. But I think this is
combining a couple of steps, and we'd do better to separate those steps.
One is "should we make dependencies on awk explicit, rather than having
them be implicit and undocumented because awk is Essential".
The other is "should we reduce dependencies on awk".
The latter may or may not happen in any individual case, but I think the
former would have a lot of value independently.
The former without the latter is just a lot of wasted work without any
benefits.
Post by Josh Triplett
And with the former
done, we'd have the opportunity to *consider* the latter on a
case-by-case basis, with rationales like "if packages A and B didn't use
awk, then we'd simplify bootstrapping", or "if packages B and C didn't
use awk, it'd be possible for XYZ useful class of minimal
systems/containers/VMs to not need it installed".
...
How do you ensure there are no missing dependencies throughout the
archive when this is a situation that cannot happen in practice?

In practice you can do the former only if you have tooling that provides
the latter, contrary to your claim that the former would help with the
latter.
Post by Josh Triplett
In general, I think this is roughly the right approach for any proposed
work on the Essential set, with the first step being to declare
dependencies explicitly.
It's just a waste of time, especially if the end goal is not defined
from the start.

If someone wants to remove awk from the essential set,
then replacing the far larger sed would also be desirable.

Larry Wall delayed the initial release of Perl until the a2p and s2p
converters for converting from AWK and sed to Perl were completed.
Instead of just annotating dependencies, usage of AWK and sed in the
relevant package set could immediately be replaced with usage of the
essential Perl.

Unless someone wants to get rid of Perl in the essential set,
which is 10 times the size of AWK and sed combined.

The sane starting point would be discussing which tools should be part
of the (transitive) essential set.

Otherwise you might end up with one group of people converting from
AWK to Perl, while another group of people converts from Perl to AWK.
Post by Josh Triplett
- Josh Triplett
cu
Adrian

BTW: Replacing mawk with original-awk in installs might be a low-hanging
fruit to save 100kB in forky; having original-awk as the only AWK
variant installed is already a supported configuration.
Santiago Vila
2025-04-20 13:10:01 UTC
Post by Adrian Bunk
The former without the latter is just a lot of wasted work without any
benefits.
I also agree that removing awk in the normal Debian distribution is a waste
of time.

If somebody wants to create a minimal container based on Debian, static
in nature, which does not need to be upgraded using apt, etc. there
are a lot of things that could be done before dropping awk, and those
things do not need to propagate to the normal distribution.
Post by Adrian Bunk
BTW: Replacing mawk with original-awk in installs might be a low-hanging
fruit to save 100kB in forky; having original-awk as the only AWK
variant installed is already a supported configuration.
Yes, I was going to suggest that to those for which 263KB in a container
is too much.

Thanks.
Josh Triplett
2025-04-20 17:30:01 UTC
Post by Adrian Bunk
Post by Josh Triplett
Debian trixie images ship with 'mawk' pre-installed right now. While
I'm not convinced the removal game is necessarily a good one, I can
see that it does have some advantages. Is it possible to drop 'mawk'
from the set of default tools in trixie? If not, what are the
blockers? What is the method to find out what the blockers are?
I would *love* to see the Essential set reduced. But I think this is
combining a couple of steps, and we'd do better to separate those steps.
One is "should we make dependencies on awk explicit, rather than having
them be implicit and undocumented because awk is Essential".
The other is "should we reduce dependencies on awk".
The latter may or may not happen in any individual case, but I think the
former would have a lot of value independently.
The former without the latter is just a lot of wasted work without any
benefits.
[...]
Post by Adrian Bunk
Post by Josh Triplett
In general, I think this is roughly the right approach for any proposed
work on the Essential set, with the first step being to declare
dependencies explicitly.
It's just a waste of time, especially if the end goal is not defined
from the start.
What I'm suggesting here is that if every individual package that needs
awk has a Depends on it (via a package that allows switching
implementations), rather than relying on Essential, then it becomes
possible to make incremental progress, and that incremental progress
benefits people who are willing to carefully remove some of the packages
Debian normally always has installed.

If you're already building the kind of container that will want to
remove dpkg and apt (among other things) when you're done building it,
it'd be nice to have dependency metadata that helps you figure out what
is and isn't still used. That's useful even if not everything eliminates
its dependencies yet.

By way of example: e2fsprogs uses awk (in e2scrub), but many container
builders will remove that package (or never install it in the first
place), so it's not particularly important to do anything about its
dependency on awk, *other than declaring it*. If other, harder-to-remove
packages manage to stop using awk, then awk becomes removable, in a less
error-prone way.
Post by Adrian Bunk
If someone wants to remove awk from the essential set,
then replacing the far larger sed would also be desirable.
[...]
Post by Adrian Bunk
Unless someone wants to get rid of Perl in the essential set,
which is 10 times the size of AWK and sed combined.
That would be ideal, yeah.
Post by Adrian Bunk
The sane starting point would be discussing which tools should be part
of the (transitive) essential set.
In an ideal world, rather than trying to pick which of sed/awk/perl/etc
are used in the core tools of Debian, one path would be to turn many
such core tools into compiled programs.

Another would be to identify tools that are only used *during
installation or upgrade* but are never needed by the running system
(e.g. `update-initramfs`), and make it easier to remove those. There'd
be no particular need to prioritize removing the usage of
awk/sed/perl/python/etc from such tools.
Andrey Rakhmatullin
2025-04-20 18:30:01 UTC
Post by Josh Triplett
Post by Adrian Bunk
Post by Josh Triplett
Debian trixie images ship with 'mawk' pre-installed right now. While
I'm not convinced the removal game is necessarily a good one, I can
see that it does have some advantages. Is it possible to drop 'mawk'
from the set of default tools in trixie? If not, what are the
blockers? What is the method to find out what the blockers are?
I would *love* to see the Essential set reduced. But I think this is
combining a couple of steps, and we'd do better to separate those steps.
One is "should we make dependencies on awk explicit, rather than having
them be implicit and undocumented because awk is Essential".
The other is "should we reduce dependencies on awk".
The latter may or may not happen in any individual case, but I think the
former would have a lot of value independently.
The former without the latter is just a lot of wasted work without any
benefits.
[...]
Post by Adrian Bunk
Post by Josh Triplett
In general, I think this is roughly the right approach for any proposed
work on the Essential set, with the first step being to declare
dependencies explicitly.
It's just a waste of time, especially if the end goal is not defined
from the start.
What I'm suggesting here is that if every individual package that needs
awk has a Depends on it (via a package that allows switching
implementations), rather than relying on Essential, then it becomes
possible to make incremental progress, and that incremental progress
benefits people who are willing to carefully remove some of the packages
Debian normally always has installed.
Should we start declaring deps on all essential packages explicitly?
--
WBR, wRAR
Josh Triplett
2025-04-20 19:30:01 UTC
Post by Andrey Rakhmatullin
Should we start declaring deps on all essential packages explicitly?
I personally think that would be a good idea, though I'm not currently
trying to make the case for that across the board here. Right now, I'm
trying to make the case that that's a good first step for any packages
people might want to work on making optional. I doubt that anyone is
likely to make coreutils non-essential anytime soon (though the ability
to replace it with smaller alternatives would be nice), but on the other
hand, tools like perl-base, awk, and sed would be a lot more
feasible, as well as some higher-level things like ncurses-bin and
ncurses-base (not typically needed for systems that don't support
logins).
1) Shrinking the Packages file. This is something that good compression
handles quite well, and it's not obvious that it provides much of a
win. And if we *really* care about shrinking the Packages file,
there's a lot of low-hanging fruit there: MD5sum, tags
(https://lists.debian.org/debian-devel/2023/11/msg00226.html), and
several others. Eliminating MD5sum alone would save more than 1MB of
*compressed* size from the currently ~8MB Packages.xz. And the names
of common packages are *much* more compressible than MD5sums. :)

2) Maintenance: missing dependencies are hard to track and test. But
these days, we have much more automatic testing infrastructure, much
more install/upgrade/removal testing infrastructure, and many other
things. And note, in particular, that there's nothing stopping us
from adding some of these packages to *Build-Essential* at the same
time we dropped them from Essential, for convenience.
Andreas Metzler
2025-04-21 05:20:01 UTC
Post by Josh Triplett
Post by Andrey Rakhmatullin
Should we start declaring deps on all essential packages explicitly?
I personally think that would be a good idea, though I'm not currently
trying to make the case for that across the board here. Right now, I'm
[...]
Post by Josh Triplett
1) Shrinking the Packages file. This is something that good compression
handles quite well, and it's not obvious that it provides much of a
win. And if we *really* care about shrinking the Packages file,
there's a lot of low-hanging fruit there: MD5sum, tags
(https://lists.debian.org/debian-devel/2023/11/msg00226.html), and
several others. Eliminating MD5sum alone would save more than 1MB of
*compressed* size from the currently ~8MB Packages.xz. And the names
of common packages are *much* more compressible than MD5sums. :)
2) Maintenance: missing dependencies are hard to track and test. But
these days, we have much more automatic testing infrastructure, much
more install/upgrade/removal testing infrastructure, and many other
things. And note, in particular, that there's nothing stopping us
from adding some of these packages to *Build-Essential* at the same
time we dropped them from Essential, for convenience.
This has already come up, but I think it is worth noting more
prominently. There is a third important use case:

3) Essential packages can be used in preinst and postrm
maintainer-scripts. (The former usage can be made explicit with
Pre-Depends, the latter would need to be dropped if a command lost
Essential status.)

cu Andreas
--
`What a good friend you are to him, Dr. Maturin. His other friends are
so grateful to you.'
`I sew his ears on from time to time, sure'
G. Branden Robinson
2025-04-20 21:50:01 UTC
Post by Andrey Rakhmatullin
Post by Josh Triplett
What I'm suggesting here is that if every individual package that
needs awk has a Depends on it (via a package that allows switching
implementations), rather than relying on Essential, then it becomes
possible to make incremental progress, and that incremental progress
benefits people who are willing to carefully remove some of the
packages Debian normally always has installed.
Should we start declaring deps on all essential packages explicitly?
I think that's a good idea. "Explicit is better than implicit," as the
Zen of Python puts it.[1]

Factual statements about one's run-time dependencies should be as
decoupled from the details of the set of "Essential" packages as
possible. One reason is that the identities of the people making these
decisions are disjoint. Often a package maintainer lacks this power;
except for dependencies they introduce through operation of their
maintainer scripts (or Debian add-ons), such run-time dependencies are
beyond their control.

By contrast, the population of the Essential set is up to...well, I'm
not sure who. Some vaguely defined intersection of the dpkg
maintainer(s), the release managers, and installer team, I guess.

In principle, all of the developers collectively (and interested
discussants) are responsible for such decisions. Unfortunately,
decisions in Debian are sometimes not made by those whom we claim.

"You must not tag any packages essential before this has been discussed
on the debian-devel mailing list and a consensus about doing that has
been reached." -- Debian Policy Manual, §3.8[2]

That implies to me that a package can be taken _out_ of the essential
set unilaterally by the package maintainer(s) of a package that's in it,
but because of the status quo of being able to depend on an essential
package without declaring that fact, in practice that probably wouldn't
work well, and we should update the Policy Manual to require discussion
of the dropping of such a "tag" as well.

...for which achievement of the goal you propose is a prerequisite.

One of the current Policy Manual editors might opine.

Regards,
Branden

[1] https://peps.python.org/pep-0020/
[2] https://www.debian.org/doc/debian-policy/ch-binary.html#essential-packages
Adrian Bunk
2025-04-20 23:00:02 UTC
Post by G. Branden Robinson
Post by Andrey Rakhmatullin
Post by Josh Triplett
What I'm suggesting here is that if every individual package that
needs awk has a Depends on it (via a package that allows switching
implementations), rather than relying on Essential, then it becomes
possible to make incremental progress, and that incremental progress
benefits people who are willing to carefully remove some of the
packages Debian normally always has installed.
Should we start declaring deps on all essential packages explicitly?
I think that's a good idea. "Explicit is better than implicit," as the
Zen of Python puts it.[1]
Factual statements about one's run-time dependencies should be as
decoupled from the details of the set of "Essential" packages as
possible. One reason is that the identities of the people making these
decisions are disjoint. Often a package maintainer lacks this power;
except for dependencies they introduce through operation of their
maintainer scripts (or Debian add-ons), such run-time dependencies are
beyond their control.
...
While this might sound good in theory, in practice it would be horrible.

As an example, libc6 has a preinst script that calls dpkg, sed,
grep and rm.

Making these dependencies explicit would be something like
Pre-Depends: sh, dpkg, sed, grep, coreutils

I would expect that such Pre-Depends cycles between essential packages
and libc6 will result in broken systems during upgrades.
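
Whether a proposed set of explicit Pre-Depends actually forms a cycle can be checked mechanically before anything is uploaded. The following is a minimal sketch using a depth-first search; the function name `find_cycle` and the tiny dependency map are illustrative assumptions, not data taken from real control files.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of package names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: that closes a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical map for illustration only: a library whose preinst needs
# sed, while sed in turn needs the library.
example = {"libc6": ["sed"], "sed": ["libc6"]}
print(find_cycle(example))
```

A real check would have to read the map from the archive's Packages data and would also need to consider versioned relations, which this sketch ignores.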


And then there's the normal time-waste like "the package ships
a bash-completion file that uses awk, grep, sed and sort - that's
dependencies on four essential packages".
Post by G. Branden Robinson
Regards,
Branden
...
cu
Adrian
G. Branden Robinson
2025-04-21 00:10:02 UTC
Hi Adrian,
Post by Adrian Bunk
Post by G. Branden Robinson
Factual statements about one's run-time dependencies should be as
decoupled from the details of the set of "Essential" packages as
possible. One reason is that the identities of the people making
these decisions are disjoint. Often a package maintainer lacks this
power; except for dependencies they introduce through operation of
their maintainer scripts (or Debian add-ons), such run-time
dependencies are beyond their control.
...
While this might sound good in theory, in practice it would be
horrible.
I'm not convinced...yet.
Post by Adrian Bunk
As an example, libc6 has a preinst script that calls dpkg, sed,
grep and rm.
A maintainer script that calls `dpkg`, I think, violates the Principle
of Least Astonishment.[1]
Post by Adrian Bunk
Making these dependencies explicit would be something like
Pre-Depends: sh, dpkg, sed, grep, coreutils
I would expect that such Pre-Depends cycles between essential packages
and libc6 will result in broken systems during upgrades.
My impression from watching countless upgrades over the years is that,
when any Essential package is available for upgrade, upgrades performed
by apt launch a full, discrete run of dpkg for each.[3] So the only way
any such cycles could exist is if the Pre-Depends were versioned and the
versionings themselves produced the cycle, in which case there likely
_really is_ a breakage problem: one package or the other needs to delay
its aggressive employment of a newly available feature.
Post by Adrian Bunk
And then there's the normal time-waste like "the package ships
a bash-completion file that uses awk, grep, sed and sort - that's
dependencies on four essential packages".
Hmm. In my opinion bash-completion scripts should produce
"Recommends"-strength dependencies at most; not everybody's going to be
using bash as their interactive shell, and completion matters only for
interactive shells.

That said, I haven't looked but would not be surprised if
bash-completion scripts make extremely aggressive assumptions about
what's available on the system. That is, they seldom if ever look
before they leap, and happily assume the presence of plenty of things
that _aren't_ in Essential, or even "Priority: required" packages.

But that's just a hunch. I could be wrong.

Regards,
Branden

[1] ...but it happens more often than it could because a lot of
maintainer scripts need `dpkg --compare-versions`--functionality
that ideally would be put someplace that sees less vigorous
development, like the debianutils package. Debian's version
comparison algorithm is pretty stable and has served us well for 30
years, with only one change in that interval I'm aware of, that
being the logic to support `~` for "pre-release" semantics. This
algorithm has overcome significant external indifference and/or
hostility to Debian and been embraced elsewhere.[2]

In general, the package manager does not need to be, and should not
be, re-entrant.
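
For readers unfamiliar with the comparison rules referred to in this footnote, here is a rough sketch of how one component of a version string (the upstream version or the Debian revision) is ordered, including the `~` rule. It follows my reading of the version-ordering rules in the Debian Policy Manual and is not the dpkg source; the names `_char_order` and `compare_part` are made up for this example, epochs and the split at the last hyphen are not handled, and ASCII input is assumed.

```python
DIGITS = set("0123456789")


def _char_order(ch: str) -> int:
    """Sort key for one character of a non-digit run ('' means end of string)."""
    if ch == "~":
        return -1          # tilde sorts before everything, even the end
    if ch == "" or ch in DIGITS:
        return 0           # end of the non-digit run
    if ch.isascii() and ch.isalpha():
        return ord(ch)     # letters sort before other characters
    return ord(ch) + 256   # everything else (., +, -, ...) sorts after letters


def compare_part(a: str, b: str) -> int:
    """Return -1, 0 or 1 as a sorts before, equal to, or after b."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Step 1: compare the leading non-digit runs character by character.
        while (i < len(a) and a[i] not in DIGITS) or (j < len(b) and b[j] not in DIGITS):
            ca = _char_order(a[i] if i < len(a) else "")
            cb = _char_order(b[j] if j < len(b) else "")
            if ca != cb:
                return -1 if ca < cb else 1
            i += 1
            j += 1
        # Step 2: compare the following digit runs numerically (empty counts as 0).
        start_i = i
        while i < len(a) and a[i] in DIGITS:
            i += 1
        start_j = j
        while j < len(b) and b[j] in DIGITS:
            j += 1
        na = int(a[start_i:i] or "0")
        nb = int(b[start_j:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0


print(compare_part("1.0~rc1", "1.0"))  # the pre-release sorts earlier
```

In a maintainer script the same question is normally answered with `dpkg --compare-versions`, which is exactly the dependency on dpkg that the footnote is discussing.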

[2] https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/filevercmp.c#n81

[3] ...by which I mean all stages of §§6.6 and 6.7 of the Debian Policy
Manual.

https://www.debian.org/doc/debian-policy/ch-maintainerscripts.html#details-of-unpack-phase-of-installation-or-upgrade
https://www.debian.org/doc/debian-policy/ch-maintainerscripts.html#details-of-configuration
Adrian Bunk
2025-04-21 02:30:01 UTC
Post by G. Branden Robinson
Hi Adrian,
Hi Branden,
Post by G. Branden Robinson
...
My impression from watching countless upgrades over the years is that,
when any Essential package is available for upgrade, upgrades performed
by apt launch a full, discrete run of dpkg for each.[3] So the only way
any such cycles could exist is if the Pre-Depends were versioned and the
versionings themselves produced the cycle, in which case there likely
_really is_ a breakage problem: one package or the other needs to delay
its aggressive employment of a newly available feature.
awk is a virtual essential package; it is theoretically possible
that dependency resolution during an upgrade deinstalls the previously
installed awk implementation and installs a different one instead.
Post by G. Branden Robinson
Post by Adrian Bunk
And then there's the normal time-waste like "the package ships
a bash-completion file that uses awk, grep, sed and sort - that's
dependencies on four essential packages".
Hmm. In my opinion bash-completion scripts should produce
"Recommends"-strength dependencies at most; not everybody's going to be
using bash as their interactive shell, and completion matters only for
interactive shells.
You want to invent a huge amount of error-prone manual work for
contributors that also adds completely new problems like this?

You want all contributors to know that rm and sort are in coreutils, but
grep and sed have their own packages, and then add dependencies based on that?

A program using system() needs /bin/sh, which implies it would need a
dependency on an essential (virtual?) shell package.

The amount of dependencies in the archive is already a resource issue
for dependency resolution.

All this is just an incredibly stupid idea.

The essential set exists so that users and developers always have all
the basic tools of a Linux system available.

Debian is a binary distribution with nearly 40k source packages,
and there are use cases where the only sane answer is to use a
different distribution.

If anyone would actually care about the size of awk on small images,
the 8 year old #861343 would be a low-hanging fruit to get a third of
that removed in small images without requiring any archive-wide changes.
Post by G. Branden Robinson
That said, I haven't looked but would not be surprised if
bash-completion scripts make extremely aggressive assumptions about
what's available on the system. That is, they seldom if ever look
before they leap, and happily assume the presence of plenty of things
that _aren't_ in Essential, or even "Priority: required" packages.
But that's just a hunch. I could be wrong.
Missing dependencies (which are rare) are reported fast by users since
they break the bash completion.
Post by G. Branden Robinson
Regards,
Branden
...
cu
Adrian
Marc Haber
2025-04-21 07:30:01 UTC
Post by Adrian Bunk
Post by G. Branden Robinson
Factual statements about one's run-time dependencies should be as
decoupled from the details of the set of "Essential" packages as
possible. One reason is that the identities of the people making these
decisions are disjoint. Often a package maintainer lacks this power;
except for dependencies they introduce through operation of their
maintainer scripts (or Debian add-ons), such run-time dependencies are
beyond their control.
...
While this might sound good in theory, in practice it would be horrible.
As an example, libc6 has a preinst script that calls dpkg, sed,
grep and rm.
Making these dependencies explicit would be something like
Pre-Depends: sh, dpkg, sed, grep, coreutils
I would expect that such Pre-Depends cycles between essential packages
and libc6 will result in broken systems during upgrades.
I have always been less than a fan of not being allowed to declare
dependency on bash or coreutils, but a (non-logical) tradeoff would be
to allow declaring such dependencies, and to liberally remove (or
demote) them if they cause a cycle. I don't expect that to happen
too often.
Post by Adrian Bunk
And then there's the normal time-waste like "the package ships
a bash-completion file that uses awk, grep, sed and sort - that's
dependencies on four essential packages".
I'd rather do that than having to spend hours updating year numbers in
machine readable debian/copyright.

Greetings
Marc
--
----------------------------------------------------------------------------
Marc Haber | " Questions are the | Mailadresse im Header
Rhein-Neckar, DE | Beginning of Wisdom " |
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 6224 1600402
Jonathan Dowland
2025-04-22 08:10:01 UTC
Post by Adrian Bunk
While this might sound good in theory, in practice it would be
horrible.
As an example, libc6
It's a good example of something that could be a problem and would need
careful attention, but an essential library, and in particular libc6,
doesn't likely reflect the experience that we'd have for the vast
majority of packages.
--
Please do not CC me for listmail.

👱🏻 Jonathan Dowland
✎ ***@debian.org
🔗 https://jmtd.net
Josh Triplett
2025-04-21 00:50:02 UTC
Post by G. Branden Robinson
Post by Andrey Rakhmatullin
Post by Josh Triplett
What I'm suggesting here is that if every individual package that
needs awk has a Depends on it (via a package that allows switching
implementations), rather than relying on Essential, then it becomes
possible to make incremental progress, and that incremental progress
benefits people who are willing to carefully remove some of the
packages Debian normally always has installed.
Should we start declaring deps on all essential packages explicitly?
I think that's a good idea. "Explicit is better than implicit," as the
Zen of Python puts it.[1]
Agreed, but...
Post by G. Branden Robinson
Factual statements about one's run-time dependencies should be as
decoupled from the details of the set of "Essential" packages as
possible.
[...]
Post by G. Branden Robinson
By contrast, the population of the Essential set is up to...well, I'm
not sure who. Some vaguely defined intersection of the dpkg
maintainer(s), the release managers, and installer team, I guess.
In principle, all of the developers collectively (and interested
discussants) are responsible for such decisions. Unfortunately,
decisions in Debian are sometimes not made by those whom we claim.
"You must not tag any packages essential before this has been discussed
on the debian-devel mailing list and a consensus about doing that has
been reached." -- Debian Policy Manual, §3.8[2]
That implies to me that a package can be taken _out_ of the essential
set unilaterally by the package maintainer(s) of a package that's in it,
but because of the status quo of being able to depend on an essential
package without declaring that fact, in practice that probably wouldn't
work well, and we should update the Policy Manual to require discussion
of the dropping of such a "tag" as well.
I think that's a bug in Policy as written, rather than a bug in
practice. Historical practice has definitely been to discuss such
removals (extensively).

We should have a well-defined process for this, that includes discussion,
transition plans (involving the introduction of Depends as needed
first), and similar.
G. Branden Robinson
2025-04-21 01:10:01 UTC
Hi Josh (it's been a long time since XFree86 packaging days!),
Post by Josh Triplett
Post by G. Branden Robinson
"You must not tag any packages essential before this has been
discussed on the debian-devel mailing list and a consensus about
doing that has been reached." -- Debian Policy Manual, §3.8[2]
That implies to me that a package can be taken _out_ of the
essential set unilaterally by the package maintainer(s) of a package
that's in it, but because of the status quo of being able to depend
on an essential package without declaring that fact, in practice
that probably wouldn't work well, and we should update the Policy
Manual to require discussion of the dropping of such a "tag" as
well.
I think that's a bug in Policy as written, rather than a bug in
practice. Historical practice has definitely been to discuss such
removals (extensively).
We should have a well-defined process for this, that includes
discussion, transition plans (involving the introduction of Depends as
needed first), and similar.
I agree that the Policy Manual should be fixed to document the
deliberative process we actually use in this case as well as the one it
already contemplates.

But I want to reëmphasize that explicit Dependency declarations would
make it easier to see and reason about such things.

Regards,
Branden
Adrian Bunk
2025-04-20 20:20:01 UTC
Post by Josh Triplett
Post by Adrian Bunk
Post by Josh Triplett
Debian trixie images ship with 'mawk' pre-installed right now. While
I'm not convinced the removal game is necessarily a good one, I can
see that it does have some advantages. Is it possible to drop 'mawk'
from the set of default tools in trixie? If not, what are the
blockers? What is the method to find out what the blockers are?
I would *love* to see the Essential set reduced. But I think this is
combining a couple of steps, and we'd do better to separate those steps.
One is "should we make dependencies on awk explicit, rather than having
them be implicit and undocumented because awk is Essential".
The other is "should we reduce dependencies on awk".
The latter may or may not happen in any individual case, but I think the
former would have a lot of value independently.
The former without the latter is just a lot of wasted work without any
benefits.
[...]
Post by Adrian Bunk
Post by Josh Triplett
In general, I think this is roughly the right approach for any proposed
work on the Essential set, with the first step being to declare
dependencies explicitly.
It's just a waste of time, especially if the end goal is not defined
from the start.
What I'm suggesting here is that if every individual package that needs
awk has a Depends on it (via a package that allows switching
implementations), rather than relying on Essential, then it becomes
possible to make incremental progress, and that incremental progress
benefits people who are willing to carefully remove some of the packages
Debian normally always has installed.
If you're already building the kind of container that will want to
remove dpkg and apt (among other things) when you're done building it,
it'd be nice to have dependency metadata that helps you figure out what
is and isn't still used. That's useful even if not everything eliminates
its dependencies yet.
If you have no need to install security updates and want a really small
system, then Debian is simply not the right choice.

With embedded distributions a whole system of bootloader, kernel and
userspace easily fits on 16 MB flash, even when including bloated stuff
like glibc and systemd, with plenty of space left for the application
that should run on the device.
You can't do that with Debian.

Being able to remove libsystemd0 would save several times what
you would save by removing AWK, your time would be better spent on
requesting the creation of packages like util-linux-nosystemd instead
of making dependencies on AWK explicit.

Trying to save < 1% space by removing AWK in situations where
Debian is anyway the wrong choice does not make much sense.
Post by Josh Triplett
By way of example: e2fsprogs uses awk (in e2scrub), but many container
builders will remove that package (or never install it in the first
place), so it's not particularly important to do anything about its
dependency on awk, *other than declaring it*. If other, harder-to-remove
packages manage to stop using awk, then awk becomes removable, in a less
error-prone way.
...
"harder-to-remove packages manage to stop using awk" is such awfully
passive language.

Let's rather talk about what Debian should officially support,
and how Josh Triplett plans to implement it.

Trying to officially support removing essential packages sounds to me
like a maintenance nightmare with little benefit; you have to do some
explaining how you will keep this maintainable when you do it.

cu
Adrian
Marco d'Itri
2025-04-20 20:40:02 UTC
Post by Adrian Bunk
With embedded distributions a whole system of bootloader, kernel and
userspace easily fits on 16 MB flash, even when including bloated stuff
like glibc and systemd, with plenty of space left for the application
that should run on the device.
You can't do that with Debian.
No, because the goal is to be able to use the whole Debian packages
ecosystem.
--
ciao,
Marco
Josh Triplett
2025-04-21 01:10:01 UTC
Post by Adrian Bunk
Post by Josh Triplett
Post by Adrian Bunk
Post by Josh Triplett
Debian trixie images ship with 'mawk' pre-installed right now. While
I'm not convinced the removal game is necessarily a good one, I can
see that it does have some advantages. Is it possible to drop 'mawk'
from the set of default tools in trixie? If not, what are the
blockers? What is the method to find out what the blockers are?
I would *love* to see the Essential set reduced. But I think this is
combining a couple of steps, and we'd do better to separate those steps.
One is "should we make dependencies on awk explicit, rather than having
them be implicit and undocumented because awk is Essential".
The other is "should we reduce dependencies on awk".
The latter may or may not happen in any individual case, but I think the
former would have a lot of value independently.
The former without the latter is just a lot of wasted work without any
benefits.
[...]
Post by Adrian Bunk
Post by Josh Triplett
In general, I think this is roughly the right approach for any proposed
work on the Essential set, with the first step being to declare
dependencies explicitly.
It's just a waste of time, especially if the end goal is not defined
from the start.
What I'm suggesting here is that if every individual package that needs
awk has a Depends on it (via a package that allows switching
implementations), rather than relying on Essential, then it becomes
possible to make incremental progress, and that incremental progress
benefits people who are willing to carefully remove some of the packages
Debian normally always has installed.
If you're already building the kind of container that will want to
remove dpkg and apt (among other things) when you're done building it,
it'd be nice to have dependency metadata that helps you figure out what
is and isn't still used. That's useful even if not everything eliminates
its dependencies yet.
If you have no need to install security updates
You said this elsewhere in the thread, but it's not correct there or
here: you absolutely *do* install security updates for a container, by
installing a new container with new versions of packages.
Post by Adrian Bunk
With embedded distributions a whole system of bootloader, kernel and
userspace
Containers typically don't have either a bootloader or a kernel, and
often don't have an init either.
Post by Adrian Bunk
Post by Josh Triplett
By way of example: e2fsprogs uses awk (in e2scrub), but many container
builders will remove that package (or never install it in the first
place), so it's not particularly important to do anything about its
dependency on awk, *other than declaring it*. If other, harder-to-remove
packages manage to stop using awk, then awk becomes removable, in a less
error-prone way.
...
"harder-to-remove packages manage to stop using awk" is such awfully
passive language.
Let's rather talk about what Debian should officially support,
and how Josh Triplett plans to implement it.
I would be more than happy to work on it, in collaboration with others
proposing such changes. I expect that such work consists of 10% doing
careful archive-wide scans to detect usage of packages, 10% writing
tooling, 5% writing relatively small patches, and 75% discussion threads
having to defend the value of such work from people who have no interest
in working on it themselves but spend a lot of energy telling other
people it's a waste of time.

I would also be thrilled to write patches to Policy, establishing a very
clear process for *carefully* reducing the Essential set in step-by-step
ways. For instance, I'd try to revive a past Policy patch I wrote adding
an exception to the policy against depending on Essential packages.

And personally, I'd likely start by putting together a `dh-shelldeps`,
which parses shell the way that things like shellcheck do, does a
rough approximation of "what is every program invoked by every shell
script in the package", and looks that up in `shelldeps` metadata
analogous to `shlibdeps`.
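
To make the idea concrete, here is a very rough sketch of the "what is every program invoked" step. It is not an existing tool: `invoked_commands`, `depends_line`, and the lookup table are hypothetical names standing in for the proposed tooling and metadata, and Python's `shlex` module is only a crude stand-in for a real shell parser (no heredocs, no command substitution, no multi-line quoting).

```python
import shlex

# Hypothetical lookup table standing in for the proposed `shelldeps`
# metadata; only commands named in this thread are listed.
COMMAND_TO_PACKAGE = {
    "rm": "coreutils",
    "sort": "coreutils",
    "grep": "grep",
    "sed": "sed",
    "awk": "awk",  # virtual package name, used here for illustration
}

SHELL_KEYWORDS = {"if", "then", "else", "elif", "fi", "do", "done",
                  "while", "until", "!"}
SEPARATOR_CHARS = set(";|&()")


def invoked_commands(script):
    """Return the set of words found in command position, line by line."""
    commands = set()
    for line in script.splitlines():
        lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
        lexer.whitespace_split = True
        try:
            tokens = list(lexer)
        except ValueError:
            continue  # e.g. a quote that continues on the next line
        at_command = True
        for token in tokens:
            # Limitation: a quoted ";" looks like a real separator here.
            if token and all(ch in SEPARATOR_CHARS for ch in token):
                at_command = True
            elif at_command:
                if token in SHELL_KEYWORDS or "=" in token:
                    continue  # keyword or VAR=value prefix: keep looking
                commands.add(token)
                at_command = False
    return commands


def depends_line(script):
    """Map the commands found to packages and format them for a Depends field."""
    packages = {COMMAND_TO_PACKAGE[c] for c in invoked_commands(script)
                if c in COMMAND_TO_PACKAGE}
    return ", ".join(sorted(packages))


sample = "grep -q root /etc/passwd && rm -f /tmp/x\nsort data | awk '{print $1}'  # tally\n"
print(depends_line(sample))
```

Commands missing from the table are silently ignored in this sketch; a real helper would presumably want to report them instead.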
Post by Adrian Bunk
Trying to officially support removing essential packages sounds to me
like a maintenance nightmare with little benefit; you have to do some
explaining how you will keep this maintainable when you do it.
I expect it'll be maintained in much the same way most features in
Debian are maintained: the people who use it will submit patches, report
bugs when it doesn't work, and if they spend too much time reporting
those bugs or find it breaking more often than they'd like, they'll
implement more tooling (e.g. lintian checks, archive scans).

Right now, by way of example, if your package needs tzdata, and you fail
to depend on it, and because you have that package installed you don't
happen to notice, and you don't have autopkgtests that exercise that
part of the code, then there's very little to catch that. I don't think
this will be any worse than that, in practice, and we already deal with
that for e.g. `Priority: important` packages without substantial issues.
Chris Hofstaedtler
2025-04-21 12:10:01 UTC
Post by Josh Triplett
Post by Adrian Bunk
Let's rather talk about what Debian should officially support,
and how Josh Triplett plans to implement it.
I would be more than happy to work on it, in collaboration with others
proposing such changes. I expect that such work consists of 10% doing
careful archive-wide scans to detect usage of packages, 10% writing
tooling, 5% writing relatively small patches,
and 75% discussion threads
having to defend the value of such work from people who have no interest
in working on it themselves but spend a lot of energy telling other
people it's a waste of time.
This part seems a problem indeed.

Nevertheless, the people that are interested do exist. I also
observe that various current efforts in Debian work towards making
this easier, even if they start with a different proposition /
cause.

I would suggest you hit up some of the current maintainers of
Essential: yes packages, and leave the naysayers on d-devel to
themselves.

Chris
Santiago Vila
2025-04-21 13:00:01 UTC
Post by Chris Hofstaedtler
I would suggest you hit up some of the current maintainers of
Essential: yes packages, and leave the naysayers on d-devel to
themselves.
Note that there might be some overlap in those two sets of people.

The set of currently essential packages, and the fact that awk is among them
in particular, reflects a consensus which may be seen as nearly "foundational".

Consensus is not sacred and everything is open to discussion, but
extraordinary breakages of consensus (like this one) should require
extraordinary benefits,
and in this case we are talking about 263KB in a container image several
orders of magnitude bigger.

Also, while the idea of Josh might sound good in theory (adding dependencies
will not harm anybody, we just want to see the dependencies explicit),
it might create some undeserved pressure on maintainers to stop using awk.

In some cases I'm sure that it would be easy to rewrite the code, but in
some others the alternate construction may be a lot less readable, and
overall worse.

Note also that the base system and the container images are expected
to grow over time, because everything grows over time, but machines
hosting those container images also grow over time, so one would
naturally wonder why awk has become a problem now when it was never
a problem due to its extremely small size.

My modest proposal here after trixie, if there is a consensus that
it's a good step, would be to replace mawk by original-awk in the
base system and see what we can learn from that. I would see that
little change as something similar to what we did with /bin/sh
being replaced by dash to ensure compatibility and standards
compliance (back then, we discovered some bashisms, and we either
rewrote them to be sh-compliant or used #!/bin/bash instead, and
everybody was happy with those little incremental changes). I don't
think we have many mawk-isms in the distribution, but this would be
an opportunity to check if all AWKs are really interchangeable.

Thanks.
Chris Hofstaedtler
2025-04-21 13:00:01 UTC
Post by Santiago Vila
I would suggest you hit up some of the current maintainers of
Essential: yes packages, and leave the naysayers on d-devel to
themselves.
Note that there might be some overlap in those two sets of people.
The set of currently essential packages, and the fact that awk is among them
in particular, reflects a consensus which may be seen as nearly "foundational".
Consensus is not sacred and everything is open to discussion, but
extraordinary breakages of consensus (like this one) should require
extraordinary benefits,
and in this case we are talking about 263KB in a container image several
orders of magnitude bigger.
I understood Josh's mail to include more than just awk, and that awk
should probably not be the first package on the list to tackle.

It will be a long journey anyway.
Post by Santiago Vila
[..]
Chris
G. Branden Robinson
2025-04-21 13:40:02 UTC
Post by Santiago Vila
Also, while the idea of Josh might sound good in theory (adding
dependencies will not harm anybody, we just want to see the
dependencies explicit),
While I support that proposal and initiative...
Post by Santiago Vila
it might create some undeserved pressure on maintainers to stop using awk.
I agree with that, too. Our industry struggles to resist recurring
trends to rewrite everything in the language du jour. This decade it's
Go and/or Rust; both languages have things to recommend them (and both
their communities have demonstrated worrisome governance problems).
Post by Santiago Vila
In some cases I'm sure that it would be easy to rewrite the code, but
in some others the alternate construction may be a lot less readable,
and overall worse.
Yes, and we also have to ask what we have to gain by doing so, apart
from fashionability and bragging rights on a CV. Nothing stops anyone
from reimplementing anything in any language and slapping it up on
GitHub or a GitLab site to prove their skills--but my impression is that
a lot of people only feel such an undertaking is worth the effort if
they can cram it down a lot of other people's throats. Doing so shows
that one is "impactful", and therefore appealing to hiring managers at
startups and other places that natter on about being "disruptive" and
about Schumpeter's "creative destruction" as the engine of capitalism.

AWK is a nice language--small, pleasant, and consistent for problems
where C is too much trouble but a C-ish syntax is comfortably familiar
to your target audience, the shell is too quirky, and where you don't
need a bulky standard library.
Post by Santiago Vila
Note also that the base system and the container images are expected
to grow over time, because everything grows over time, but machines
hosting those container images also grow over time, so one would
naturally wonder why awk has become a problem now when it was never
a problem due to its extremely small size.
Yes. I have little interest in the drive to shrink container images for
its/their own sake.
Post by Santiago Vila
My modest proposal here after trixie, if there is a consensus that
it's a good step, would be to replace mawk by original-awk in the
base system and see what we can learn from that.
I just learned that you're the maintainer of original-awk (a.k.a. BWK
AWK)...

We can observe right now that the space savings is meager. Using older
data on amd64, I see:

Package: original-awk
Version: 2018-08-27-1
Maintainer: Santiago Vila <***@debian.org>
Installed-Size: 180 kB

Package: mawk
Version: 1.3.4.20200120-2
Maintainer: Boyuan Yang <***@debian.org>
Installed-Size: 248 kB

...for a savings of 68kB from your proposal. Hmm, how much does
perl-base grow from one Debian release to the next?

Package: perl-base
Version: 5.36.0-7+deb12u2
Installed-Size: 7639 kB

Package: perl-base
Version: 5.40.1-3
Installed-Size: 7811 kB

Difference: 172 kB

So whatever we'd have gained by hypothetically trading mawk for
original-awk in trixie, or even eliminating AWK entirely from the
Essential set, we'd have traded away simply by having Perl around.
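
The arithmetic behind that comparison can be checked with a short Python
sketch. The figures are the Installed-Size values quoted above; the
variable names are only illustrative.

```python
# Installed-Size values in kB, as quoted in this message.
original_awk_kb = 180
mawk_kb = 248
perl_base_old_kb = 7639   # perl-base 5.36.0-7+deb12u2
perl_base_new_kb = 7811   # perl-base 5.40.1-3

# Saving from swapping mawk for original-awk.
awk_swap_saving_kb = mawk_kb - original_awk_kb

# Growth of perl-base between the two quoted versions.
perl_growth_kb = perl_base_new_kb - perl_base_old_kb

print(f"awk swap saves {awk_swap_saving_kb} kB")
print(f"perl-base grew {perl_growth_kb} kB")
print(f"net change: {perl_growth_kb - awk_swap_saving_kb:+d} kB")
```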
Post by Santiago Vila
I would see that little change as something similar to what we did
with /bin/sh being replaced by dash to ensure compatibility and
standards compliance
This argument requires a footnote. Dash has its own problems with POSIX
conformance[1] and we insist on a couple of extensions to POSIX behavior
for our own sanity (the one I can remember is the `local` keyword).
Post by Santiago Vila
(back then, we discovered some bashisms, and we either rewrote them to
be sh-compliant or used #!/bin/bash instead, and everybody was happy
with those little incremental changes).
It was a good thing to do, but the standards-compliance benefit was, I
think, more a matter of inchoate bragging rights (see above) than
concrete benefit. The benefit, I think, came in saying what we meant:
either expressing dependencies explicitly, or eliminating unnecessary
ones. Also, it was really important for people using "upstart" as their
init system because, as I recall, the time differential when dynamically
loading Bash versus dash was thought to be an easy win for performance.
Bragging rights and impactfulness again.
Post by Santiago Vila
I don't think we have many mawk-isms in the distribution, but this
would be an opportunity to check if all AWKs are really
interchangeable.
...and make you the maintainer of (even more?) Essential packages. ;-)

original-awk's man page admits to one area of POSIX-nonconformance:

BUGS
...
POSIX‐standard interval expressions in regular expressions are not
supported.

...which I think weakens the case for your proposal helping us to have
AWK scripts that don't exercise extensions to POSIX. (But maybe the
newer original-awk that supports CSV data--a non-POSIX extension--fixes
that.)
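
For readers unfamiliar with the term, an interval expression is the
`{n,m}` repetition syntax in a regular expression. A rough Python sketch
for flagging AWK regex literals that use it might look like the
following. The helper name `find_interval_expressions` and the matching
heuristic are assumptions made for illustration, not part of any Debian
tooling, and the heuristic cannot see regexes that a script builds from
strings at run time.

```python
import re

# Heuristic: an AWK regex literal /.../ that contains {n}, {n,} or {n,m}.
# It ignores regexes held in strings and can misfire on division
# expressions, so treat every hit as a candidate for manual review.
REGEX_LITERAL = re.compile(r"/((?:\\.|[^/\\\n])+)/")
INTERVAL = re.compile(r"\{[0-9]+(?:,[0-9]*)?\}")

def find_interval_expressions(awk_source: str) -> list[str]:
    """Return regex literals in awk_source that contain an interval expression."""
    hits = []
    for match in REGEX_LITERAL.finditer(awk_source):
        body = match.group(1)
        if INTERVAL.search(body):
            hits.append(body)
    return hits

# Hypothetical script fragment: the first rule uses intervals, the second does not.
sample = r'''
/^[0-9]{4}-[0-9]{2}$/ { print "date-like:", $0 }
/^#/ { next }
'''
print(find_interval_expressions(sample))
```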

I wonder if it'd be less effort to _review_ what AWK scripts we have
in maintainer scripts for satisfiability by any POSIX-conforming AWK.
How many can there be? </Jeremy Clarkson>

Regards,
Branden

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=862907
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=870317
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=961737
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076035
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087810
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1101388
Santiago Vila
2025-04-21 14:10:01 UTC
BUGS
...
POSIX‐standard interval expressions in regular expressions are not
supported.
...which I think weakens the case for your proposal helping us to have
AWK scripts that don't exercise extensions to POSIX. (But maybe the
newer original-awk that supports CSV data--a non-POSIX extension--fixes
that.)
Yes, maybe, I have not checked if/when they fixed that, but the line you
quote about interval expressions is no longer present in stable.
(I infer that you are reading the manpage for bullseye?)

Thanks.
Adrian Bunk
2025-04-21 14:40:01 UTC
...
BUGS
...
POSIX‐standard interval expressions in regular expressions are not
supported.
...which I think weakens the case for your proposal helping us to have
AWK scripts that don't exercise extensions to POSIX. (But maybe the
newer original-awk that supports CSV data--a non-POSIX extension--fixes
that.)
I wonder if it'd be less effort to _review_ what AWK scripts we have
in maintainer scripts for satisfiability by any POSIX-conforming AWK.
How many can there be? </Jeremy Clarkson>
POSIX doesn't matter here; awk is a virtual essential package and
having original-awk installed as sole implementation has been supported
for decades.

Like with most essential tools the subset of functionality that is used
by the vast majority of users is pretty small, and often not well
aligned with what is in POSIX.

The few packages that are not happy with original-awk already use mawk
or gawk explicitly.

It is possible that more awk usage that doesn't work with original-awk
is found when changing the default, but I would be surprised if this
would uncover a large number of bugs.
Regards,
Branden
...
cu
Adrian
Tim Woodall
2025-04-21 14:00:01 UTC
Post by Josh Triplett
One is "should we make dependencies on awk explicit, rather than having
them be implicit and undocumented because awk is Essential".
The other is "should we reduce dependencies on awk".
One of the things I dislike about awk being in the essential set is the
weird effect that dependencies on awk are hidden unless a package
depends on a particular implementation for some reason.

Some packages have a dependency on awk via the individual variants,
prettyping being one where I think it is probably pointless in practice.

i3lock-fancy is more weird as it also includes versions of "awk" that
don't satisfy awk.

More than half of packages that depend on mawk also depend on gawk,
which I guess is saying that they don't work with original-awk (except
for the packages that depend on all 3)

And gawk is definitely the winner in packages that require that specific
implementation. (I've not checked for non awk satisfying options)
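
A tally of this kind could be sketched in Python along the following
lines. The stanzas and package names (`example-a` and so on) are made up
for illustration, and the parser is an assumption that only handles a
single-line Depends field; it is not the method used for the counts
above.

```python
import re
from collections import Counter

AWK_IMPLEMENTATIONS = {"mawk", "gawk", "original-awk"}

def awk_dependencies(stanza: str) -> frozenset[str]:
    """Return the awk implementations named in a stanza's Depends field."""
    for line in stanza.splitlines():
        if line.startswith("Depends:"):
            # Split on ',' (AND) and '|' (OR), then drop version
            # constraints and architecture qualifiers from each term.
            names = {
                re.split(r"[\s(:\[]", term.strip(), maxsplit=1)[0]
                for term in re.split(r"[,|]", line[len("Depends:"):])
            }
            return frozenset(names & AWK_IMPLEMENTATIONS)
    return frozenset()

# Hypothetical Packages-style stanzas.
SAMPLE = """\
Package: example-a
Depends: libc6 (>= 2.36), gawk | mawk

Package: example-b
Depends: gawk

Package: example-c
Depends: mawk | gawk | original-awk

Package: example-d
Depends: libc6
"""

counts = Counter(
    deps for deps in map(awk_dependencies, SAMPLE.strip().split("\n\n")) if deps
)
for deps, n in counts.items():
    print(sorted(deps), n)
```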
Helmut Grohne
2025-04-18 06:20:01 UTC
Hi Simon,
I noticed that Fedora 42 was released and their docker images lack a
'awk' tool. Debian trixie images ship with 'mawk' pre-installed right
now. While I'm not convinced the removal game is necessarily a good
one, I can see that it does have some advantages. Is it possible to
drop 'mawk' from the set of default tools in trixie? If not, what are
the blockers? What is the method to find out what the blockers are?
shrinking essential/minbase/container images generally is a worthwhile
goal as you saw from existing replies. What is not as useful is asking
"can we drop XXX?" with little context, because (as others indicated)
this is a ton of work. The way to advance these matters is doing
research.

One of the first aspects is what "dropping" means. Typical answers:
* Removing "Essential: yes"
* e2fsprogs, mount and a few more used to be essential.
* Removing dependencies
* apt (not essential, but close) used to depend on adduser.
* Reducing the Priority value
* We've been debating this for ifupdown.
* Removing dependencies within the build-essential set
* I recently proposed removing libcrypt-dev from build-essential.

In this case, the immediate meaning must be getting it out of essential.
However, that does not move it out of container images, which incurs
further work and also raises the user impact (see Sean's mail).

Next, there is a question of what we gain. Essential weighs in at
roughly 100MB (depending on how you count it). So regarding awk, we're
talking about a size reduction of about 0.3%. For comparison, being able
to substitute toybox for coreutils has the potential to cut size by more
than 10%. Removing bash (keeping dash) would be around 7%. Whilst those
other gains are significantly higher, so are their impact and effort.
Picking a sensible candidate is the difficult part here.
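
As a quick check of the awk figure, here is a small Python sketch. It
assumes the 263 kB installed size for mawk quoted earlier in the thread
and takes "roughly 100MB" at face value; the variable names are
illustrative.

```python
ESSENTIAL_SET_MB = 100   # "roughly 100MB", depending on how you count
MAWK_KB = 263            # installed size quoted earlier in the thread

# Share of the essential set taken up by mawk, in percent.
share = MAWK_KB / (ESSENTIAL_SET_MB * 1000) * 100
print(f"mawk is about {share:.2f}% of the essential set")
```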

It leads us to analyzing the effort and impact. Being in the essential
set means that dependencies are not spelled out. So the first step is
locating those dependencies. As we will likely not be able to audit
Debian's source code for awk uses in a reasonable amount of time,
empirical methods are likely needed.
* Rebuild the archive with awk dropped and see what fails
* Consider using reproducible builds to additionally see what packages
change as a result of dropping awk (for those that happen to be
reproducible)
* Search for awk usage in maintainer scripts
https://binarycontrol.debian.net/?q=awk&path=unstable%2F.*%2Fp
Note that postrm scripts cannot express dependencies and need to be
rewritten without awk. It also means that if you assume people to
always purge their packages, we may remove awk in forky+1 at best if
we manage to fix all postrm in forky.
* Download all Debian binary packages and search for awk uses in the
installed files using regular expressions.
* Run autopkgtests with awk removed
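
The maintainer-script search can also be approximated locally. Below is
a minimal Python sketch that scans the maintainer scripts dpkg keeps
under /var/lib/dpkg/info on an installed system. The function name and
the word-matching regex are assumptions for illustration; the scan only
covers installed packages and cannot see awk invoked indirectly, for
example through a helper program.

```python
import re
from pathlib import Path

# Match awk, mawk, gawk or original-awk as a standalone command word.
AWK_WORD = re.compile(r"(?<![\w-])(?:original-awk|[mg]?awk)(?![\w-])")
MAINTSCRIPT_SUFFIXES = {".preinst", ".postinst", ".prerm", ".postrm", ".config"}

def scripts_mentioning_awk(info_dir: Path) -> dict[str, list[int]]:
    """Map each maintainer script in info_dir to the line numbers mentioning awk."""
    results = {}
    for path in sorted(info_dir.iterdir()):
        if path.suffix not in MAINTSCRIPT_SUFFIXES or not path.is_file():
            continue
        lines = path.read_text(errors="replace").splitlines()
        hits = [
            n for n, line in enumerate(lines, start=1)
            # Skip whole-line comments (this also skips the shebang line).
            if not line.lstrip().startswith("#") and AWK_WORD.search(line)
        ]
        if hits:
            results[path.name] = hits
    return results

info = Path("/var/lib/dpkg/info")  # dpkg's store of installed maintainer scripts
if info.is_dir():
    for name, hits in scripts_mentioning_awk(info).items():
        print(name, hits)
```

A hit in a postrm script deserves particular attention, for the reason
given in the note above about postrm and dependencies.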

Doing this is a ton of work. Doing that work and presenting the results
is what makes "can we drop awk?" a useful question as it answers the
cost part.

This is not meant to discourage you. Quite to the contrary. Reducing
implicit software dependencies has lots of other benefits such as easing
architecture bootstrapping and a smaller trusted computing base. It is a
topic you cannot do in a spare evening though.

For instance, I'd like to propose making coreutils substitutable in
essential like awk is substitutable. However, that question is not
presently "useful" in the sense that it lacks a sound implementation.
I've been pondering this with Jochen and Johannes back in Würzburg and
now Julian has picked up the question and arrived at a promising
prototype based on feedback from Guillem. I hope that we are discussing
coreutils soon, but that discussion will be so much more useful when it
comes with a prototype and an impact analysis.

Helmut
Adrian Bunk
2025-04-20 12:30:02 UTC
Post by Helmut Grohne
...
It leads us to analyzing the effort and impact. Being in the essential
set means that dependencies are not spelled out. So the first step is
locating those dependencies. As we will likely not be able to audit
Debian's source code for awk uses in a reasonable amount of time,
empirical methods are likely needed.
* Rebuild the archive with awk dropped and see what fails
* Consider using reproducible builds to additionally see what packages
change as a result of dropping awk (for those that happen to be
reproducible)
...
Tools like awk/sed/perl would have to stay part of the build
essential set if they get dropped from the essential set.

Example:
/usr/share/aclocal/libtool.m4:AC_REQUIRE([AC_PROG_AWK])dnl
Post by Helmut Grohne
Helmut
cu
Adrian
Antonio Terceiro
2025-04-18 13:40:02 UTC
Is it possible to drop 'mawk' from the set of default tools in trixie?
Regardless of the practical and important questions others raised on why
and how to actually do it, no change like this could be done responsibly
at this point of the trixie release cycle. We just entered the second
part of the freeze (soft freeze) last Tuesday April 15th.

Or did you mean forky?
Jonathan Dowland
2025-04-18 21:20:02 UTC
I noticed that Fedora 42 was released and their docker images lack a
'awk' tool.
They likely lack perl, as well. Most/all awk usage in maintainer scripts
could probably be replaced with perl. But, if you are in the minimizing
game, perhaps you'd rather remove perl from the essential set? A
substantially harder project.
--
Please do not CC me for listmail.

👱🏻 Jonathan Dowland
✎ ***@debian.org
🔗 https://jmtd.net
Gioele Barabucci
2025-04-18 22:10:01 UTC
Post by Jonathan Dowland
I noticed that Fedora 42 was released and their docker images lack a
'awk' tool.
They likely lack perl, as well. Most/all awk usage in maintainer scripts
could probably be replaced with perl. But, if you are in the minimizing
game, perhaps you'd rather remove perl from the essential set? A
substantially harder project.
My hobby for the past three years has been working on removing perl from
the transitive essential set.

It's very doable. I have PoC VMs where perl is not installed at all. I'm
very slowly polishing and upstreaming my changes: I've sent dozens
(hundreds?) of patches and there now are only about 40 maintscripts
where perl is directly used. Most importantly, I've already removed all
uses of perl from postrm maintscripts in bookworm. I've also written
shell-only replacements of Perl programs used in the transitive
essential set.

Once forky is open for development I may send a more public announcement
of this project.

Regards,
--
Gioele Barabucci
Michael Stone
2025-04-19 13:50:01 UTC
Post by Jonathan Dowland
They likely lack perl, as well. Most/all awk usage in maintainer
scripts could probably be replaced with perl. But, if you are in the
minimizing game, perhaps you'd rather remove perl from the essential
set? A substantially harder project.
If the goal is a minimal container image, why use debian at all vs a
distribution optimized for that purpose? Running alpine without perl is
already a solved problem...
Chris Hofstaedtler
2025-04-19 14:20:02 UTC
Post by Michael Stone
Post by Jonathan Dowland
They likely lack perl, as well. Most/all awk usage in maintainer
scripts could probably be replaced with perl. But, if you are in the
minimizing game, perhaps you'd rather remove perl from the essential
set? A substantially harder project.
If the goal is a minimal container image, why use debian at all vs a
distribution optimized for that purpose? Running alpine without perl
is already a solved problem...
This is true for a lot of things Debian is used for. As an example:
GNOME desktop users could also use Fedora, and the work of
maintaining GNOME in Debian would be saved.

People like to use Debian for a lot of different reasons. Very large
and very small installs are "just" usecases too. When there are enough
people interested (and so on...) in it, it will happen.

Chris
Michael Stone
2025-04-19 15:30:01 UTC
Post by Chris Hofstaedtler
Post by Michael Stone
If the goal is a minimal container image, why use debian at all vs a
distribution optimized for that purpose? Running alpine without perl
is already a solved problem...
GNOME desktop users could also use Fedora, and the work of
maintaining GNOME in Debian would be saved.
No, that's not the same at all. Debian is a general purpose OS that can
form the foundation for a lot of variants. But, that flexibility has a
cost, and the cost is size & complexity. /var/lib/apt and /var/lib/dpkg
alone are the size of a minimal linux distribution, without even
accounting for actual executables. You can shrink the minimal set by
making some components replaceable, but for a general purpose OS that
implies the 60k update-alternatives program plus /etc/alternatives plus
/var/lib/dpkg/alternatives--all to support reconfiguration that won't
ever happen in a container image. If size alone is the driving
requirement, a general purpose OS like Debian (or Fedora, etc.) isn't
the right starting point.

You *can* build a really small container based on debian by starting
with udebs and ditching package management/interactive
configuration/etc. (Or, many debian container guides advocate a generous
use of rm -rf to get rid of a lot of that stuff after the fact.)
But in that context I don't see the relevance in talking about trimming
stuff from a normal debian base install because the target isn't a
normal debian base install.
Josh Triplett
2025-04-20 01:50:01 UTC
Debian is a general purpose OS that can form the foundation for a lot
of variants. But, that flexibility has a cost, and the cost is size &
complexity. /var/lib/apt and /var/lib/dpkg alone are the size of a
minimal linux distribution, without even accounting for actual
executables. You can shrink the minimal set by making some components
replaceable, but for a general purpose OS that implies the 60k
update-alternatives program plus /etc/alternatives plus
/var/lib/dpkg/alternatives--all to support reconfiguration that won't
ever happen in a container image.
Omitting whole directories like /var/lib/dpkg and /var/lib/apt (for
finalized containers that will never get more packages installed atop
them), or /usr/share/{doc,info,man,locale} (for most containers) is
straightforward and easy, and any container optimizing for size starts
there.
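
To see how much those directories account for in a given root
filesystem, a minimal Python sketch along these lines could be used. The
function names and the `./rootfs` path are hypothetical; sizes are
apparent file sizes, which can differ from on-disk usage.

```python
import os
from pathlib import Path

# Directories named above as candidates for omission, relative to the root.
PRUNABLE = (
    "var/lib/dpkg", "var/lib/apt",
    "usr/share/doc", "usr/share/info", "usr/share/man", "usr/share/locale",
)

def tree_size(path: Path) -> int:
    """Total apparent size in bytes of regular files under path (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def prunable_sizes(root: Path) -> dict[str, int]:
    """Size of each prunable directory that exists under root."""
    return {rel: tree_size(root / rel) for rel in PRUNABLE if (root / rel).is_dir()}

rootfs = Path("./rootfs")  # hypothetical unpacked container root
for rel, size in prunable_sizes(rootfs).items():
    print(f"{rel}: {size / 1024:.0f} KiB")
```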

And the extra symlinks in `/etc/alternatives` don't take much size; I
agree you don't need update-alternatives, but then, you also don't
strictly need the entire dpkg and apt packages, if you're already
omitting their files under /var/lib.

Omitting other packages is harder, and more error-prone. And that's the
area where `Essential` makes it much harder. If there were explicit
dependencies, it'd be a matter of carefully pruning the DAG, rather than
a matter of carefully manually checking what has an unstated
dependency on what.
Simon Josefsson
2025-04-20 10:50:01 UTC
Post by Josh Triplett
And the extra symlinks in `/etc/alternatives` don't take much size; I
agree you don't need update-alternatives, but then, you also don't
strictly need the entire dpkg and apt packages, if you're already
omitting their files under /var/lib.
Right -- has anyone considered if Debian should have official containers
without apt and dpkg? I think that for many use-cases for containers,
apt and dpkg will not be used and just take up space. Guix packs
(containers) don't get Guix installed unless you specify that as a
package you want to have installed (which is usually not necessary), so
something like this should be possible.

/Simon
Simon McVittie
2025-04-20 12:20:01 UTC
Post by Simon Josefsson
has anyone considered if Debian should have official containers
without apt and dpkg?
What would those containers be useful for? I would have expected that in
any use-case for a container without apt and dpkg, what you would really
want is whatever "payload" packages are the actual purpose of the
container (for example that might be a database or a web server), plus
the Essential set, minus dpkg and any other Essential packages that are
unnecessary for the use-case - but on the way to preparing that, you'd
temporarily need apt and dpkg, in order to install the "payload". It
isn't really feasible to do that without knowing in advance what the
"payload" is going to be.

For example, an equivalent of the pseudo-official debian:bookworm image
on Dockerhub, but without dpkg, is unlikely to be directly useful on its
own, because it has neither a "payload" nor a way to install one; but an
equivalent of the Debian-based postgres:latest image on Dockerhub
*would* be useful, because the database is useful in its own right.
However, I suspect that Debian is unlikely to want to get into preparing
an image for every possible choice of "payload", or choosing which
servers are important enough to get an official container image and
which ones don't. (It's hard enough to draw a reasonable line between
the desktop environments that get an entry in tasksel and the environments
that don't, and we have a lot more servers than desktop environments.)

As some prior art for this, the Steam Linux Runtime containers that I
help to maintain for Valve are Debian derivatives containing various
libraries that are necessary or useful for games, but no dpkg and apt.
The process we use to prepare them is to bootstrap a minbase-like
container, add apt sources for backported packages, install metapackages
that pull in all the libraries we want to support, and finally `dpkg
--purge` any Essential packages that have already served their purpose
(for example we explicitly delete perl-base), with the last package
management step being something along the lines of
`dpkg --force-depends --force-remove-essential --purge dpkg`.

smcv
Josh Triplett
2025-04-20 17:10:01 UTC
Post by Simon Josefsson
Post by Josh Triplett
And the extra symlinks in `/etc/alternatives` don't take much size; I
agree you don't need update-alternatives, but then, you also don't
strictly need the entire dpkg and apt packages, if you're already
omitting their files under /var/lib.
Right -- has anyone considered if Debian should have official containers
without apt and dpkg? I think that for many use-cases for containers,
apt and dpkg will not be used and just take up space. Guix packs
(containers) don't get Guix installed unless you specify that as a
package you want to have installed (which is usually not necessary), so
something like this should be possible.
The tricky part of that would be that you then couldn't use that
container image as a base and install any further packages. Offering a
"stock" container image without dpkg and apt would mean that the
container image has to *already* have everything installed that people
using the container need. (By contrast, if someone is installing their
own container they could then finalize it by removing dpkg and apt and
other things not needed at runtime.)

I think it's a good idea to support this case, but I would ideally want
to support it in tools that people use to build containers. For
instance, suppose we had an mmdebstrap option to purge dpkg and apt and
associated paraphernalia, after installing everything needed.

On a slightly related note, one of these days I'd love to figure out how
we could stop systematically installing /usr/share/lintian/overrides *in
binary packages*, and move them to some form of metadata that doesn't
get installed. It's easy enough to exclude that directory, but doing so
shouldn't be necessary in the first place; those overrides only have
value when running lintian on the package, not when installing the
package normally.
Adrian Bunk
2025-04-20 18:20:01 UTC
Post by Josh Triplett
Post by Simon Josefsson
Post by Josh Triplett
And the extra symlinks in `/etc/alternatives` don't take much size; I
agree you don't need update-alternatives, but then, you also don't
strictly need the entire dpkg and apt packages, if you're already
omitting their files under /var/lib.
Right -- has anyone considered if Debian should have official containers
without apt and dpkg? I think that for many use-cases for containers,
apt and dpkg will not be used and just take up space. Guix packs
(containers) don't get Guix installed unless you specify that as a
package you want to have installed (which is usually not necessary), so
something like this should be possible.
The tricky part of that would be that you then couldn't use that
container image as a base and install any further packages. Offering a
"stock" container image without dpkg and apt would mean that the
container image has to *already* have everything installed that people
using the container need. (By contrast, if someone is installing their
own container they could then finalize it by removing dpkg and apt and
other things not needed at runtime.)
I think it's a good idea to support this case, but I would ideally want
to support it in tools that people use to build containers. For
instance, suppose we had an mmdebstrap option to purge dpkg and apt and
associated paraphernalia, after installing everything needed.
...
This would be for the use case where a user does not want to be able to
install security updates, but does need binary compatibility with Debian.

That's a rare use case.

When binary compatibility is not required, source-based distributions
will always provide smaller images with slightly better performance.

cu
Adrian
Josh Triplett
2025-04-20 18:50:01 UTC
Post by Adrian Bunk
Post by Josh Triplett
Post by Simon Josefsson
Post by Josh Triplett
And the extra symlinks in `/etc/alternatives` don't take much size; I
agree you don't need update-alternatives, but then, you also don't
strictly need the entire dpkg and apt packages, if you're already
omitting their files under /var/lib.
Right -- has anyone considered if Debian should have official containers
without apt and dpkg? I think that for many use-cases for containers,
apt and dpkg will not be used and just take up space. Guix packs
(containers) don't get Guix installed unless you specify that as a
package you want to have installed (which is usually not necessary), so
something like this should be possible.
The tricky part of that would be that you then couldn't use that
container image as a base and install any further packages. Offering a
"stock" container image without dpkg and apt would mean that the
container image has to *already* have everything installed that people
using the container need. (By contrast, if someone is installing their
own container they could then finalize it by removing dpkg and apt and
other things not needed at runtime.)
I think it's a good idea to support this case, but I would ideally want
to support it in tools that people use to build containers. For
instance, suppose we had an mmdebstrap option to purge dpkg and apt and
associated paraphernalia, after installing everything needed.
...
This would be for the use case where a user does not want to be able to
install security updates,
With this style of container use case, you handle security updates (or
any other package version upgrade) by creating a new container with the
new package versions, and deploying that new container. That doesn't
require having apt or dpkg in the container.
Post by Adrian Bunk
but does need binary compatibility with Debian.
Or is just familiar with Debian, appreciates the variety of packages and
the maintenance and stability, and prefers to use it as their base.
Adrian Bunk
2025-04-20 19:40:02 UTC
Post by Josh Triplett
Post by Adrian Bunk
Post by Josh Triplett
Post by Simon Josefsson
Post by Josh Triplett
And the extra symlinks in `/etc/alternatives` don't take much size; I
agree you don't need update-alternatives, but then, you also don't
strictly need the entire dpkg and apt packages, if you're already
omitting their files under /var/lib.
Right -- has anyone considered if Debian should have official containers
without apt and dpkg? I think that for many use-cases for containers,
apt and dpkg will not be used and just take up space. Guix packs
(containers) don't get Guix installed unless you specify that as a
package you want to have installed (which is usually not necessary), so
something like this should be possible.
The tricky part of that would be that you then couldn't use that
container image as a base and install any further packages. Offering a
"stock" container image without dpkg and apt would mean that the
container image has to *already* have everything installed that people
using the container need. (By contrast, if someone is installing their
own container they could then finalize it by removing dpkg and apt and
other things not needed at runtime.)
I think it's a good idea to support this case, but I would ideally want
to support it in tools that people use to build containers. For
instance, suppose we had an mmdebstrap option to purge dpkg and apt and
associated paraphernalia, after installing everything needed.
...
This would be for the use case where a user does not want to be able to
install security updates,
With this style of container use case, you handle security updates (or
any other package version upgrade) by creating a new container with the
new package versions, and deploying that new container. That doesn't
require having apt or dpkg in the container.
Post by Adrian Bunk
but does need binary compatibility with Debian.
Or is just familiar with Debian, appreciates the variety of packages and
the maintenance and stability, and prefers to use it as their base.
Container size is obviously not a priority for such users.

cu
Adrian
Josh Triplett
2025-04-20 19:50:01 UTC
Post by Adrian Bunk
Container size is obviously not a priority for such users.
That is incorrect. Many, many users use Debian as the basis for
containers, and many such users care about container size, sufficiently
so to work on reducing it. You are suggesting that because they want to
use Debian, they don't care at all; I'm observing that they want to use
Debian and they care enough to try to make Debian better.
Michael Lazin
2025-04-20 22:10:01 UTC
I think removing awk is a bad idea. It will break legacy scripts as has
already been suggested. I am mostly an observer on this list and say very
little but I think that awk is used by a lot of people. I used it in a
script that analyzed mail logs for example. It was previously written in
perl but I redid it in bash with awk and it ran faster.

Thank you,

Michael Lazin

.. τὸ γὰρ αὐτὸ νοεῖν ἐστίν τε καὶ εἶναι.
Post by Josh Triplett
Post by Adrian Bunk
Container size is obviously not a priority for such users.
That is incorrect. Many, many users use Debian as the basis for
containers, and many such users care about container size, sufficiently
so to work on reducing it. You are suggesting that because they want to
use Debian, they don't care at all; I'm observing that they want to use
Debian and they care enough to try to make Debian better.
Josh Triplett
2025-04-21 00:30:01 UTC
Post by Michael Lazin
I think removing awk is a bad idea. It will break legacy scripts as
has already been suggested. I am mostly an observer on this list and
say very little but I think that awk is used by a lot of people. I
used it in a script that analyzed mail logs for example. It was
previously written in perl but I redid it in bash with awk and it ran
faster.
Nobody in this thread is proposing removing awk from the
default-installed set of packages in Debian. Removing "Essential: yes"
from it would mean that some small and very deliberately constructed
system images would do without it. A default or even minimal Debian
install via d-i or debootstrap or mmdebstrap *would* include awk, and
your script would continue to run just fine. If you ever had a system
without awk, it would be because you removed awk, or deliberately
constructed a system that omitted it.
t***@goirand.fr
2025-04-21 10:20:01 UTC
Post by Josh Triplett
Post by Adrian Bunk
Container size is obviously not a priority for such users.
That is incorrect. Many, many users use Debian as the basis for
containers, and many such users care about container size, sufficiently
so to work on reducing it.
I agree that reducing our footprint is important, but not only for
containers. I'd love it if our base cloud image was smaller too. It is
currently way bigger than it used to be (329M for the generic cloud qcow
image); I don't know why...


Thomas Goirand
Marco d'Itri
2025-04-20 19:50:01 UTC
Post by Josh Triplett
On a slightly related note, one of these days I'd love to figure out how
we could stop systematically installing /usr/share/lintian/overrides *in
binary packages*, and move them to some form of metadata that doesn't
get installed.
Yes please! This is why I almost never add overrides to binary packages.
It's terminally stupid to waste space on all Debian systems in the world
because our tooling is suboptimal.
--
ciao,
Marco
Johannes Schauer Marin Rodrigues
2025-04-23 12:20:01 UTC
Hi,

Quoting Josh Triplett (2025-04-20 19:05:13)
Post by Josh Triplett
Post by Simon Josefsson
Post by Josh Triplett
And the extra symlinks in `/etc/alternatives` don't take much size; I
agree you don't need update-alternatives, but then, you also don't
strictly need the entire dpkg and apt packages, if you're already
omitting their files under /var/lib.
Right -- has anyone considered if Debian should have official containers
without apt and dpkg? I think that for many use-cases for containers,
apt and dpkg will not be used and just take up space. Guix packs
(containers) don't get Guix installed unless you specify that as a
package you want to have installed (which is usually not necessary), so
something like this should be possible.
The tricky part of that would be that you then couldn't use that
container image as a base and install any further packages. Offering a
"stock" container image without dpkg and apt would mean that the
container image has to *already* have everything installed that people
using the container need. (By contrast, if someone is installing their
own container they could then finalize it by removing dpkg and apt and other
things not needed at runtime.)
Not quite. You can use dpkg --root to have dpkg operate on a directory other
than your current rootfs [1]. You can use apt's DPkg::Options setting to have
apt pass such options on to dpkg. You can use dpkg --force-script-chrootless,
and then maintainer scripts will use tools from outside the chroot to work on
the chroot. mmdebstrap makes use of this functionality.

[1] Small wrinkle: dpkg will read its configuration from outside the chroot, so
this does not work well if you want to set up a system with a different
configuration, see #808203
Post by Josh Triplett
I think it's a good idea to support this case, but I would ideally want to
support it in tools that people use to build containers. For instance,
suppose we had an mmdebstrap option to purge dpkg and apt and associated
paraphernalia, after installing everything needed.
You don't purge dpkg. You just tell mmdebstrap to create a chroot without dpkg.
You can use mmdebstrap to create, for example, a chroot which contains all of
build-essential but not dpkg. mmdebstrap can do this by running apt and dpkg
from the outside, passing the chroot directory using the respective options and
letting them populate the chroot with the packages you choose. That package set
can include dpkg and apt but it doesn't have to.

Don't purge dpkg and apt from your chroot. Do not install them at all in the
first place. :)
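
To make the above concrete, here is a minimal sketch that only assembles such
an mmdebstrap command line and prints it, without running anything. It assumes
the --variant=custom, --mode=chrootless and --include options as I understand
them from the mmdebstrap manual page; the helper name mmdebstrap_argv, the
suite "trixie" and the target "./rootfs" are made up for illustration. Whether
a given package set actually installs cleanly this way depends on its
maintainer scripts.

```python
import shlex


def mmdebstrap_argv(suite, target, packages, mode="chrootless"):
    """Build (but do not run) an mmdebstrap command line.

    Hypothetical helper for illustration only. Assumed semantics:
      --variant=custom   install only what --include lists (plus
                         dependencies), so dpkg and apt are not added
                         unless something pulls them in
      --mode=chrootless  apt and dpkg run from outside the target
                         directory instead of inside a chroot
    """
    if not packages:
        raise ValueError("the custom variant needs at least one package")
    return [
        "mmdebstrap",
        "--variant=custom",
        f"--mode={mode}",
        "--include=" + ",".join(packages),
        suite,
        target,
    ]


if __name__ == "__main__":
    # Example package set taken from the text above: build-essential,
    # with dpkg and apt deliberately left out of the list.
    argv = mmdebstrap_argv("trixie", "./rootfs", ["build-essential"])
    print(shlex.join(argv))
```

Adding dpkg and apt to the package list gives you a chroot that can install
further packages from the inside; leaving them out gives you one that can only
be modified from the outside.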

Thanks!

cheers, josch

Holger Levsen
2025-04-21 12:00:01 UTC
Post by Simon Josefsson
Right -- has anyone considered if Debian should have official containers
without apt and dpkg? I think that for many use-cases for containers,
apt and dpkg will not be used and just take up space. Guix packs
(containers) don't get Guix installed unless you specify that as a
package you want to have installed (which is usually not necessary), so
something like this should be possible.
ask not what your distribution could do for you, ask what you could do for
your distribution!? IOW, i'm sure some people will have considered doing
this, some might even do it "somewhere", just apparently no one has made
this happen for the existing Debian container projects...
--
cheers,
Holger

⢀⣎⠟⠻⢶⣊⠀
⣟⠁⢠⠒⠀⣿⡁ holger@(debian|reproducible-builds|layer-acht).org
⢿⡄⠘⠷⠚⠋⠀ OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
⠈⠳⣄

There has never been a time when the people who've banned books were the good
ones.
Marco d'Itri
2025-04-19 16:00:01 UTC
Post by Michael Stone
If the goal is a minimal container image, why use Debian at all vs a
distribution optimized for that purpose? Running Alpine without perl
is already a solved problem...
Because I want to use a real libc, for a start.
--
ciao,
Marco