Post by Santiago Vila
Also, while the idea of Josh might sound good in theory (adding
dependencies will not harm anybody, we just want to see the
dependencies explicit),
While I support that proposal and initiative...
Post by Santiago Vila
it might create some undeserved pressure on maintainers to stop using awk.
I agree with that, too. Our industry struggles to resist recurring
trends to rewrite everything in the language du jour. This decade it's
Go and/or Rust; both languages have things to recommend them (and both
their communities have demonstrated worrisome governance problems).
Post by Santiago Vila
In some cases I'm sure that it would be easy to rewrite the code, but
in some others the alternate construction may be a lot less readable,
and overall worse.
Yes, and we also have to ask what we have to gain by doing so, apart
from fashionability and bragging rights on a CV. Nothing stops anyone
from reimplementing anything in any language and slapping it up on
GitHub or a GitLab site to prove their skills--but my impression is that
a lot of people only feel such an undertaking is worth the effort if
they can cram it down a lot of other people's throats. Doing so shows
that one is "impactful", and therefore appealing to hiring managers at
startups and other places that natter on about being "disruptive" and
about Schumpeter's "creative destruction" as the engine of capitalism.
AWK is a nice language--small, pleasant, and consistent for problems
where C is too much trouble but a C-ish syntax is comfortably familiar
to your target audience, the shell is too quirky, and where you don't
need a bulky standard library.
Post by Santiago Vila
Note also that the base system and the container images are expected
to grow over time, because everything grows over time, but machines
hosting those container images also grow over time, so one would
naturally wonder why awk has become a problem now when it was never
a problem due to its extremely small size.
Yes. I have little interest in the drive to shrink container images for
its/their own sake.
Post by Santiago Vila
My modest proposal here after trixie, if there is a consensus that
it's a good step, would be to replace mawk by original-awk in the
base system and see what we can learn from that.
I just learned that you're the maintainer of original-awk (a.k.a. BWK
AWK)...
We can observe right now that the space savings is meager. Using older
data on amd64, I see:
    Package: original-awk
    Version: 2018-08-27-1
    Maintainer: Santiago Vila <***@debian.org>
    Installed-Size: 180 kB

    Package: mawk
    Version: 1.3.4.20200120-2
    Maintainer: Boyuan Yang <***@debian.org>
    Installed-Size: 248 kB
...for a savings of 68 kB from your proposal. Hmm, how much does
perl-base grow from one Debian release to the next?
    Package: perl-base
    Version: 5.36.0-7+deb12u2
    Installed-Size: 7639 kB

    Package: perl-base
    Version: 5.40.1-3
    Installed-Size: 7811 kB

    Difference: 172 kB
So whatever we'd have gained by hypothetically trading mawk for
original-awk in trixie, or even eliminating AWK entirely from the
Essential set, we'd have traded away simply by having Perl around.
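For anyone who wants to rerun the arithmetic, here is a small Python sketch that uses only the Installed-Size figures quoted above. The dictionary layout and variable names are mine and purely illustrative.

```python
# Installed-Size values in kB, copied from the package data quoted above.
installed_kb = {
    "original-awk": 180,
    "mawk": 248,
    "perl-base 5.36.0-7+deb12u2": 7639,
    "perl-base 5.40.1-3": 7811,
}

# Space saved by swapping mawk for original-awk.
awk_savings = installed_kb["mawk"] - installed_kb["original-awk"]

# Growth of perl-base between the two quoted versions.
perl_growth = (installed_kb["perl-base 5.40.1-3"]
               - installed_kb["perl-base 5.36.0-7+deb12u2"])

print(f"mawk -> original-awk saves {awk_savings} kB")
print(f"perl-base grew by {perl_growth} kB")
print(f"net change with the swap: {perl_growth - awk_savings:+d} kB")
```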
Post by Santiago Vila
I would see that little change as something similar to what we did
with /bin/sh being replaced by dash to ensure compatibility and
standards compliance
This argument requires a footnote. Dash has its own problems with POSIX
conformance[1] and we insist on a couple of extensions to POSIX behavior
for our own sanity (the one I can remember is the `local` keyword).
Post by Santiago Vila
(back then, we discovered some bashisms, and we either rewrote them to
be sh-compliant or used #!/bin/bash instead, and everybody was happy
with those little incremental changes).
It was a good thing to do, but the standards-compliance benefit was, I
think, more a matter of inchoate bragging rights (see above) than
concrete benefit. The benefit, I think, came in saying what we meant:
either expressing dependencies explicitly, or eliminating unnecessary
ones. Also, it was really important for people using "upstart" as their
init system because, as I recall, the time differential when dynamically
loading Bash versus dash was thought to be an easy win for performance.
Bragging rights and impactfulness again.
Post by Santiago Vila
I don't think we have many mawk-isms in the distribution, but this
would be an opportunity to check if all AWKs are really
interchangeable.
...and make you the maintainer of (even more?) Essential packages. ;-)
original-awk's man page admits to one area of POSIX-nonconformance:
BUGS
...
POSIX-standard interval expressions in regular expressions are not
supported.
...which I think weakens the case for your proposal helping us to have
AWK scripts that don't exercise extensions to POSIX. (But maybe the
newer original-awk that supports CSV data--a non-POSIX extension--fixes
that.)
I wonder if it'd be less effort to _review_ what AWK scripts we have
in maintainer scripts for satisfiability by any POSIX-conforming AWK.
How many can there be? </Jeremy Clarkson>
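A rough sketch of how such a review might start, in Python. It only lists installed maintainer scripts that mention an AWK at all; judging whether each use sticks to POSIX would still be a job for a human reader. It assumes the scripts live under /var/lib/dpkg/info with the usual suffixes. The function name, the suffix set, and the regular expression are my own inventions for illustration, and the match is crude: it will also flag comments that merely mention awk.

```python
import re
from pathlib import Path

# Assumption: dpkg keeps installed maintainer scripts in this directory,
# named <package>.<suffix> with the suffixes listed below.
DEFAULT_INFO_DIR = "/var/lib/dpkg/info"
SCRIPT_SUFFIXES = {".preinst", ".postinst", ".prerm", ".postrm", ".config"}

# Matches awk, gawk, mawk, nawk, and original-awk as whole words.
AWK_WORD = re.compile(r"\b(?:[gmn]?awk|original-awk)\b")


def find_awk_users(info_dir=DEFAULT_INFO_DIR):
    """Return {script file name: [line numbers that mention an AWK]}."""
    hits = {}
    root = Path(info_dir)
    if not root.is_dir():
        return hits
    for path in sorted(root.iterdir()):
        if path.suffix not in SCRIPT_SUFFIXES or not path.is_file():
            continue
        # Maintainer scripts are usually text; tolerate stray bytes.
        text = path.read_text(errors="replace")
        lines = [number
                 for number, line in enumerate(text.splitlines(), 1)
                 if AWK_WORD.search(line)]
        if lines:
            hits[path.name] = lines
    return hits


if __name__ == "__main__":
    for name, lines in find_awk_users().items():
        print(f"{name}: lines {', '.join(map(str, lines))}")
```

Run on an installed system, this covers only the packages present there, not the whole archive, so it gives a lower bound on the review effort rather than a full count.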
Regards,
Branden
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=862907
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=870317
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=961737
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1076035
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087810
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1101388