Discussion:
Is HURD's lack of HOST_NAME_MAX and PATH_MAX a good architectural approach
Sam Hartman
2025-01-20 20:30:01 UTC
TL;DR: Is it time for the rest of Debian to stop conforming to HURD's
lack of maximums for path and hostname? By this point I think we
recognize that lack of maximums as an anti-pattern for DoS prevention
and other security reasons.

Back in the day when I got my first HURD compatibility patch, I was
excited. I had read the original HURD paper, and a lot of the ideas HURD
was exploring sounded really neat.
Getting rid of arbitrary limits, translators, and lots of other cool
stuff.
So I was delighted to accept the patch into my package and try to be
part of something innovative.

I will admit I was kind of disappointed that rather than working to make
my package handle arbitrary hostnames, the patch simply introduced an
arbitrary constant for HURD.

HURD managed to explore a lot of interesting ground. Through
explorations like Plan9 and HURD, things like namespaces, fuse, and
other features revolutionized the Linux world, spawning important
innovations in themselves like containerization.

But I think that HURD's desire to remove arbitrary limits like hostname
and path maximums has proven not to be a winner. We ran the experiment,
and I at least think the conclusion is that we'd be better off with
limits. Here are some of the concerns:

* Having different limits in different parts of the system can lead to
security problems. On Linux, when I have something that I know is
a valid path, say because it's coming from the kernel, I know it fits
in PATH_MAX. I don't need to worry that some other program has a
different idea of PATH_MAX and I might need to deal with bounds
checking or truncation attacks.
However, on HURD, it's probably not sufficient to just create a
PATH_MAX or HOST_NAME_MAX buffer. I probably also need to think about
what happens when something gets truncated or fails to fit into that
buffer. I need to think about DOS and other potential attacks.
I know I have not been diligent about reviewing the HURD compatibility
patches for these sorts of issues over the years.

* To some extent, there are intrinsic limits that are related. The DNS
does have a maximum for a domain name, and while that's not strictly
the same thing as a host name, practically speaking, we want host names
to be able to fit in domain names.

* The kind of dynamic memory handling required for avoiding arbitrary
limits introduces significant complexity. You need to have some limit
at some level to avoid resource exhaustion attacks. Having constant
size structures for things like stat buffers and unix domain sockets
is a lot simpler than dynamically allocating everything. Bounds
checking at compile time has value. So does avoiding all dynamic
allocation in critical sections of system resources.

The latest version of pam is not building on hurd-i386 and hurd-amd64.
One of the issues is HOST_NAME_MAX in modules/pam_xauth/pam_xauth.c.
I'm sure the hurd porters would send me a patch if I asked for one. I'm
sure I could come up with a patch on my own.

My question though is whether that's architecturally a good idea.
As a maintainer, I'm willing to accept the patch if we believe that
HURD's approach actually is a good one.
For a non-release architecture that I perceive as more on the way out
than on the way in, I'm not interested in accepting a patch if we think
the architectural approach is an anti-pattern.
Yes, that does put the hurd maintainers in an awkward position: pam is
transitively essential.

* They could agree that particular aspect of the HURD experiment is not
a success and patch system include files.

* They could find a way to patch pam only for HURD. I think that would
be a bad precedent, but I couldn't stop them.

* They could take the issue to TC, either as a question about this
specific issue, or asking the TC to set policy on what ports patches
maintainers should accept.

I think this is a good discussion to have though, and so I am soliciting
input both from the HURD community and from the broader Debian
community.
Samuel Thibault
2025-01-20 22:20:01 UTC
Hello,
Post by Sam Hartman
I will admit I was kind of disappointed that rather than working to make
my package handle arbitrary hostnames, the patch simply introduced an
arbitrary constant for HURD.
It should not have, indeed.
Post by Sam Hartman
* Having different limits in different parts of the system can lead to
security problems. On Linux, when I have something that I know is
a valid path, say because it's coming from the kernel, I know it fits
in PATH_MAX.
Actually, no.

Please see

https://darnassus.sceen.net/~hurd-web/faq/foo_max/

Also more details in https://eklitzke.org/path-max-is-tricky

Quoting a bit:

$ printf '#include <limits.h>\nPATH_MAX' | cpp -P
$ d=0123456789; for i in `seq 1 1000`; do mkdir $d; cd $d 2>/dev/null; done
$ pwd | wc -c

Limiting PATH_MAX to 4096 is just a way to not actually try to think
about the problem, and hide the corresponding bugs.
Post by Sam Hartman
* The kind of dynamic memory handling required for avoiding arbitrary
limits introduces significant complexity.
For quite a lot of cases it's a matter of using realpath(path, NULL),
getcwd(NULL, 0), or asprintf().
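
As a rough illustration of the calls named above, here is a minimal C sketch. The helper names path_in_cwd and canonical_path are hypothetical, invented for this example. It assumes a libc where getcwd(NULL, 0) allocates the result (a glibc extension, which POSIX leaves unspecified) and where asprintf() is available (a GNU/BSD extension).

```c
#define _GNU_SOURCE /* asprintf() is a GNU/BSD extension */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Return a malloc()ed "<cwd>/<name>" string, or NULL on failure.
 * The caller frees the result. No buffer size is picked by hand. */
char *path_in_cwd(const char *name)
{
    assert(name != NULL);

    /* Assumption: the libc allocates a large enough buffer here. */
    char *cwd = getcwd(NULL, 0);
    if (cwd == NULL)
        return NULL;

    char *joined = NULL;
    if (asprintf(&joined, "%s/%s", cwd, name) < 0)
        joined = NULL; /* pointer contents are undefined on failure */

    free(cwd);
    return joined;
}

/* Return the malloc()ed canonical form of path, or NULL with errno set. */
char *canonical_path(const char *path)
{
    assert(path != NULL);
    return realpath(path, NULL); /* POSIX.1-2008: NULL means "allocate" */
}
```

In both helpers the only limit left is available memory, and the caller has a single failure path (a NULL return) to handle.
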
Post by Sam Hartman
You need to have some limit at some level to avoid resource exhaustion
attacks.
Yes, but depending on the application the limit can vary. Using 4096 as
limit can be a bad idea if you might have millions of files. If you have
only a few of them you could accept much longer.
Post by Sam Hartman
The latest version of pam is not building on hurd-i386 and hurd-amd64.
One of the issues is HOST_NAME_MAX in modules/pam_xauth/pam_xauth.c.
In the case of HOST_NAME_MAX, we could indeed define it, because its
only meaning is the limitation of gethostname(), which is set by the
admin and we can limit that there too, it's not coming from whatever
else.
Post by Sam Hartman
I'm sure the hurd porters would send me a patch if I asked for one. I'm
sure I could come up with a patch on my own.
My question though is whether that's architecturally a good idea.
Concerning PATH_MAX, for safety of the software, yes.

Samuel
Sam Hartman
2025-01-20 22:50:01 UTC
Samuel> Hello, Sam Hartman, on Mon 20 Jan 2025 13:21:32 -0700, wrote:
Post by Sam Hartman
I will admit I was kind of disappointed that rather than working
to make my package handle arbitrary hostnames, the patch simply
introduced an arbitrary constant for HURD.
Samuel> It should not have, indeed.
Post by Sam Hartman
* Having different limits in different parts of the system can
lead to security problems. On Linux, when I have something that
I know is a valid path, say because it's coming from the kernel,
I know it fits in PATH_MAX.
Samuel> Actually, no.
Oh, hmm.
Thanks for educating me.


Samuel> Please see

Samuel> https://darnassus.sceen.net/~hurd-web/faq/foo_max/
You quoted the example from the FAQ, although I found it hard to parse.
My restatement is that it's possible to create paths where the full path
name is longer than PATH_MAX.

I guess a better way to look at this would be that paths beyond PATH_MAX
may break. Good coders are responsible for making sure that they break
in a security-preserving manner, but nothing actually prevents them from
existing.

I now agree the situation is more complicated than I thought.
I am still not sure that HURD's approach is an improvement over defining
PATH_MAX.
I suspect that it's valuable to have a consistent PATH_MAX across a
distribution.
I suspect that in practice what people do (and what HURD porters
generally do when supplying patches) is pick a number and define
PATH_MAX.

So, I'm not at all convinced that the HURD approach adds significant
value in a distribution like Debian.

(It appears pam upstream has accepted someone's patch to conditionalize
definitions of PATH_MAX. I'm a bit horrified about how many #ifndef
PATH_MAX there are in pam_mkhomedir_helper, but at least for pam, the
question of PATH_MAX appears theoretical.)
Samuel Thibault
2025-01-20 23:00:01 UTC
Post by Sam Hartman
My restatement is that it's possible to create paths where the full path
name is longer than PATH_MAX.
I guess a better way to look at this would be that paths beyond PATH_MAX
may break.
And unfortunately, code that just uses PATH_MAX as allocation size most
often does not really take care of this case, and then possibly becomes
vulnerable.
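
One way to make the overflow case explicit, sketched in C under the assumption that the code keeps a fixed-size buffer: check the snprintf() return value and fail with ENAMETOOLONG rather than carry on with a truncated path. join_path_checked is a hypothetical helper written for this example, not code taken from pam or any other package mentioned in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Join dir and name into buf. Returns 0 on success. Returns -1 with
 * errno set to ENAMETOOLONG when the result does not fit, so the caller
 * never receives a silently truncated path. */
int join_path_checked(char *buf, size_t bufsize,
                      const char *dir, const char *name)
{
    assert(buf != NULL && dir != NULL && name != NULL);

    int n = snprintf(buf, bufsize, "%s/%s", dir, name);
    if (n < 0)
        return -1;                  /* output error reported by libc */
    if ((size_t)n >= bufsize) {     /* result would have been truncated */
        if (bufsize > 0)
            buf[0] = '\0';          /* do not leave a partial path behind */
        errno = ENAMETOOLONG;
        return -1;
    }
    return 0;
}
```

The same check applies whatever size the buffer has, whether that is PATH_MAX, 4096, or something else, which is why a fixed limit alone does not remove the need for it.
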

Samuel
Sam Hartman
2025-01-21 00:20:02 UTC
Samuel> And unfortunately, code that just uses PATH_MAX as
Samuel> allocation size most often does not really take care of
Samuel> this case, and then possibly becomes vulnerable.

Right, I'm just not sure the HURD approach is better.
The pam 1.5.3 hurd compatibility patch simply defines PATH_MAX to 4096.
I believe that previous krb5 patches have done something similar.
I think this approach is quite common to how people approach HURD
compatibility.
If we were using HURD's lack of PATH_MAX as a way to audit code for
these sorts of problems, it would make more sense to me.
Right now, it mostly appears that Debian inherits all the broken code
that does not deal with path overflow even on HURD plus gains additional
complexity in porting to HURD.

And I do suspect there are a class of bugs that are introduced when
PATH_MAX varies across a distribution.
Samuel Thibault
2025-01-21 00:30:01 UTC
Post by Sam Hartman
The pam 1.5.3 hurd compatibility patch simply defines PATH_MAX to 4096.
I believe that previous krb5 patches have done something similar.
I think this approach is quite common to how people approach HURD
compatibility.
Yes, and that's unfortunate, I don't recommend this.

Samuel
Guillem Jover
2025-01-21 02:30:01 UTC
Hi!
Post by Sam Hartman
TL;DR: Is it time for the rest of Debian to stop conforming to HURD's
lack of maximums for path and hostname? By this point I think we
recognize that lack of maximums as an anti-pattern for DoS prevention
and other security reasons.
While I agree on some of your premises below, I do not agree with the
conclusions, my conclusion has actually been for a long time, the
opposite.

Disclosure: I used to be part of the Debian Hurd porters team,
maintained mig and gnumach for a while, and even packaged L4 Pistachio
(an alternative microkernel) for a while, when there were discussions
about switching away from GNU Mach. I still try to follow what's going
on in the port, and sporadically review or port stuff. I still consider
myself a porter (in general) at heart.
Post by Sam Hartman
I will admit I was kind of disappointed that rather than working to make
my package handle arbitrary hostnames, the patch simply introduced an
arbitrary constant for HURD.
I have not checked this specific case, but I think in general there
have been several reasons for the different kinds of patches being sent.
From the general experience of the porter, their knowledge of the
code being ported, the apparent urgency of getting something fixed,
whether stopping to use arbitrary limits would break exposed ABI,
the receptiveness of the receiving maintainers and their desire for
more correct/robust but potentially more intrusive patches, or their
desire for more minimal patches to simply fix FTBFS problems, etc.

Without more details, I'd consider the patch you describe a
workaround, which I'd have asked to be reworked if I had reviewed
it, or would have provided an alternative myself. And I think these
kind of patches are frowned upon by the Hurd porters in general.
Post by Sam Hartman
HURD managed to explore a lot of interesting ground. Through
explorations like Plan9 and HURD, things like namespaces, fuse, and
other features revolutionized the Linux world, spawning important
innovations in themselves like containerization.
(Sadly, many of the things that got into Linux, feel like they got
bolted on, in many cases in unnatural or complex ways, with harder
to grasp semantics and increased security exposure.)
Post by Sam Hartman
But I think that HURD's desire to remove arbitrary limits like hostname
and path maximums has proven not to be a winner. We ran the experiment,
and I at least think the conclusion is that we'd be better off with
limits.
I disagree for both statements.

The way I've always interpreted the arbitrary limit stance, has been
that the code should be robust and be prepared to handle data of any
length, while dynamically handling truncation, and underlying limits
(say from specific filesystem implementations or protocols, allocation
failures, etc, because we live surrounded by a limited world anyway).
And not baking in such hardcoded limits means that the code is ready for
any such underlying limits to be bumped freely, or changing beneath us
due to underlying implementation changes. AFAIR this is currently a
problem on GNU/Linux for example, because even if we wanted to increase
the pathname or other such arbitrary limits, these are part of the ABI
now. :/
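
To make that interpretation concrete, here is a hedged C sketch of the usual pattern: grow the buffer until the data fits, detect possible truncation, handle allocation failure, and let the caller choose a policy cap instead of relying on a system constant. read_link_dynamic and its max_len parameter are hypothetical names for this example. readlink() is used only because it is a simple call that reports truncation by filling the buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Read the target of a symlink into a malloc()ed, NUL-terminated string.
 * The buffer doubles until the target fits. The function gives up with
 * ENAMETOOLONG once a buffer larger than max_len bytes was still too
 * small. Returns NULL with errno set on failure. */
char *read_link_dynamic(const char *path, size_t max_len)
{
    assert(path != NULL);

    size_t size = 64;
    for (;;) {
        char *buf = malloc(size);
        if (buf == NULL)
            return NULL;                    /* allocation failure */

        ssize_t n = readlink(path, buf, size);
        if (n < 0) {
            int saved = errno;
            free(buf);
            errno = saved;
            return NULL;
        }
        if ((size_t)n < size) {             /* fits, with room for NUL */
            buf[n] = '\0';
            return buf;
        }

        free(buf);                          /* possibly truncated: retry */
        if (size > max_len || size > SIZE_MAX / 2) {
            errno = ENAMETOOLONG;           /* over the caller's cap */
            return NULL;
        }
        size *= 2;
    }
}
```

The cap is an application decision: a daemon handling untrusted input might pass a small value, while a backup tool might pass a very large one.
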

And while I agree that specifically on security sensitive contexts,
unfortunately it might be needed to add arbitrary limits for security
reasons as you mentioned, this to me still does not translate to the
examples given with pathnames and hostnames, and other similar system
resources. And a limit that might seem reasonable today, will most
probably be a hindrance years ahead or with bigger equipment, and
might easily hamper functionality. So I'd still be very wary of such
limits.
Post by Sam Hartman
* Having different limits in different parts of the system can lead to
security problems. On Linux, when I have something that I know is
a valid path, say because it's coming from the kernel, I know it fits
in PATH_MAX. I don't need to worry that some other program has a
different idea of PATH_MAX and I might need to deal with bounds
checking or truncation attacks.
However, on HURD, it's probably not sufficient to just create a
PATH_MAX or HOST_NAME_MAX buffer. I probably also need to think about
what happens when something gets truncated or fails to fit into that
buffer. I need to think about DOS and other potential attacks.
I know I have not been diligent about reviewing the HURD compatibility
patches for these sorts of issues over the years.
I agree with this concern, but I see it in reverse actually. Because
on GNU/Linux there's this hardcoding, there is less care for truncation
as you mention, which can still happen, because you might not control
what another program or a user on the same system or even from another
system with a different limit is passing to you. Even on GNU/Linux it's
still possible to have different limits and restriction depending on
what you are working on, say odd filesystems, network filesystems, etc.

Check for example:

https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits

Where for both NAME_MAX and PATH_MAX, there are cases of downwards and
upward limits (or no limits at all). Even nesting multiple mount
points can get you off limits easily.
Post by Sam Hartman
* To some extent, there are intrinsic limits that are related. The DNS
does have a maximum for a domain name, and while that's not strictly
the same thing as a host name, practically speaking, we want host names
to be able to fit in domain names.
As mentioned above, there will always be some limit somewhere, if not
only just due to available memory. But even focusing on current
protocol limits seems problematic, because that also ossifies what can
be done in the future.
Post by Sam Hartman
* The kind of dynamic memory handling required for avoiding arbitrary
limits introduces significant complexity. You need to have some limit
at some level to avoid resource exhaustion attacks. Having constant
size structures for things like stat buffers and unix domain sockets
is a lot simpler than dynamically allocating everything. Bounds
checking at compile time has value. So does avoiding all dynamic
allocation in critical sections of system resources.
Over the years I've found that the changes to remove these limits
make (in general) the code more robust and future-proof, can even
simplify it, and in some cases make one use better APIs to accomplish
the job.
Post by Sam Hartman
The latest version of pam is not building on hurd-i386 and hurd-amd64.
One of the issues is HOST_NAME_MAX in modules/pam_xauth/pam_xauth.c.
I'm sure the hurd porters would send me a patch if I asked for one. I'm
sure I could come up with a patch on my own.
For this case, notice how the concern that you mentioned above is
quite present here with GNU/Linux exposing _POSIX_HOST_NAME_MAX,
HOST_NAME_MAX and MAXHOSTNAMELEN with diverging values. If the code is
changed to remove arbitrary limits, then these divergences suddenly
disappear.
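
For the hostname case, a limit-free version could look roughly like the C sketch below. It is not the pam_xauth code and not a proposed patch. get_hostname_dynamic and its initial_size parameter are hypothetical. The sketch relies on two documented behaviours: POSIX allows gethostname() to truncate silently and leaves NUL termination unspecified in that case, while glibc reports ENAMETOOLONG instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return the host name as a malloc()ed string, or NULL with errno set.
 * No HOST_NAME_MAX, _POSIX_HOST_NAME_MAX or MAXHOSTNAMELEN is needed. */
char *get_hostname_dynamic(size_t initial_size)
{
    assert(initial_size >= 2 && initial_size <= SIZE_MAX / 2);

    size_t size = initial_size;
    for (;;) {
        /* One spare byte that gethostname() never touches stays NUL. */
        char *buf = calloc(1, size + 1);
        if (buf == NULL)
            return NULL;

        int rc = gethostname(buf, size);
        if (rc == 0 && strlen(buf) < size - 1)
            return buf;                 /* clearly not truncated */

        int saved = errno;
        free(buf);
        if (rc != 0 && saved != ENAMETOOLONG) {
            errno = saved;
            return NULL;                /* unrelated failure */
        }
        if (size > SIZE_MAX / 2) {
            errno = ENOMEM;
            return NULL;
        }
        size *= 2;                      /* maybe truncated: retry larger */
    }
}
```

A completely filled buffer is treated as "maybe truncated", which costs one extra iteration in the rare exact-fit case but avoids depending on any of the three constants.
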
Post by Sam Hartman
My question though is whether that's architecturally a good idea.
Yes, I think it's the better option, for robustness, for
future-proofness, for functionality, for security.

While it might seem annoying, because using a hardcoded limit seems
easier, this also looks like a trap, where one ends up ignoring cases
that are silently there but might not be obvious and will still
affect the code, and ossify the whole system impeding future
improvements (although for existing GNU/Linux ports, that's probably
too late anyway).
Post by Sam Hartman
As a maintainer, I'm willing to accept the patch if we believe that
HURD's approach actually is a good one.
For a non-release architecture that I perceive as more on the way out
than on the way in,
hurd-i386 is probably on the way out, but that will eventually be
replaced by hurd-amd64 which seems to be coming along nicely.
Post by Sam Hartman
I'm not interested in accepting a patch if we think
the architectural approach is an anti-pattern.
Yes, that does put the hurd maintainers in an awkward position: pam is
transitively essential.
* They could agree that particular aspect of the HURD experiment is not
a success and patch system include files.
I think that would be a mistake that would tie the port into a
position similar to GNU/Linux where backing out of it is hard to
impossible, and where that port then needs to live with the
consequences indefinitely.
Post by Sam Hartman
* They could find a way to patch pam only for HURD. I think that would
be a bad precedent, but I couldn't stop them.
This has already been possible for a long time. We introduced long
long ago the "unreleased" suite to soft-fork required packages which
were urgent to fix to unblock the buildds, or for which there were
only workaround patches, or the maintainer didn't want to accept a
patch, or similar reasons. This is cumbersome though, as these
need to be kept in sync.
Post by Sam Hartman
* They could take the issue to TC, either as a question about this
specific issue, or asking the TC to set policy on what ports patches
maintainers should accept.
It's my long standing stance that this is always detrimental to the
project; a failure of the community at large, that it ruptures the
social fabric of the project, and given its power and structure
is unjust in nature. If this would ever be needed, I think going
upstream directly or just using the "unreleased" suite is always
going to be socially better and more productive anyway.

Thanks,
Guillem
Janneke Nieuwenhuizen
2025-01-21 10:00:01 UTC
Post by Sam Hartman
TL;DR: Is it time for the rest of Debian to stop conforming to HURD's
lack of maximums for path and hostname?
The GNU Coding standards say: Avoid arbitrary limits
(<https://www.gnu.org/prep/standards/html_node/Semantics.html>). That
was one of the great inspirations for me for wanting to learn much more
about GNU.

Similarly, GNU LilyPond was designed, unlike all other music
software of the day, to not have arbitrary limits.

If you are running a distribution such as GNU Guix or NixOS, you may
even have run into file name limits.

It often surprises (saddens?) me how few people in the free software
community actually read (or remembered) the GNU standards.

Greetings,
Janneke
--
Janneke Nieuwenhuizen <***@gnu.org> | GNU LilyPond https://LilyPond.org
Freelance IT https://www.JoyOfSource.com | Avatar® https://AvatarAcademy.com
Simon Josefsson
2025-01-21 12:30:01 UTC
Post by Janneke Nieuwenhuizen
Post by Sam Hartman
TL;DR: Is it time for the rest of Debian to stop conforming to HURD's
lack of maximums for path and hostname?
The GNU Coding standards say: Avoid arbitrary limits
(<https://www.gnu.org/prep/standards/html_node/Semantics.html>).
I agree, but it goes further than that: POSIX doesn't require
PATH_MAX, and I believe the semantics of PATH_MAX are not what some code
assumes they are. See below for the specified semantics.

I believe use of PATH_MAX as a char[] array length specifier is usually
an indicator of a bug or a security vulnerability waiting to happen. It
is not an idiom we should strive towards.

/Simon

https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/limits.h.html

Pathname Variable Values

The values in the following list may be constants within an
implementation or may vary from one pathname to another. For example,
file systems or directories may have different characteristics.

A definition of one of the symbolic constants in the following list
shall be omitted from the <limits.h> header on specific implementations
where the corresponding value is equal to or greater than the stated
minimum, but where the value can vary depending on the file to which it
is applied. The actual value supported for a specific pathname shall be
provided by the pathconf() function.

{PATH_MAX}

Maximum number of bytes the implementation stores as a pathname in a
user-supplied buffer of unspecified size, including the terminating
null character. Minimum number the implementation shall accept as
the maximum number of bytes in a pathname.

Minimum Acceptable Value: {_POSIX_PATH_MAX}

{_POSIX_PATH_MAX}
Minimum number the implementation shall accept as the maximum number of bytes in a pathname.
Value: 256

{_XOPEN_PATH_MAX}
[XSI] [Option Start]
Minimum number the implementation shall accept as the maximum number of bytes in a pathname.
Value: 1024 [Option End]
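
The pathconf() call referred to in the quoted specification can be used as in the following C sketch. path_max_for is a hypothetical wrapper, and the convention of returning 0 for "no fixed limit" is a choice made for this example only. Note that a value obtained this way is still only the limit for one path argument, not a bound on how long a full path on the system can be.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Query the PATH_MAX value that applies to one specific path at run time.
 * Returns the limit, 0 when the system reports no fixed limit, or -1 on
 * error with errno set. */
long path_max_for(const char *path)
{
    assert(path != NULL);

    errno = 0;
    long limit = pathconf(path, _PC_PATH_MAX);
    if (limit == -1)
        return errno == 0 ? 0 : -1; /* errno unchanged means "no limit" */
    return limit;
}
```

Code that must run where <limits.h> omits PATH_MAX can call this and fall back to dynamic allocation when the result is 0.
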

Samuel Thibault
2025-01-21 10:30:02 UTC
Post by Sam Hartman
One of the issues is HOST_NAME_MAX in modules/pam_xauth/pam_xauth.c.
Post by Julien Plissonneau Duquène
It doesn't make much sense to allow arbitrarily long host names. I'm more
reserved for path names as there are (arguably pathological) use cases where
any fixed limit could be reached.
I think that a satisfactory way to deal with some security issues (let's
call them "name bombs") would be for HURD to implement limits
It already does so. It just doesn't expose them in the API, so they can
be raised if need be.

Samuel
Julien Plissonneau Duquène
2025-01-21 10:30:02 UTC
Hi,
Post by Sam Hartman
One of the issues is HOST_NAME_MAX in modules/pam_xauth/pam_xauth.c.
It doesn't make much sense to allow arbitrarily long host names. I'm
more reserved for path names as there are (arguably pathological) use
cases where any fixed limit could be reached.

I think that a satisfactory way to deal with some security issues (let's
call them "name bombs") would be for HURD to implement limits that can
be configured at run time, in the spirit of linux sysctls for many
limits. That won't help much with the other issues though.

In the case of pam above, I believe that patching it in a way that sets
an arbitrary high limit when there is none and documenting this as a
limitation could be appropriate.

Cheers,
--
Julien Plissonneau Duquène