Discussion:
Musings about Usernames in adduser and Debian
Marc Haber
2024-11-21 17:50:01 UTC
[writing this with my adduser hat on. I am also in touch with the
maintainers of src:shadow and base-passwd]

Hi,

recently, I have "taken over" the wiki page about UserAccounts and have
put in some history and general thoughts about what Debian thinks about
user names and name restrictions.

https://wiki.debian.org/UserAccounts

I fear that I have opened an especially nasty can of worms by beginning
to do sanity checks in adduser and being pointed towards user name
encoding in that process. Can you help me to bring some sense into this
mess?

I would like to hear your comments. Feel free to directly apply
corrections to the wiki page. I am especially interested in having clear
terminology regarding Unicode code points, UTF-8, character strings and
byte strings. It is vitally important to be consistent here to avoid
making the mess even worse.

For adduser's next release, I would like to discuss the following
things:

(1)
Should Debian allow UTF-8 user names in the first place or should we
restrict names for regular users to some us-ascii near set as well? (I
think yes, we should)

(2)
If the answer to (1) is "allow UTF-8", should we also do that for system
users? (I think no, we should not)

(2a)
Which UTF-8 subset / code point classes should we allow and which should
we reject? (I don't have an opinion about that)

(3)
I think that 32 characters/bytes (it's the same if we don't allow UTF-8)
is a good limitation for a system user name. But, should we increase
that for regular user names? (I think yes)

(4)
If we decide to relax some of our current requirements, where are the
borders between "normal" user name, one that requires --allow-bad-names
and finally one that requires --allow-all-names? Wouldn't it be
offensive to speakers of some languages to require --allow-bad-names
before their special characters are allowed in a user name? (no opinion
here that would not break backwards compatibility)

(5)
Is it right to say "the user name in /etc/passwd is UTF-8 encoded" or
should I better say "the user name in /etc/passwd can be UTF-8 encoded"?

(6)
Does it still make sense to give non-UTF-8-locales special handling
(which one?), or can adduser safely assume that any non-ascii locale is
UTF-8? Or must I check for locale and reject UTF-8 user names on
non-UTF-8 locales? (I hope that we can safely assume UTF-8)

(7)
Do the general restrictions for both kinds of user names make sense?
Going forward with this would mean rejecting user names that we used to
accept. (I think we should come close to systemd's ideas)

(8)
I think that our current way to restrict system account names is fine.
Any objections/additions here?

(9)
Should some of this language be in Policy instead of some random wiki
page? Policy is quite short about user names (chapter 9.2) (I think yes)

(10)
What should adduser do regarding subuids? Since I was ignorant about
that concept until a few hours ago, all accounts created by adduser do
have subuids, regardless of being system account or not, while useradd
does not give system accounts subuids.
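
Questions (3) and (5) both depend on the difference between bytes and code points. The following is a minimal Python sketch of that difference, not adduser code: the helper name describe_name and the sample names are hypothetical, and it assumes the name is available as the raw bytes stored in a passwd-style file.

```python
def describe_name(raw: bytes) -> dict:
    """Summarise a user name given as the raw bytes stored in a passwd-style file."""
    try:
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Not valid UTF-8: only the byte length is meaningful.
        return {"bytes": len(raw), "codepoints": None, "valid_utf8": False}
    return {"bytes": len(raw), "codepoints": len(text), "valid_utf8": True}


# An ASCII name, a UTF-8 encoded name, and the same name in Latin-1.
for raw in (b"marc", "j\u00fcrgen".encode("utf-8"), b"j\xfcrgen"):
    print(raw, describe_name(raw))
```

For a pure ASCII name the byte count and the code point count agree. For the UTF-8 name they differ, so a length limit has to say which of the two it counts. The Latin-1 variant is a legal byte string but not valid UTF-8, which is the case question (5) is asking about.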

Greetings
Marc

P.S.: The teams and individuals working on src:shadow, base-passwd and
adduser would appreciate your help in coding and packaging. You can get
in touch with all involved parties via
pkg-shadow-***@lists.alioth.debian.org
--
-----------------------------------------------------------------------------
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany | lose things." Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Richard Lewis
2024-11-21 22:10:01 UTC
Post by Marc Haber
For adduser's next release, I would like to discuss the following
(1)
Should Debian allow UTF-8 user names in the first place or should we
restrict names for regular users to some us-ascii near set as well? (I
think yes, we should)
would allowing utf-8 enable some of the abuse described at
https://lwn.net/Articles/874951/ ?

as usernames appear in logs and other output (and are passed to all
sorts of commands), it seems a bad idea to be too permissive or to
change from historic practice by default, even though from a user pov it
would be nice to have the option
Post by Marc Haber
P.S.: The teams and individuals working on src:shadow, base-passwd and
adduser would appreciate your help in coding and packaging.
Is there a list of "things that need doing"?
Marc Haber
2024-11-22 09:40:01 UTC
Post by Richard Lewis
Post by Marc Haber
For adduser's next release, I would like to discuss the following
(1)
Should Debian allow UTF-8 user names in the first place or should we
restrict names for regular users to some us-ascii near set as well? (I
think yes, we should)
would allowing utf-8 enable some of the abuse described at
https://lwn.net/Articles/874951/ ?
as usernames appear in logs and other output (and are passed to all
sorts of commands), it seems a bad idea to be too permissive or to
change from historic practice by default, even though from a user pov it
would be nice to have the option
I am not sure about that. Would typosquatting on a user name make sense?
It might be possible to make logs ambiguous. Being passed to other
commands SHOULD not be dangerous since we can expect other commands to
gracefully handle a byte stream, can't we?
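
To make the log-ambiguity concern concrete, here is a small Python sketch. It only illustrates the general problem and is not taken from adduser or the linked article; the sample names and the helper same_after_nfc are hypothetical. It shows two names that can look identical on screen while being different byte strings, and one pair that differs only in Unicode normalization form.

```python
import unicodedata

latin = "admin"
# Same appearance in many fonts, but the first letter is CYRILLIC SMALL LETTER A.
cyrillic = "\u0430dmin"

# Same visible name, once precomposed (NFC) and once with a combining accent (NFD).
composed = "andr\u00e9"
decomposed = "andre\u0301"


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(latin == cyrillic)                     # different code points
print(latin.encode(), cyrillic.encode())     # different byte strings
print(composed == decomposed)                # differ only in normalization form
print(same_after_nfc(composed, decomposed))  # normalization makes them equal
print(same_after_nfc(latin, cyrillic))       # normalization does not help here
```

A command that treats the name as an opaque byte string handles all four names without trouble. The ambiguity only arises for a human reading the output.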

I might be naive here, but I don't have much experience with non-ascii
names since I have the privilege of being fluent in two languages that
use the latin alphabet.

On the other hand, wouldn't it be a courtesy to allow people whose
name needs transcription to be written in latin to use their name
in the real alphabet that it is usually written in as a login name as
well? To make things worse, transcriptions are often ambiguous.

I would like to hear the opinion of people who would be affected by this
change.

Local Administrators are able today to use UTF-8 user names in useradd
or configure adduser to allow their locally important subset of UTF-8,
but at the moment with things being more restrictive, our software is
untested in this regard. I think that Debian would get more robust if
we'd allow things here.

Vulnerabilities that could be exploited by having non-ascii user names
are already here and present today, just not uncovered yet.
Post by Richard Lewis
Post by Marc Haber
P.S.: The teams and individuals working on src:shadow, base-passwd and
adduser would appreciate your help in coding and packaging.
Is there a list of "things that need doing"?
The collaboration between src:shadow, base-passwd and adduser is a
relatively fresh thing that came from the fact that src:shadow recently
introduced changes that made adduser's test suite break. So we haven't
found good paths yet. I suggested moving together as a method to
improve communication and also to improve, at least a bit, the bus
factor of those quite important packages. That was also the reason why
I suggested base-passwd to join and I am happy that Colin agreed.

In adduser, nearly everything that needs doing has issues in the BTS,
with the severity set to the urgency of the matter in my opinion. You'll
see that adduser has quite a lot of bugs that were filed by myself. I
consider it a feature to have a public to-do list. For the other two
packages, I'd let their respective maintainers comment.

Greetings
Marc
Iustin Pop
2024-11-21 22:30:01 UTC
Post by Marc Haber
[writing this with my adduser hat on. I am also in touch with the
maintainers of src:shadow and base-passwd]
Hi,
recently, I have "taken over" the wiki page about UserAccounts and have
put in some history and general thoughts about what Debian thinks about
user names and name restrictions.
https://wiki.debian.org/UserAccounts
I fear that I have opened an especially nasty can of worms by beginning
to do sanity checks in adduser and being pointed towards user name
encoding in that process. Can you help me to bring some sense into this
mess?
I would like to hear your comments. Feel free to directly apply
corrections to the wiki page. I am especially interested in having clear
terminology regarding unicode codepoints, UTF-8, character strings and
byte strings. It is vitally important to be consistent here to avoid
making the mess even worse.
For adduser's next release, I would like to discuss the following
(1)
Should Debian allow UTF-8 user names in the first place or should we
restrict names for regular users to some us-ascii near set as well? (I
think yes, we should)
You weren't clear to which part you agreed. If by "we should" you meant
the closest option, i.e. restrict, then I agree as well.

As Richard also replied, full UTF-8 is tricky, and I think it's somewhat
misplaced to focus on the username, as opposed to gecos. Aren't most
other OSes using the "full name" as the "display name", and the username
is mostly one part of the user/password combination, but not a display
property most of the time?

So I would suggest that maybe the better option is to standardise the
gecos format/gecos parsing, so migrate UI tools to use that more often.

On the other hand, as long as this is admin-controlled, it doesn't
matter much. I could see that viewpoint, but I wonder how much latent
breakage would be introduced that will take years to fix in all tooling
and all packages.

regards,
iustin
Marc Haber
2024-11-22 09:50:01 UTC
[Reducing the list to debian-devel. I have omitted to set Reply-To and
apologize for that]
Post by Iustin Pop
Post by Marc Haber
Should Debian allow UTF-8 user names in the first place or should we
restrict names for regular users to some us-ascii near set as well? (I
think yes, we should)
You weren't clear to which part you agreed. If by "we should" you meant
the closest option, i.e. restrict, then I agree as well.
I am sorry. My personal opinions were among the last things I added to
the article and I was not clear here. I think we should allow UTF-8 user
names as a courtesy to those people who need non-ascii user names to
write their name, since user names are frequently chosen from the real
name of the person. In addition, this will enhance software quality
since we now get the chance of finding bugs that are already present in
a lot of software.

This comes kind of late in the Trixie cycle, but as it is currently
already possible to create user names with UTF-8 characters, I do not
like the idea of tightening our restrictions in Trixie over what we have
in Bookworm just to maybe revisit our decision in Trixie+1.
Post by Iustin Pop
As Richard also replied, full UTF-8 is tricky,
My current code uses \p{Graph} as a least common denominator. I am not
sure whether this is wise.
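
For readers who do not know the \p{Graph} class, here is a rough Python approximation. This is a sketch, not adduser's code, which is Perl. It assumes the UTS #18 description of "graph" as every code point that is not whitespace, control (Cc), surrogate (Cs) or unassigned (Cn). Whether it matches Perl's \p{Graph} exactly for every Unicode version is an assumption, and the function names are hypothetical.

```python
import unicodedata

# General categories excluded from "graph": control, surrogate, unassigned.
_EXCLUDED = {"Cc", "Cs", "Cn"}


def is_graph(ch: str) -> bool:
    """Approximate \\p{Graph} for a single code point."""
    return not ch.isspace() and unicodedata.category(ch) not in _EXCLUDED


def all_graph(name: str) -> bool:
    """True if the name is non-empty and consists only of graph code points."""
    return len(name) > 0 and all(is_graph(ch) for ch in name)


for name in ("marc", "m\u00fcller", "ma rc", "a\x07b", "a\u200bb", "a\u202eb"):
    print(repr(name), all_graph(name))
```

The last two samples contain a zero-width space (U+200B) and a right-to-left override (U+202E). Both are format characters (category Cf), so this approximation accepts them even though they are invisible or change how the name is displayed. That may be one reason to doubt that a graph-style class alone is a sufficient filter.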
Post by Iustin Pop
and I think it's somewhat
misplaced to focus on the username, as opposed to gecos. Aren't most
other OSes using the "full name" as the "display name", and the username
is mostly one part of the user/password combination, but not a display
property most of the time?
I think that we should allow full UTF-8 in the gecos¹ field, yes. People
should be allowed to have their fully correct name in there. I also
think that users of non-latin languages should have the possibility to
have a login name that resembles their name.

¹ In 2024 no one remembers what gecos means any more. Adduser and
src:shadow are using "comment" for that field nowadays.
Post by Iustin Pop
So I would suggest that maybe the better option is to standardise the
gecos format/gecos parsing, so migrate UI tools to use that more often.
That doesn't solve the issue I am having with adduser right now: That
we're allowing things that we are not sure we should allow.
Post by Iustin Pop
On the other hand, as long as this is admin-controlled, it doesn't
matter much. I could see that viewpoint, but I wonder how much latent
breakage would be introduced that will take years to fix in all tooling
and all packages.
Yes. Fixing breakage makes software better, and by disallowing non-latin
characters in user names we are hiding those issues away.

Greetings
Marc