Discussion:
Bits from DPL
Otto Kekäläinen
2025-01-04 23:30:01 UTC
Hi!
Number of packages not on Salsa
-------------------------------
In my campaign, I stated [os1] that I aimed to reduce the number of
packages maintained outside Salsa to below 2,000. As of March 28, 2024,
the count was 2,368. As of this writing, the count stands at 1,928
[os2], so I consider this promise fulfilled. My thanks go out to
everyone who contributed to this effort. Moving forward, I’d like to set
a more ambitious goal for the remainder of my term and hope we can
reduce the number to below 1,800.
[os1] https://lists.debian.org/debian-vote/2024/03/msg00057.html
[os2] SELECT DISTINCT count(*) FROM sources WHERE release = 'sid' and vcs_url not like '%salsa%' ;
For a slightly wider context, note that the number of packages not
hosted on Salsa hovered between 2500 and 2600 throughout 2023, and
since April 2024 it has been dropping rapidly, reaching 1928 as your
latest figure shows. I am glad to see this progressing.

20230701 2596
20230801 2577
20230901 2549
20231001 2550
20231101 2570
20231201 2564
20240101 2542
20240201 2542
20240301 2538
20240401 2524
20240501 2439
20240601 2349
20240701 2296
20240801 2274
20240901 2221
20241001 2180
20241101 2117
20241201 1990

Hopefully DEP-18 in
https://salsa.debian.org/dep-team/deps/-/merge_requests/12 can be
finalized in early 2025 and help foster a culture of collaboration,
where it is easy to contribute to any package, in which being hosted
on salsa.debian.org is a key advantage.
Andreas Tille
2025-01-07 19:10:02 UTC
Without seeking to rain on the parade, that query is only the packages that
list a non-salsa VCS. That's not counting the packages that don't list a VCS
udd=> SELECT COUNT(DISTINCT source) FROM sources WHERE release = 'sid' AND
vcs_url IS NULL;
count
-------
2008
That's a very valuable hint. Thank you.
(neither SQL "LIKE" nor "NOT LIKE" matches NULL values; there are 2030 source
packages in UDD that match, but only 2008 distinct ones)
udd=> SELECT COUNT(DISTINCT source) FROM sources WHERE release = 'sid' AND
(vcs_url IS NULL OR vcs_url NOT LIKE '%salsa%');
count
-------
3906
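The NULL behaviour mentioned above can be reproduced on a toy table. The following is only a minimal sketch: it uses Python's built-in sqlite3 module instead of UDD's PostgreSQL, and every row is invented for illustration. Only the column names (source, release, vcs_url) are taken from the queries in this thread.

```python
import sqlite3

# Toy stand-in for UDD's "sources" table; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources (source TEXT, release TEXT, vcs_url TEXT)")
conn.executemany(
    "INSERT INTO sources VALUES (?, 'sid', ?)",
    [
        ("pkg-a", "https://salsa.debian.org/debian/pkg-a"),
        ("pkg-b", "https://github.com/example/pkg-b"),
        ("pkg-c", None),  # no Vcs field at all
        ("pkg-c", None),  # same source present twice, e.g. two versions
    ],
)


def count(where, distinct=True):
    """Count matching rows, or distinct source names, for a WHERE clause."""
    expr = "COUNT(DISTINCT source)" if distinct else "COUNT(*)"
    sql = f"SELECT {expr} FROM sources WHERE release = 'sid' AND {where}"
    return conn.execute(sql).fetchone()[0]


# NULL NOT LIKE '...' evaluates to NULL, so rows without a Vcs URL drop out.
not_like = count("vcs_url NOT LIKE '%salsa%'")
# Rows without a Vcs URL have to be asked for explicitly.
is_null = count("vcs_url IS NULL")
is_null_rows = count("vcs_url IS NULL", distinct=False)
# Combining both conditions covers "not on Salsa" and "no Vcs at all".
combined = count("(vcs_url IS NULL OR vcs_url NOT LIKE '%salsa%')")

print(not_like, is_null, is_null_rows, combined)
```

In this toy data the NOT LIKE query alone sees only pkg-b, and COUNT(*) counts pkg-c twice while COUNT(DISTINCT source) counts it once, which is the same kind of gap as the 2030 versus 2008 figures quoted above.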
Let's think about some better fine-tuning. "NOT LIKE '%salsa%'" might
also catch Vcs URLs that are intentionally somewhere else. While I'd
love to see all packages on Salsa, it might be sensible to start with
packages that are unintentionally not on Salsa, so

udd=> SELECT COUNT(DISTINCT source) FROM sources WHERE release = 'sid' AND (vcs_url IS NULL OR vcs_url like '%alioth%' OR vcs_url like '%git.debian.org%' OR vcs_url like '%svn.debian.org%') ;
count
-------
2213

It might be a real challenge to bring that number below 2000 by the
end of my term. Any help with this is welcome.

Thanks again for the hint
Andreas.
--
https://fam-tille.de
Julien Plissonneau Duquène
2025-01-07 20:30:02 UTC
While I'd love to see all packages on Salsa
I think that being able to host the primary git repository of packages
elsewhere is a freedom worth maintaining for many reasons.

What the Debian Project could (and probably should) do in these cases is
to host a read-only mirror of the repository with most features disabled
by default (no issues, no merge requests, no CI unless the maintainers
switch them on), just keeping the possibility to fork the repository.
This would mitigate the risk that the repository just vanishes, maybe
help to alleviate some scaling issues like API rate limits on some
platforms, and make it easier for would-be contributors to maintain a
public fork for the platforms that make it complicated or impossible or
have unreasonable ToS.

Cheers,
--
Julien Plissonneau Duquène
Peter Pentchev
2025-01-07 21:00:01 UTC
Post by Julien Plissonneau Duquène
While I'd love to see all packages on Salsa
I think that being able to host the primary git repository of packages
elsewhere is a freedom worth maintaining for many reasons.
What the Debian Project could (and probably should) do in these cases is to
host a read-only mirror of the repository with most features disabled by
default (no issues, no merge requests, no CI unless the maintainers switch
them on), just keeping the possibility to fork the repository. This would
mitigate the risk that the repository just vanishes, maybe help to alleviate
some scaling issues like API rate limits on some platforms, and make it
easier for would-be contributors to maintain a public fork for the platforms
that make it complicated or impossible or have unreasonable ToS.
Hm. That sounds interesting, but I think the Debian project cannot
protect such a mirror from automatically bringing in non-DFSG content
that appears in the remote repository. One might even take this one step
further and go to content forbidden by law in various jurisdictions.

Having a Git forge where developers (who have manually created accounts and
agreed to terms of use) will always choose what to push and what not to
push takes care of this problem, or at least moves the responsibility on
to the developers themselves. An automatic mirror cannot do that.
(and no, even if one says "well the responsibility is on the developer who
first marked that remote repo for mirroring", no, I don't think there is
a way that developer can know that, two weeks later, somebody will push
bad stuff there)

G'luck,
Peter
--
Peter Pentchev ***@ringlet.net ***@debian.org ***@morpheusly.com
PGP key: https://www.ringlet.net/roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13
Julien Plissonneau Duquène
2025-01-08 09:20:01 UTC
Post by Peter Pentchev
Hm. That sounds interesting, but I think the Debian project cannot
protect such a mirror from automatically bringing in non-DFSG content
that appears in the remote repository. One might even take this one step
further and go to content forbidden by law in various jurisdictions.
Then we are going to have the same issue implementing automated upstream
release imports in packaging repositories, e.g. with the Janitor, and
this is a service I would very much like to have.

I would worry more about malicious content getting automatically pulled
in. But anyway this can probably be mitigated the way large platforms
do: make it possible to easily report abuse and be diligent in
investigating reports, if necessary taking the repository offline until
the issue is cleared. Additional automated checks could be implemented to
suspend updates and require human review, e.g. on LICENSE changes
unless the file contents match a whitelist.
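The LICENSE check suggested above could look roughly like the following. This is a hedged sketch only: the function name needs_human_review, the idea of keeping the allow list as SHA-256 digests, and the sample license texts are all assumptions made for illustration, not an existing Salsa or Janitor feature.

```python
import hashlib


def sha256_hex(data):
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def needs_human_review(old_license, new_license, allowed_digests):
    """Decide whether a mirror update should be suspended for review.

    An unchanged LICENSE never triggers a review; a changed one does,
    unless its new contents match a digest on the allow list.
    """
    if new_license == old_license:
        return False
    return sha256_hex(new_license) not in allowed_digests


# Hypothetical license texts, standing in for real files.
known_text = b"Hypothetical well-known license text\n"
allowed = {sha256_hex(known_text)}

unchanged = needs_human_review(known_text, known_text, allowed)
switched_to_known = needs_human_review(b"old text\n", known_text, allowed)
switched_to_unknown = needs_human_review(known_text, b"All rights reserved\n", allowed)

print(unchanged, switched_to_known, switched_to_unknown)
```

Comparing digests keeps the allow list small, but it also means that any whitespace or copyright-year change counts as a new text and falls back to human review.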

Alternatively the mirroring could be implemented to pull only the
release tags after a package is uploaded to the archive (which means
that someone reviewed the changes), and dealt with on a case-by-case
basis for non-free packages or packages that have +dfsg repacking.

Cheers,
--
Julien Plissonneau Duquène
Peter Pentchev
2025-01-08 14:40:01 UTC
Post by Julien Plissonneau Duquène
Post by Peter Pentchev
Hm. That sounds interesting, but I think the Debian project cannot
protect such a mirror from automatically bringing in non-DFSG content
that appears in the remote repository. One might even take this one step
further and go to content forbidden by law in various jurisdictions.
Then we are going to have the same issue implementing automated upstream
release imports in packaging repositories, e.g. with the Janitor, and this
is a service I would very much like to have.
Unfortunately you are correct that the same problem would arise.
Post by Julien Plissonneau Duquène
I would worry more about malicious content getting automatically pulled in.
But anyway this can probably be mitigated the way large platforms do: make
it possible to easily report abuse and be diligent in investigating reports,
if necessary taking the repository offline until the issue is cleared.
Hm, I would be really, really surprised if there was even one "large
platform" that did not shift the responsibility to the user by having
them sign a terms of service document upon account registration.
Also, I'm not sure that some issues can really be cleared; see below.
Post by Julien Plissonneau Duquène
Additional automated checks could be implemented to suspend updates and
require human review e.g. with LICENSE changes unless the file contents
matches a whitelist.
That would put the responsibility on the uploader to review not only
the actual changes (as in a diff) between the releases, but each and every
individual file in each and every commit between the two releases.
I don't think this is completely realistic.

Why each and every individual file? Well, consider this:
- version 3.14.1 is tagged
- version 3.14.1 is uploaded to Debian
- somebody pushes a commit to the upstream repo that adds a file that
really does not belong there
- two more "real" commits are pushed
- somebody pushes a commit that reverts the "add a bad file" one
- three more "real" commits are pushed
- version 3.14.2 is tagged
- version 3.14.2 is uploaded to Debian

...so, if at this point the mirror pulls in the Git commits between
versions 3.14.1 and 3.14.2, there will exist several publicly-accessible
blobs that will contain the file that really does not belong there.
Clearing the issue would require rewriting Git history, squashing commits or
dropping them altogether, which would make the Debian version of
the "upstream" Git repository no longer be a mirror.
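The scenario above can be illustrated with a toy model of a commit history. This is only a sketch under simplifying assumptions: commits are plain dictionaries with a single parent and a full snapshot of their files, the commit names c1 to c5 and the file bad.bin are invented, and the history is shorter than the one listed above.

```python
# Toy model of a Git history: every commit records its parent and a
# snapshot of its tree (file name -> blob contents).
history = {
    "c1": {"parent": None, "tree": {"README": "v3.14.1"}},  # tagged 3.14.1
    "c2": {"parent": "c1", "tree": {"README": "v3.14.1", "bad.bin": "unwanted"}},
    "c3": {"parent": "c2", "tree": {"README": "wip", "bad.bin": "unwanted"}},
    "c4": {"parent": "c3", "tree": {"README": "wip"}},  # reverts c2
    "c5": {"parent": "c4", "tree": {"README": "v3.14.2"}},  # tagged 3.14.2
}


def reachable_blobs(tip):
    """Collect every blob in every commit reachable from the given tip."""
    blobs = set()
    commit = tip
    while commit is not None:
        blobs.update(history[commit]["tree"].values())
        commit = history[commit]["parent"]
    return blobs


tip_tree = history["c5"]["tree"]
mirrored = reachable_blobs("c5")

# The unwanted file is gone from the tagged release itself ...
print("unwanted" in tip_tree.values())
# ... but fetching the tag still brings along the commits that carried it.
print("unwanted" in mirrored)
```

A diff between the two tagged trees would never show bad.bin, which is why reviewing only the release-to-release changes does not cover what a mirror actually publishes.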
Post by Julien Plissonneau Duquène
Alternatively the mirroring could be implemented to pull only the release
tags after a package is uploaded to the archive (which means that someone
reviewed the changes), and dealt with on a case-by-case basis for non-free
packages or packages that have +dfsg repacking.
In Git repositories, pulling the release tag involves pulling (and making
available) all the commits leading up to it, even the reverted ones, so...
see above.

In general, automatically mirroring Git repository content is... fraught
with various issues.

G'luck,
Peter
--
Peter Pentchev ***@ringlet.net ***@debian.org ***@morpheusly.com
PGP key: https://www.ringlet.net/roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13
Luca Boccassi
2025-01-08 15:00:02 UTC
Post by Peter Pentchev
Post by Julien Plissonneau Duquène
Post by Peter Pentchev
Hm. That sounds interesting, but I think the Debian project cannot
protect such a mirror from automatically bringing in non-DFSG content
that appears in the remote repository. One might even take this one step
further and go to content forbidden by law in various jurisdictions.
Then we are going to have the same issue implementing automated upstream
release imports in packaging repositories, e.g. with the Janitor, and this
is a service I would very much like to have.
Unfortunately you are correct that the same problem would arise.
Note that there aren't, and never were, rules concerning DFSG content
and git repositories. In Salsa (and also in its predecessor forge,
Alioth) you can find repositories for packages uploaded to non-free -
firmwares, drivers, etc. And that makes sense, as the rules apply to
the 'main' section of the archive, which is what we ship to users, not
to development infrastructure, otherwise you couldn't upload non-free
packages to buildds to get them built, or deb.debian.org to be
published in the non-free section, and so on.
So it's entirely ok if mirroring brings in non-DFSG content, as long
as packages are uploaded to the appropriate non-free or firmware
sections of the archive.
Peter Pentchev
2025-01-08 17:10:01 UTC
Post by Luca Boccassi
Post by Peter Pentchev
Post by Julien Plissonneau Duquène
Post by Peter Pentchev
Hm. That sounds interesting, but I think the Debian project cannot
protect such a mirror from automatically bringing in non-DFSG content
that appears in the remote repository. One might even take this one step
further and go to content forbidden by law in various jurisdictions.
Then we are going to have the same issue implementing automated upstream
release imports in packaging repositories, e.g. with the Janitor, and this
is a service I would very much like to have.
Unfortunately you are correct that the same problem would arise.
Note that there aren't, and never were, rules concerning DFSG content
and git repositories. In Salsa (and also in its predecessor forge,
Alioth) you can find repositories for packages uploaded to non-free -
firmwares, drivers, etc. And that makes sense, as the rules apply to
the 'main' section of the archive, which is what we ship to users, not
to development infrastructure, otherwise you couldn't upload non-free
packages to buildds to get them built, or deb.debian.org to be
published in the non-free section, and so on.
So it's entirely ok if mirroring brings in non-DFSG content, as long
as packages are uploaded to the appropriate non-free or firmware
sections of the archive.
Right, and that's why in my next sentence I mentioned content that
might actually be illegal and bring trouble to the administrators.

Sorry, the DFSG part was mostly a red herring, although a part of
me does wonder whether putting a file up for download is not
a type of redistribution, but I guess that has already been
discussed many times among the administrators of alioth and salsa.
I am mostly concerned with content that may be viewed as illegal,
in the context of "this was pulled in automatically, there was no
human being who initiated that action, so there is nobody but
the site admins to be held responsible".

G'luck,
Peter
--
Peter Pentchev ***@ringlet.net ***@debian.org ***@morpheusly.com
PGP key: https://www.ringlet.net/roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13
Julien Plissonneau Duquène
2025-01-08 18:20:01 UTC
Post by Peter Pentchev
in the context of "this was pulled in automatically, there was no
human being who initiated that action, so there is nobody but
the site admins to be held responsible".
Actually the chain of responsibility can be traced back to another human
even in that case:
- the mirroring should be activated by an identified individual after
validating that it is permitted by the license of the project and the
ToS (if any) of the remote site
- the remote project is supervised by identified individuals
- the remote project should only allow contributions from identified
individuals (some may even require a formal CLA), and these
contributions must comply with the licensing of the project, the ToS
of the platform if any, and the code of conduct of the platform if any.

Here "identified" means that they have reasonably stable e-mail
addresses and registered accounts, not necessarily that we know their
real names or anything else about their real identity. That's still
enough for law enforcement to issue search warrants should some serious
wrongdoing happen.

We can probably expect some "interesting" issues in the future with
automated (AI) contributions though ....

BTW I'm not sure that the Debian Machine Usage Policy covers online
services such as Salsa in its current form. This might be worth fixing,
and advertising on the services.

Cheers,
--
Julien Plissonneau Duquène
t***@goirand.fr
2025-01-08 22:40:01 UTC
Post by Peter Pentchev
I am mostly concerned with content that may be viewed as illegal,
in the context of "this was pulled in automatically, there was no
human being who initiated that action, so there is nobody but
the site admins to be held responsible".
Hi,

I don't see why we would make a special case for mirror repos on Salsa. Just like for other repos, those maintaining the mirror could be held responsible for what is hosted in the mirror, I suppose. If you fear illegal content may get in as you mentioned, please consider not setting up such a mirror.

Hopefully, no Salsa admin will get into legal trouble because of another person's mistake or misbehavior. If anyone becomes aware of abuse, please report it like everything else at /salsa/support on salsa.d.org.

If the source of the mirror is well known and accountable, and in your opinion the mirror is useful, do it.

Though a mirror set up just to pretend your package is hosted on Salsa, with all the tooling like MRs deactivated, brings IMO no value and is just a waste of resources.

Cheers,

Thomas Goirand (zigo)

P.S: the above is only my personal opinion, but I suspect other Salsa admins could agree with such common sense. :)
Julien Plissonneau Duquène
2025-01-08 16:40:01 UTC
Post by Peter Pentchev
Hm, I would be really, really surprised if there was even one "large
platform" that did not shift the responsibility to the user by having
them sign a terms of service document upon account registration.
They don't make you sign anything, and most of the time they don't even
make you explicitly accept or read anything. A good example is there:
https://pastebin.com/
(trigger warning: ads and loads of tracking junk and cookie consent
pop-in-your-face featuring dark patterns)

Once you've cleared the cookie stuff, that's it. You can write and paste
and share whatever, nobody will ask for your real name or a government
ID or make you print a form and sign it and scan it and upload it or
review the content of what you publicly share before you share it.

Note that this is not a new service: according to Wikipedia it has been
online since 2002. It's still online. It is fairly certain that it has
seen some of the worst kind of abuse since its inception, yet it's still
there and almost as free to use as when it was created (actually the
worst impediment is probably the mandatory cookie consent stuff).

Now if you pay attention to details, somewhere at the bottom of the
page, in small letters with reduced contrast next to “cookie policy” you
will find a “terms of service” link that brings you to a wall of
legalese prose where you can read “Please read this Terms of Service
agreement carefully before accessing or using this Website” among other
silly absurdities that are so typical of the legalese of common-law
jurisdictions. It probably says somewhere that by using the service you
consent to these terms, and that you can't use the service to post
illegal or abusive content.

Back to the bottom of the page, you will notice that there are not just
one but three different links that will offer you three different ways
to report abuse: “dmca”, “report abuse” and “contact”.

Then if you click on a random public paste on the right, in the banner
above the shared contents, next to “print” you will find a “report”
button that brings you to a pre-filled report form.

That's all that's needed, and still probably way more than the strict
minimum necessary to CYA. And the contact forms don't even make it
really easy to report abuse as they feature captchas.
Post by Peter Pentchev
Also, I'm not sure that some issues can really be cleared; see below.
Here I'm not sure the perceived issues are that much of an issue. We
would have no Internet today if network and system operators tried to
reach that level of safety back in the eighties and nineties.

Cheers,
--
Julien Plissonneau Duquène
Iustin Pop
2025-01-07 21:20:02 UTC
Post by Julien Plissonneau Duquène
While I'd love to see all packages on Salsa
I think that being able to host the primary git repository of packages
elsewhere is a freedom worth maintaining for many reasons.
No, I don't think this is a good idea, and at my first thought, I
personally don't see any practical reasons.

You can always keep _another_ copy on another git repository, either by
client-side push to both salsa and your server, or by making salsa push
automatically to your server - your choice. But having the main
repository on Salsa for all packages gives tremendous advantages.

So, from my side, +1 to "all Salsa".

regards,
iustin
Otto Kekäläinen
2025-01-08 05:20:01 UTC
Hi,
Post by Iustin Pop
Post by Julien Plissonneau Duquène
While I'd love to see all packages on Salsa
I think that being able to host the primary git repository of packages
elsewhere is a freedom worth maintaining for many reasons.
No, I don't think this is a good idea, and at my first thought, I
personally don't see any practical reasons.
You can always keep _another_ copy on another git repository, either by
client-side push to both salsa and your server, or by making salsa push
automatically to your server - your choice. But having the main
repository on Salsa for all packages gives tremendous advantages.
Yes, exactly: The fact that Debian wants to have your packages on
salsa.debian.org is a huge enabler for collaboration. At the same time
it does not really remove anything from you.

Thanks to how git works as a distributed version control system, you
can still host your code in parallel elsewhere.

For an example of this check out
https://salsa.debian.org/debian/debcraft with mirrors at
https://gitlab.com/ottok/debcraft and
https://github.com/ottok/debcraft. Salsa is the "authoritative" place
referenced by the package Vcs-Git field, but there are mirrors on
GitLab and GitHub, and I have received Merge Requests and Pull
Requests on all three platforms. It takes almost no extra work to have
this. I get an email for any MR/PR posted anywhere, so there is just one
inbox for me to monitor.

Once I merge a contribution somewhere I just pull it to my laptop and
the next git push will sync it to all other repos without me having to
do any extra work. As you can see all of them have the same git HEAD
at 7bd51c1d.

For reference, my local checkout has this config:

± git remote -v
origin ***@salsa.debian.org:debian/debcraft.git (fetch)
origin ***@salsa.debian.org:debian/debcraft.git (push)
origin ***@gitlab.com:ottok/debcraft.git (push)
origin ***@github.com:ottok/debcraft.git (push)
otto ***@salsa.debian.org:otto/debcraft.git (fetch)
otto ***@salsa.debian.org:otto/debcraft.git (push)
otto ***@gitlab.com:ottok/debcraft.git (push)
otto ***@github.com:ottok/debcraft.git (push)
otto ***@git.sr.ht:~ottok/debcraft (push)
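A remote with one fetch URL and several push URLs, as in the listing above, can be set up with "git remote set-url --add --push". The sketch below only builds the command lines and does not run git; the helper name multi_push_commands is an assumption, the remote is assumed to exist already, and the HTTPS project URLs from this message stand in for the SSH URLs of the real configuration.

```python
def multi_push_commands(remote, primary_url, mirror_urls):
    """Build git commands so that one `git push` updates several repos.

    The primary URL is added as a push URL too: once any push URL is
    configured, git pushes only to the configured push URLs and no
    longer falls back to the fetch URL.
    """
    commands = []
    for url in [primary_url, *mirror_urls]:
        commands.append(
            ["git", "remote", "set-url", "--add", "--push", remote, url]
        )
    return commands


commands = multi_push_commands(
    "origin",
    "https://salsa.debian.org/debian/debcraft",
    [
        "https://gitlab.com/ottok/debcraft",
        "https://github.com/ottok/debcraft",
    ],
)

for cmd in commands:
    print(" ".join(cmd))
```

Running the printed commands inside a checkout should give one (fetch) line and three (push) lines for origin in "git remote -v", matching the shape of the listing above.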
t***@goirand.fr
2025-01-07 22:30:01 UTC
Post by Julien Plissonneau Duquène
I think that being able to host the primary git repository of packages
elsewhere is a freedom worth maintaining for many reasons.
I don't think we should continue to allow the "freedom" to be annoying for every other contributor, even if there may be some "technical excuses" to do so.


Thomas Goirand (zigo)
Holger Levsen
2025-01-07 22:50:01 UTC
Post by t***@goirand.fr
Post by Julien Plissonneau Duquène
I think that being able to host the primary git repository of packages
elsewhere is a freedom worth maintaining for many reasons.
same here.
Post by t***@goirand.fr
I don't think we should continue to allow the "freedom" to be annoying for every other contributor, even if there may be some "technical excuses" to do so.
the same could be said about using cdbs or anything really.

please be careful in your efforts to make contributing easier to not alienate
those who already contribute, sometimes for decades. also: it's rather easy to
kill motivation but very hard to revive it, once killed.
--
cheers,
Holger

⢀⣎⠟⠻⢶⣊⠀
⣟⠁⢠⠒⠀⣿⡁ holger@(debian|reproducible-builds|layer-acht).org
⢿⡄⠘⠷⠚⠋⠀ OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
⠈⠳⣄

“It's easy to be a naive idealist. It's easy to be a cynical realist. It's
quite another thing to have no illusions and still hold the inner flame.”
(Marie-Louise von Franz)
Otto Kekäläinen
2025-01-12 04:00:01 UTC
Post by Holger Levsen
please be careful in your efforts to make contributing easier to not alienate
those who already contribute, sometimes for decades. also: it's rather easy to
kill motivation but very hard to revive it, once killed.
The above got quoted in the latest LWN, so it may be a sign that the
above view has a lot of support. I am by far no longer "new" myself,
but I mentor many who are new, and to me it looks like we are
definitely alienating new contributors way more than old.

I would like to argue that the opposite of Holger's quote holds true
as well: if those who have been contributing for multiple decades
continue to ignore new tools and refuse to adopt workflows invented in
the past 20 years, it will totally kill the motivation of a lot of
talented and industrious new contributors.

For example, Michael Stapelberg was very active in Debian in 2009-2019
until he "had enough". His retirement blog post [1] from 2019 raises
many of the same issues we have been discussing (yet again) on this
very debian-devel@ list in the past weeks (e.g. some people refusing to
host their packages in git and on a common platform, bugs.debian.org
being too cumbersome to use via email only, project-wide changes being
too hard to drive etc).

Ideally, I'd like to see both the old guard spend some time on
re-assessing their workflows and adopting new ones, AND the newcomers be
humble enough to learn about Debian history and the multitude of
different package types we need to support and why some tooling might
not universally work for good reasons. This is one of the reasons I
like the DEP process: unlike the policy, it does not enforce anything,
but it still provides a way to define shared and common workflows and
interfaces to make collaboration more efficient.

The best outcome would be for both old and new contributors to feel
welcome. Hopefully all the recent work on documentation and DEPs can
lead to a good balance of revival without losing things that are good
already.

[1] https://michael.stapelberg.ch/posts/2019-03-10-debian-winding-down/
Jeremy Stanley
2025-01-07 22:50:01 UTC
On 2025-01-07 23:21:36 +0100 (+0100), ***@goirand.fr wrote:
[...]
Post by t***@goirand.fr
I don't think we should continue to allow the "freedom" to be
annoying for every other contributor, even if there may be some
"technical excuses" to do so.
That's fair. I maintain a package of a project where I eventually
moved the upstream codebase into revision control but have been too
lazy/distracted to do the same for the debian directory (which I
realistically only update once every year or two). I'm committed to
importing that into Salsa eventually, it's just a question of when
I'll find the time.
--
Jeremy Stanley
Otto Kekäläinen
2025-01-08 05:10:01 UTC
Post by Jeremy Stanley
That's fair. I maintain a package of a project where I eventually
moved the upstream codebase into revision control but have been too
lazy/distracted to do the same for the debian directory (which I
realistically only update once every year or two). I'm committed to
importing that into Salsa eventually, it's just a question of when
I'll find the time.
Are you aware that you can almost fully automatically create the git
history from existing uploads with git-buildpackage?

gbp import-dscs --verbose --pristine-tar --create-missing-branches <packagename>

Then you just push that as a new repo on Salsa and you are done with
the migration.
Andreas Tille
2025-01-08 16:00:02 UTC
Hi Stuart,
Post by Andreas Tille
Let's think about some better fine-tuning. "NOT LIKE '%salsa%'" might
also catch Vcs URLs that are intentionally somewhere else. While I'd
love to see all packages on Salsa, it might be sensible to start with
packages that are unintentionally not in Salsa so
udd=> SELECT COUNT(DISTINCT source) FROM sources WHERE release = 'sid' AND (vcs_url IS NULL OR vcs_url like '%alioth%' OR vcs_url like '%git.debian.org%' OR vcs_url like '%svn.debian.org%') ;
count
-------
2213
For completeness I need to add `OR vcs_url like '%anonscm.debian.org%'`
which bumps the counter to 2947 ...
Post by Andreas Tille
It might be a real challenge to bring that number below 2000 by the
end of my term. Any help with this is welcome.
... and puts my challenge to bring that number below 2000 nearly out of
reach (unless lots of people join this effort).
Well, let's look at some of these other d.o URLs.
- Not our alioth: There are 16 vcs URLs in there that have 'alioth'
in them but aren't alioth.debian.org; they are git hosted but not
on Debian infrastructure (and perhaps not in a place that facilitates
collaboration in the way being discussed)
Ahhh, you mean https://evolvis.org/anonscm/git/alioth ? Thank you for
the hint. So the updated query is

SELECT COUNT(DISTINCT source) FROM sources WHERE release = 'sid' AND
(vcs_url IS NULL OR vcs_url like '%alioth.debian.org%' OR vcs_url like '%git.debian.org%' OR vcs_url like '%svn.debian.org%' OR vcs_url like '%anonscm.debian.org%') ;

2930
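The substring matching in the updated query can be mirrored in a few lines to see which URLs it catches. This is a minimal sketch: the function classify, the category names and all sample URLs except the evolvis.org prefix and the debcraft project are assumptions made for illustration.

```python
# Host substrings taken from the updated query above.
DEAD_HOSTS = (
    "alioth.debian.org",
    "git.debian.org",
    "svn.debian.org",
    "anonscm.debian.org",
)


def classify(vcs_url):
    """Sort a Vcs URL into the buckets discussed in this thread.

    Uses plain substring tests, like the SQL LIKE '%...%' patterns.
    """
    if vcs_url is None:
        return "no-vcs"
    if any(host in vcs_url for host in DEAD_HOSTS):
        return "dead-debian-host"
    if "salsa.debian.org" in vcs_url:
        return "salsa"
    return "elsewhere"


samples = {
    "missing": None,
    "salsa": "https://salsa.debian.org/debian/debcraft",
    "old_anonscm": "https://anonscm.debian.org/git/example/pkg.git",
    "evolvis": "https://evolvis.org/anonscm/git/alioth/example.git",
    "dgit": "https://git.dgit.debian.org/example",
}
results = {name: classify(url) for name, url in samples.items()}

for name, bucket in results.items():
    print(name, bucket)
```

Matching on 'alioth.debian.org' instead of 'alioth' keeps the evolvis.org URL out of the count. Note that the string "dgit.debian.org" itself contains "git.debian.org", so a pure substring test still counts dgit URLs; excluding them would need a comparison on the full host name.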
- dgit.debian.org: There are 30 in there that are dgit.debian.org.
That surprised me, maybe I don't know enough about dgit.
I consider dgit.debian.org a valid Vcs field. It might be preferable to
have Salsa as the main repository and dgit.d.o just as a clone of it, but
for the moment I'm looking for obviously unmaintained Vcs fields or
no Vcs at all.
- git.debian.org: There are 146 with git.debian.org - none of these VCS
URLs work any more
Yes, that's my point: Fix things that don't work.
- svn.debian.org: 14 list svn.d.o but like git.d.o that's dead. svn.d.o
doesn't even exist as a hostname any more.
Same here.
There's 161 packages in sid with old d.o URLs pointing to alioth. There's a
reasonable chance that a good portion of them are also not maintained.
- 11% of them list their maintainer as Debian QA Group
- 13% of them have a current O bug (another 1 with an RFA)
- who knows how many are otherwise abandoned with MIA maintainers or
maintainers who have just moved on to other things
Spotting obviously broken Vcs fields (or missing Vcs fields) is one way to
look for unmaintained packages. It might turn out that this indicator
is misleading, but in my experience from Bug of the Day this is really
a rare exception.
There was a recent discussion about what to do with VCSes for orphaned
packages. Maybe if it doesn't exist on salsa, it's worth creating one in the
salsa.d.o/debian/ namespace as part of doing the QA upload?
(gbp import-dscs --debsnap) That would be a good outcome and a good little
project for someone...
... which I would really welcome but we need "someone" who volunteers.
The vast majority of these packages have seen post-alioth uploads but with
the broken Vcs fields still in place.
Do you have numbers backing up this "vast majority" statement? In my
experience these uploads were NMUs, not maintainer uploads. This
brings me back to my argument that the restrictions on acceptable
changes in NMUs keep NMUers from looking into such issues. In most cases
where I salvaged packages, the NMUs were not even pushed to a repository
that might exist on Salsa. So repositories that exist on Salsa without
an upload carrying the fixed Vcs fields (I've seen lots of these with
changelog entries by Janitor) can potentially trigger regressions:
the maintainer might simply continue working from the state of the Git
repository, bumping the Debian revision to something higher than the NMU,
and the changes of the NMU might get lost.
That's perhaps offering the opposite
of collaborative development? The question is whether the repo has actually
moved to salsa but d/control hasn't been updated, or whether the repo has
just vanished. An MBF that the VCS fields are out of date is easy, but
checking and fixing is likely manual work.
It's definitely manual work. In most cases you also have to check the
Homepage and the watch file of the project. My gut feeling is about
30% of the Homepages of the Bug of the Day-salvaged packages were
broken.
year | count
-----+-------
2011 | 1
2012 | 4
2013 | 3
2014 | 4
2015 | 1
2016 | 1
2017 | 2
2018 | 2 (salsa.d.o general availability)
2019 | 1
2020 | 13
2021 | 95
2022 | 20
2023 | 7
2024 | 6
2025 | 1
Most of these up to 2019 will probably be picked up by Bug of the Day
sooner or later. Helping hands are always welcome.
I noticed that some teams have some lintian tags checking this from a team
policy perspective - doing this more broadly for other teams would help
provide teams with visibility via lintian.d.o reports.
lintian-explain-tags -t team/pkg-perl/vcs/no-git \
team/pkg-perl/vcs/no-team-url
Nice.
(I accidentally found 2 python-team packages without Vcs URLs yesterday -
the repos were on salsa, just not listed in d/control)
Not so nice. Did you just fix these? If not would you mind naming
the packages?
Over half of these old alioth URLs can be addressed by Teams doing some data
maintainer_name | count
-------------------------------+-------
Debian Perl Group | 72
Debian Java Maintainers | 10
Debian X Strike Force | 7
Debian XML/SGML Group | 4
Debian Science Maintainers | 3
Debian CLI Applications Team | 2
Debian Ruby Extras Maintainers | 1
Debian Javascript Maintainers | 1
Debian Telepathy maintainers | 1
Debian Fonts Task Force | 1
Debian CLI Libraries Team | 1
Debian-IN Team | 1
Debichem Team | 1
NeuroDebian Team | 1
The Debian Lua Team | 1
I even find 13 in the Science team and will try to tackle these (or
ask for removal).
( SELECT source, maintainer, vcs_url FROM sources WHERE release = 'sid' AND vcs_url not like '%salsa%' AND maintainer like '%science%' ; )
So in terms of where to start... perhaps there's a couple of teams that
would like to do some data cleansing?
It would be really great if this thread had
this effect.

Thanks a lot for your analysis
Andreas.



SELECT
s.source, date, vcs_url
FROM
sources AS s
JOIN upload_history AS h
ON s.source = h.source AND s.version = h.version
WHERE
release = 'sid' AND
vcs_url ~ '/(git|svn|alioth).debian.org'
ORDER BY
date DESC;



SELECT
DATE_PART('year', date) AS year,
COUNT(*)
FROM
sources AS s
JOIN upload_history AS h
ON s.source = h.source AND s.version = h.version
WHERE
release = 'sid'
AND vcs_url ~ '/(git|svn|alioth).debian.org'
GROUP BY
year
ORDER BY
year ASC;



SELECT
maintainer, COUNT(*)
FROM sources
WHERE
release = 'sid'
AND vcs_url ~ '/(git|svn|alioth).debian.org'
AND maintainer ~ '(team|group|lists)'
GROUP BY
maintainer
ORDER BY
count DESC;
--
https://fam-tille.de
Andreas Tille
2025-01-09 16:40:02 UTC
Hi Stuart,

I'm changing the subject and suggest moving the topic to the Debian QA
list, where it probably belongs.
Good point on anonscm as well... that really does blow out the numbers.
Unfortunately yes.
However... some of them still work via the aliasing mechanism that was
introduced at the time of migration to salsa.
In the migration phase from Alioth to Salsa I maintained lists of
packages for the Debian Med and Debian Science teams. In my practical
experience, finding a working alias is a rare exception. I also think
this alias mechanism was a temporary solution that should not have to
survive for more than 5 years.
Duck used to check them all
but I don't think it is running any more, unfortunately. vcswatch still
does, more on that later.
Vcswatch is a good hint.
Post by Andreas Tille
The vast majority of these packages have seen post-alioth uploads but with
the broken Vcs fields still in place.
Do you have numbers backing up this "vast majority" statement?
Yes, that's in the table below. Of those 161 packages, 145 have been
uploaded since salsa launched and alioth stopped. (updated data with anonscm
at the bottom - the story is still the same, although not all those anonscm
links are broken)
Ahhh, I get your point now. The Bug of the Day criteria select
packages that have not been uploaded for a long time, and thus my
experience is different.
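The 161 and 145 figures quoted above could probably be reproduced in one
pass with something like the following. It is only a sketch: it reuses
the join and regex from the queries in this thread, and the 2018-01-01
cutoff is an assumption, chosen because the year table marks 2018 as the
year of salsa.d.o general availability (in that table, 2 + 1 + 13 + 95 +
20 + 7 + 6 + 1 = 145 of the 161 rows fall in 2018 or later).

```sql
-- Sketch: total packages with alioth-era Vcs URLs, and how many of them
-- had their current sid version uploaded on or after the cutoff.
-- ASSUMPTION: 2018-01-01 stands in for "since salsa launched".
SELECT
    COUNT(*) AS total,
    COUNT(*) FILTER (WHERE h.date >= DATE '2018-01-01') AS since_2018
FROM
    sources AS s
    JOIN upload_history AS h
        ON s.source = h.source AND s.version = h.version
WHERE
    s.release = 'sid'
    AND s.vcs_url ~ '/(git|svn|alioth).debian.org';
```

The numbers will drift from those in the thread as packages get fixed.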
Post by Andreas Tille
(I accidentally found 2 python-team packages without Vcs URLs yesterday -
the repos were on salsa, just not listed in d/control)
Not so nice. Did you just fix these? If not would you mind naming
the packages?
One got uploaded because I was sorting other changes for qtpy, the other is
fixed in git. Having looked at 20-something packages in the last 2 days, I'm
not sure I could actually name which ones at this stage...
OK as long as these are fixed now.
In pursuing this, you might also find the vcswatch table in udd - it lists
1533 packages where the VCS fields might need fixing. Some of the errors
there are transient, but this also picks up typos in the VCS fields
('debain', 'debian/packages/') and repos that simply don't exist.
Good point.
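A quick way to look for the kind of typos mentioned above might be a
pattern match on the vcswatch URLs. This is a rough sketch only: the two
patterns are just the examples named in this thread, not a complete list,
and hits still need to be checked by hand.

```sql
-- Sketch: vcswatch entries whose URL contains one of the typo patterns
-- quoted in this thread ('debain', 'debian/packages/').
SELECT
    source, url, error
FROM
    vcswatch
WHERE
    url ~ '(debain|debian/packages/)'
ORDER BY
    source;
```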
Updated queries and data appended. (and btw postgres can do regex matches
which simplifies the sql quite a lot)
I'm aware in principle of the regexp feature. Unfortunately I have
to deal with SQL databases without this kind of feature in my day job.
So I usually try to avoid PostgreSQL-only features.
SELECT
DATE_PART('year', date) AS year,
COUNT(*)
FROM
sources AS s
JOIN upload_history AS h
ON s.source = h.source AND s.version = h.version
WHERE
release = 'sid'
AND vcs_url ~ '/(git|svn|alioth|anonscm).debian.org'
GROUP BY
year
ORDER BY
year ASC;
year | count
-----+-------
2011 | 2
2012 | 5
2013 | 7
2014 | 9
2015 | 9
2016 | 20
2017 | 102
2018 | 85 ← (salsa.d.o general availability)
2019 | 10
2020 | 77
2021 | 411
2022 | 115
2023 | 13
2024 | 31
2025 | 3
(15 rows)
Teams with packages to fix - and the packages are probably already on salsa
so this is just metadata, not lots of work.
SELECT
maintainer_name, COUNT(*)
FROM sources
WHERE
release = 'sid'
AND vcs_url ~ '/(git|svn|alioth|anonscm).debian.org'
AND maintainer ~ '(team|group|lists)'
GROUP BY
maintainer_name
ORDER BY
count DESC;
maintainer_name | count
---------------------------------+-------
Debian Ruby Extras Maintainers | 196 (+2 that are in Uploaders)
Debian Java Maintainers | 178
Debian Go Packaging Team | 105
Debian Perl Group | 83
pkg-go | 25
Debian Javascript Maintainers | 20
Debian Fonts Task Force | 15
Debian PHP PEAR Maintainers | 14
Debian X Strike Force | 12
Debian Science Maintainers | 11
Debian XML/SGML Group | 5
Debichem Team | 4
Debian VDR Team | 4
Debian CLI Applications Team | 2
Debian Games Team | 2
Debian Java maintainers | 2
Debian Tasktools Packaging Team | 2
Debian VoIP Team | 2
Debian Astronomy Maintainers | 2
Debian Privacy Tools Maintainers | 2
Debian Clojure Maintainers | 2
Debian Astronomy Team | 2
Debian Telepathy maintainers | 2
Live Systems Maintainers | 1
The Debian Lua Team | 1
Pulseaudio maintenance team | 1
Android Tools Maintainers | 1
Debian PhotoTools Maintainers | 1
Puppet Package Maintainers | 1
ClamAV Team | 1
Debian-IN Team | 1
Debian CLI Libraries Team | 1
Debian Islamic Maintainers | 1
Debian GNOME Maintainers | 1
Debian Science Team | 1
Debian Sugar Team | 1
Debian GNUKhata Team | 1
Debian Emacs addons team | 1
Debian Med Packaging Team | 1
Debian Salt Team | 1
NeuroDebian Team | 1
Find packages in your favourite team that you want to work on...
SELECT
source, vcs_url
FROM sources
WHERE
release = 'sid'
AND vcs_url ~ '/(git|svn|alioth|anonscm).debian.org'
AND maintainer ~ 'science'
ORDER BY
source;

Thank you for publishing these data - I hope this will encourage people
to look into this.
The vcswatch table has lots of interesting things... Note that the salsa
error "could not read Username" in the table is not a misconfiguration - it
means that the repo couldn't be obtained anonymously, which could be that it
doesn't exist, or that it needs permissions - both are wrong for Debian.
SELECT
source, url, error
FROM
vcswatch
WHERE
error IS NOT NULL
ORDER BY
source;
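
To narrow this down to the salsa case described above, where the
repository could not be fetched anonymously, a filter on the error text
might help. This is a sketch that assumes the error column contains the
literal message "could not read Username" as quoted in this thread.

```sql
-- Sketch: salsa URLs that vcswatch could not read anonymously, i.e. the
-- repository is missing or requires permissions.
-- ASSUMPTION: the error text contains 'could not read Username' verbatim.
SELECT
    source, url
FROM
    vcswatch
WHERE
    url LIKE '%salsa.debian.org%'
    AND error LIKE '%could not read Username%'
ORDER BY
    source;
```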

I've removed the quotation markers from the SQL queries to enable easy
copy-and-pasting for readers. I can confirm that a couple of Debian
Science packages will no longer show up tomorrow (but some are not simple
metadata fixes, since a lot has happened to the code in Git, which does
not build currently; at least I pinged the team in those cases).

Kind regards
Andreas.
--
https://fam-tille.de
Bill Allombert
2025-01-09 21:10:01 UTC
It's great to see more packages being maintained on salsa. I've certainly
noticed that it is making working on packages much simpler.
In my campaign, I stated [os1] that I aimed to reduce the number of
packages maintained outside Salsa to below 2,000. As of March 28, 2024,
the count was 2,368. As of this writing, the count stands at 1,928
[os2], so I consider this promise fulfilled. My thanks go out to
everyone who contributed to this effort. Moving forward, I’d like to set
a more ambitious goal for the remainder of my term and hope we can
reduce the number to below 1,800.
Without seeking to rain on the parade, that query is only the packages that
list a non-salsa VCS. That's not counting the packages that don't list a VCS
Also we should only count as 'maintained on Salsa' packages that are
effectively maintained.

Packages that are on salsa but were never uploaded to sid, or whose
source code does not match salsa (due to an NMU or otherwise), should
not be counted.
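
For the counting caveat quoted above (packages that list no VCS at all),
a sketch of a query that reports both groups side by side could look like
this. It only uses the sources columns that appear in this thread and
says nothing about whether a listed salsa repository matches the archive.

```sql
-- Sketch: sid source packages with no Vcs URL, with a non-salsa Vcs URL,
-- and the union of both. NOT LIKE never matches NULL, hence the explicit
-- IS NULL branch.
SELECT
    COUNT(DISTINCT source) FILTER (WHERE vcs_url IS NULL)
        AS no_vcs_listed,
    COUNT(DISTINCT source) FILTER (WHERE vcs_url NOT LIKE '%salsa%')
        AS non_salsa_vcs,
    COUNT(DISTINCT source)
        FILTER (WHERE vcs_url IS NULL OR vcs_url NOT LIKE '%salsa%')
        AS not_on_salsa
FROM
    sources
WHERE
    release = 'sid';
```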

We do not need more bureaucracy. I will not let packages be removed from
testing just to keep the Salsa repositories in sync.

Cheers,
Bill.