Hi Stuart,
Let's think about some better fine tuning. "NOT LIKE '%salsa%'" might
also catch Vcs URLs that are intentionally hosted somewhere else. While I'd
love to see all packages on Salsa, it might be sensible to start with
packages that are unintentionally not on Salsa, so:
udd=> SELECT COUNT(DISTINCT source) FROM sources WHERE release = 'sid' AND (vcs_url IS NULL OR vcs_url like '%alioth%' OR vcs_url like '%git.debian.org%' OR vcs_url like '%svn.debian.org%') ;
count
-------
2213
For completeness I need to add `OR vcs_url like '%anonscm.debian.org%'`,
which bumps the count to 2947 ...
That might make it a real challenge to bring that number below 2000 by the
end of my term. Any help with approaching this is welcome.
... and puts my challenge of bringing that number below 2000 nearly out of
reach (unless lots of people join this effort).
Well, let's look at some of these other d.o URLs.
- Not our alioth: There are 16 vcs URLs in there that have 'alioth'
in them but aren't alioth.debian.org; they are git hosted but not
on Debian infrastructure (and perhaps not in a place that facilitates
collaboration in the way being discussed)
Ahhh, you mean https://evolvis.org/anonscm/git/alioth ? Thank you for
the hint. So the updated query is
SELECT COUNT(DISTINCT source) FROM sources WHERE release = 'sid' AND
(vcs_url IS NULL OR vcs_url like '%alioth.debian.org%' OR vcs_url like '%git.debian.org%' OR vcs_url like '%svn.debian.org%' OR vcs_url like '%anonscm.debian.org%') ;
2930
- dgit.debian.org: There are 30 in there that are dgit.debian.org.
That surprised me, maybe I don't know enough about dgit.
I consider dgit.debian.org a valid Vcs field. It might be preferable to
have Salsa as the main repository and dgit.d.o just a clone of it, but for
the moment I'm looking for obviously unmaintained Vcs fields or
no Vcs at all.
- git.debian.org: There are 146 with git.debian.org - none of these VCS
URLs work any more
Yes, that's my point: Fix things that don't work.
- svn.debian.org: !4 list svn.d.o but like git.d.o that's dead. svn.d.o
doesn't even exist as a hostname any more.
Same here.
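The categories above can be reproduced by matching on the URL's exact hostname instead of on substrings. This avoids false positives such as the evolvis.org URL, which only has 'alioth' in its path. Below is a minimal Python sketch. It assumes the list of retired hosts is the set named in this thread (not an official list), and the example URLs are made up. It is not how UDD itself classifies anything.

```python
from collections import Counter
from urllib.parse import urlsplit

# Retired Debian VCS hosts named in this thread (an assumption, not an
# official list). Subdomains such as lists.alioth.debian.org also match.
DEAD_DOMAINS = ("alioth.debian.org", "git.debian.org",
                "svn.debian.org", "anonscm.debian.org")


def _on(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def classify_vcs_url(vcs_url):
    """Return a coarse category for a Vcs-* URL based on its hostname."""
    parts = (vcs_url or "").split()
    if not parts:
        return "missing"
    # Vcs-Git may append " -b <branch>" or " [<path>]"; keep only the URL.
    host = urlsplit(parts[0]).hostname or ""
    for domain in DEAD_DOMAINS:
        if _on(host, domain):
            return domain
    if _on(host, "salsa.debian.org"):
        return "salsa"
    if _on(host, "dgit.debian.org"):
        return "dgit"
    # Everything else, including URLs without a scheme (no parsable host).
    return "elsewhere"


# Hypothetical example values, not taken from UDD.
urls = [
    "https://salsa.debian.org/science-team/foo.git",
    "git://git.debian.org/git/pkg-perl/packages/libbar-perl.git",
    "svn://svn.debian.org/svn/pkg-baz/trunk",
    "https://anonscm.debian.org/git/collab-maint/qux.git -b debian",
    "https://evolvis.org/anonscm/git/alioth/quux.git",
    None,
]
breakdown = Counter(classify_vcs_url(u) for u in urls)
print(breakdown)
```

Matching on the hostname puts the evolvis.org URL in "elsewhere", whereas a `'%alioth%'` pattern would have counted it as stale.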
There are 161 packages in sid with old d.o URLs pointing to alioth. There's a
reasonable chance that a good portion of them are also not maintained.
- 11% of them list their maintainer as Debian QA Group
- 13% of them have a current O bug (another 1 with an RFA)
- who knows how many are otherwise abandoned with MIA maintainers or
maintainers who have just moved on to other things
Spotting obviously broken Vcs fields (or missing Vcs fields) is one way to
find unmaintained packages. This indicator might turn out to be misleading,
but in my experience from Bug of the Day that is a rare exception.
There was a recent discussion about what to do with VCSes for orphaned
packages. Maybe if it doesn't exist on salsa, it's worth creating one in the
salsa.d.o/debian/ namespace as part of doing the QA upload?
(gbp import-dscs --debsnap) That would be a good outcome and a good little
project for someone...
... which I would really welcome but we need "someone" who volunteers.
The vast majority of these packages have seen post-alioth uploads but with
the broken Vcs fields still in place.
Do you have numbers backing up this "vast majority" statement? In my
experience these uploads were NMUs, not maintainer uploads. This
brings me back to my argument that restrictions on which changes are
acceptable in an NMU prevent NMUers from looking at such issues. In most
cases where I salvaged packages, the NMUs were not even pushed to a
repository that might exist on Salsa. So having repositories on Salsa without
an upload with fixed Vcs fields (I've seen lots of these with
changelog entries by Janitor) can trigger regressions: the
maintainer might simply continue working from the state of the Git
repository, bump the Debian revision to something higher than the NMU,
and the NMU's changes might get lost.
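The lost-NMU risk comes down to version ordering. If the version in the archive sorts after the newest entry in a repository's debian/changelog, the repository is missing an upload. `dpkg --compare-versions` is the authoritative check.

The following is only a minimal Python sketch of dpkg's ordering rules:
- epoch first, then upstream version, then Debian revision;
- `~` sorts before everything, even the end of the string.

The helper names and example versions are hypothetical.

```python
import string

DIGITS = set(string.digits)
LETTERS = set(string.ascii_letters)


def _order(c):
    """Weight of one non-digit character, following dpkg's rules."""
    if c == "" or c in DIGITS:
        return 0            # end of string, or a digit
    if c in LETTERS:
        return ord(c)       # letters sort by ASCII value
    if c == "~":
        return -1           # tilde sorts before everything, even the end
    return ord(c) + 256     # other symbols sort after all letters


def _verrevcmp(a, b):
    """Compare two upstream versions or revisions the way dpkg does."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run: compare character weights one by one.
        while ((i < len(a) and a[i] not in DIGITS)
               or (j < len(b) and b[j] not in DIGITS)):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: compare numerically, ignoring leading zeros.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while i < len(a) and j < len(b) and a[i] in DIGITS and b[j] in DIGITS:
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and a[i] in DIGITS:
            return 1        # a has the longer number
        if j < len(b) and b[j] in DIGITS:
            return -1       # b has the longer number
        if first_diff:
            return first_diff
    return 0


def _split(version):
    """Split [epoch:]upstream[-revision] into its three parts."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch or 0), upstream, revision


def compare_versions(a, b):
    """Return <0, 0 or >0 as version a sorts before, equal to or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return ea - eb
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)


def repo_lags_archive(repo_version, archive_version):
    """True if the archive holds a newer upload (e.g. an NMU) than the repo."""
    return compare_versions(archive_version, repo_version) > 0
```

For example, `repo_lags_archive("1.2-3", "1.2-3.1")` flags a repository still at 1.2-3 while the archive carries the NMU 1.2-3.1.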
That's perhaps offering the opposite
of collaborative development? The question is whether the repo has actually
moved to salsa but d/control hasn't been updated, or whether the repo has
just vanished. An MBF that the VCS fields are out of date is easy, but
checking and fixing is likely manual work.
It's definitely manual work. In most cases you also have to check the
Homepage and the watch file of the project. My gut feeling is that about
30% of the Homepage fields of packages salvaged via Bug of the Day were
broken.
Year of the sid upload for packages whose Vcs URL still points to
git/svn/alioth.debian.org:
year | count
-----+-------
2011 | 1
2012 | 4
2013 | 3
2014 | 4
2015 | 1
2016 | 1
2017 | 2
2018 | 2 (salsa.d.o general availability)
2019 | 1
2020 | 13
2021 | 95
2022 | 20
2023 | 7
2024 | 6
2025 | 1
Most of those uploaded up to 2019 will probably be picked up by Bug of the
Day sooner or later. Helping hands are always welcome.
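The year table can be checked against the thread's figures with a little arithmetic. This sketch just re-adds the counts from the table. Treating 2019 as the start of "post-alioth" is an assumption, based on salsa.d.o becoming generally available in 2018 per the table. The numbers say nothing about whether those uploads were maintainer uploads or NMUs.

```python
# Year of the current sid upload -> number of packages, from the table above.
UPLOADS_BY_YEAR = {
    2011: 1, 2012: 4, 2013: 3, 2014: 4, 2015: 1, 2016: 1, 2017: 2,
    2018: 2, 2019: 1, 2020: 13, 2021: 95, 2022: 20, 2023: 7, 2024: 6,
    2025: 1,
}

total = sum(UPLOADS_BY_YEAR.values())
# Assumed cutoff: anything uploaded in 2019 or later counts as post-alioth.
up_to_2018 = sum(n for year, n in UPLOADS_BY_YEAR.items() if year <= 2018)
since_2019 = total - up_to_2018

print(f"{total} packages, {up_to_2018} last uploaded by 2018, "
      f"{since_2019} since 2019 ({since_2019 / total:.0%})")
# → 161 packages, 18 last uploaded by 2018, 143 since 2019 (89%)
```

The total matches the 161 packages with old d.o URLs mentioned earlier.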
I noticed that some teams have some lintian tags checking this from a team
policy perspective - doing this more broadly for other teams would help
provide teams with visibility via lintian.d.o reports.
lintian-explain-tags -t team/pkg-perl/vcs/no-git \
team/pkg-perl/vcs/no-team-url
Nice.
(I accidentally found 2 python-team packages without Vcs URLs yesterday -
the repos were on salsa, just not listed in d/control)
Not so nice. Did you already fix them? If not, would you mind naming
the packages?
Over half of these old alioth URLs can be addressed by teams doing some data
cleansing:
maintainer_name | count
-------------------------------+-------
Debian Perl Group | 72
Debian Java Maintainers | 10
Debian X Strike Force | 7
Debian XML/SGML Group | 4
Debian Science Maintainers | 3
Debian CLI Applications Team | 2
Debian Ruby Extras Maintainers | 1
Debian Javascript Maintainers | 1
Debian Telepathy maintainers | 1
Debian Fonts Task Force | 1
Debian CLI Libraries Team | 1
Debian-IN Team | 1
Debichem Team | 1
NeuroDebian Team | 1
The Debian Lua Team | 1
I even find 13 in the Science team and will try to tackle these (or
ask for their removal).
( SELECT source, maintainer, vcs_url FROM sources WHERE release = 'sid' AND vcs_url not like '%salsa%' AND maintainer like '%science%' ; )
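One caveat with this query: when vcs_url is NULL, `vcs_url NOT LIKE '%salsa%'` evaluates to NULL rather than true. Packages with no Vcs field at all therefore drop out silently, even though this thread is also looking for them. PostgreSQL and SQLite both follow this three-valued logic.

The following Python sketch shows the difference using an in-memory SQLite table with made-up rows. SQLite's LIKE is case-insensitive for ASCII, unlike PostgreSQL's, but that makes no difference for these rows.

```python
import sqlite3

# Made-up rows standing in for UDD's sources table (not real data).
MAINT = ("Debian Science Maintainers "
         "<debian-science-maintainers@lists.alioth.debian.org>")
ROWS = [
    ("foo", "sid", MAINT, "https://salsa.debian.org/science-team/foo.git"),
    ("bar", "sid", MAINT, "git://git.debian.org/debian-science/bar.git"),
    ("baz", "sid", MAINT, None),  # no Vcs field at all
]

# The query as written: the NULL vcs_url row is silently dropped.
AS_WRITTEN = """
    SELECT source FROM sources
    WHERE release = 'sid' AND vcs_url NOT LIKE '%salsa%'
      AND maintainer LIKE '%science%'
    ORDER BY source"""

# Treating a missing Vcs field explicitly as "not on Salsa".
WITH_MISSING = """
    SELECT source FROM sources
    WHERE release = 'sid'
      AND (vcs_url IS NULL OR vcs_url NOT LIKE '%salsa%')
      AND maintainer LIKE '%science%'
    ORDER BY source"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sources "
             "(source TEXT, release TEXT, maintainer TEXT, vcs_url TEXT)")
conn.executemany("INSERT INTO sources VALUES (?, ?, ?, ?)", ROWS)

as_written = [row[0] for row in conn.execute(AS_WRITTEN)]
with_missing = [row[0] for row in conn.execute(WITH_MISSING)]
conn.close()
print(as_written, with_missing)
# → ['bar'] ['bar', 'baz']
```

`COALESCE(vcs_url, '') NOT LIKE '%salsa%'` is an equivalent way to write the same condition.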
So in terms of where to start... perhaps there's a couple of teams that
would like to do some data cleansing?
It would be really great if this thread had this effect.
Thanks a lot for your analysis
Andreas.
Queries behind the figures above:

SELECT
s.source, date, vcs_url
FROM
sources AS s
JOIN upload_history AS h
ON s.source = h.source AND s.version = h.version
WHERE
release = 'sid' AND
vcs_url ~ '/(git|svn|alioth)\.debian\.org'
ORDER BY
date DESC;
SELECT
DATE_PART('year', date) AS year,
COUNT(*)
FROM
sources AS s
JOIN upload_history AS h
ON s.source = h.source AND s.version = h.version
WHERE
release = 'sid'
AND vcs_url ~ '/(git|svn|alioth)\.debian\.org'
GROUP BY
year
ORDER BY
year ASC;
SELECT
maintainer, COUNT(*)
FROM sources
WHERE
release = 'sid'
AND vcs_url ~ '/(git|svn|alioth)\.debian\.org'
AND maintainer ~ '(team|group|lists)'
GROUP BY
maintainer
ORDER BY
count DESC;
--
https://fam-tille.de