Discussion:
Upstreams with "official" tarballs differing from their git
Stéphane Glondu
2025-02-15 11:20:01 UTC
Hi,
Usually, projects using this trick publish tarballs with
substitutions
   https://github.com/LPCIC/elpi/releases/tag/v2.0.7
(here, elpi-2.0.7.tbz).
I realize my previous email was a bit short: I was wondering if this
.tbz is still source code, because in the autotools world, package sources
come with configure scripts ready to run, but the good practice in
Debian is to regenerate those from configure.ac.
Well, we enter a philosophical debate that is not specific to OCaml and
probably should be discussed elsewhere... Adding debian-devel to get
more opinions.

Summary to other debian-devel readers: we are facing some upstreams that
publish "official" tarballs that differ from what is in their git. The
differences may include: variable substitutions, generated files... I
guess this is pretty common (cf. autotools). Moreover, the build system
behaves differently if it is called from git or not, or from extracted
official tarballs or not.

IMHO, traditionally, "source code" from Debian's point of view is whatever
upstream releases as "official" tarballs (i.e. elpi-2.0.7.tbz), which
may differ from what is in upstream git (i.e. v2.0.7.tar.gz). What makes
me think that is the special care that is taken in keeping upstream
tarballs pristine (with their signatures...).

Some may consider that this Debian notion of "source code" differs from
the GNU "preferred form of modification", which would rather be what is
in upstream git... or is it? In Debian, we "modify" upstreams by
applying patches on top of them, so I argue that using "official"
tarballs is fine as long as patches used in Debian packaging apply on
upstream git as well.

Anyway, I do think the "GNU" source should be recoverable from the
"Debian" source. Technically, this is usually not the case with variable
substitutions, but IMHO it's acceptable to use the substituted sources
most of the time.

On the other hand, insisting on using upstream VCS contents can lead to
ugly hacks in Debian packaging, such as what you are describing. I must
admit I usually use "official" tarballs to avoid these hacks (and maybe
a little out of laziness).
I fixed the elpi package by using something a bit hackish: I added git
as dep, and if I don't see a .git in the build directory, I create one!
    if test -e .git; then \
        echo "Found .git, ok"; \
    else \
        touch .false_git; \
        git init --initial-branch=main; \
        git config user.name "Foo Bar"; \
        git add dune; \
        git commit -m foo; \
        git tag -a v$(DEB_VERSION_UPSTREAM) -m foo; \
    fi
    if test -f .false_git; then \
        rm -rf .false_git .git; \
    fi
This is ugliness to paper over ugliness. Please do not!

My approach to this specific problem would be to add to dune the
possibility to use some configuration file (or environment variables) as
input for the substitutions, instead of directly querying the VCS. This
configuration could then be set up as part of the Debian packaging.
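
A minimal sketch of that idea, in Python purely for illustration: the substitution values come from environment variables instead of a VCS query. The SUBST_ prefix and both function names are hypothetical and are not dune's interface; only the %%NAME%% placeholder form is taken from the trees discussed in this thread.

```python
import os
import re

# Placeholders of the form %%NAME%% (upper-case letters, digits, underscores).
PLACEHOLDER = re.compile(r"%%([A-Z][A-Z0-9_]*)%%")


def values_from_env(environ, prefix="SUBST_"):
    """Collect substitution values from variables named <prefix>NAME.

    The prefix is a made-up convention for this sketch.
    """
    return {
        key[len(prefix):]: value
        for key, value in environ.items()
        if key.startswith(prefix)
    }


def substitute(text, values):
    """Replace known %%NAME%% placeholders; leave unknown ones untouched."""
    def repl(match):
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(repl, text)


if __name__ == "__main__":
    # The packaging would export SUBST_VERSION_NUM before the build starts.
    env = dict(os.environ, SUBST_VERSION_NUM="2.0.7")
    print(substitute('version = "%%VERSION_NUM%%"', values_from_env(env)))
```

Leaving unknown placeholders untouched keeps the tree re-substitutable if a value was not provided.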
I suppose dh's ocaml_dune building tool could do that trick itself
(using debian/fake_dot_git instead of .false_git, better
user.email/user.name...), and that would make sure we don't break.
In case it is not clear: I will oppose this ending up in dh-ocaml.
However, dh-ocaml would be the right place to add support for the
approach I outlined above.
What do you think about the topic?
My e-mail is very opinionated, I would really like to hear other opinions.


Cheers,
--
Stéphane
Daniel Gröber
2025-02-15 11:40:01 UTC
Post by Stéphane Glondu
I fixed the elpi package by using something a bit hackish: I added git
as dep, and if I don't see a .git in the build directory, I create one!
    if test -e .git; then \
        echo "Found .git, ok"; \
    else \
        touch .false_git; \
        git init --initial-branch=main; \
        git config user.name "Foo Bar"; \
        git add dune; \
        git commit -m foo; \
        git tag -a v$(DEB_VERSION_UPSTREAM) -m foo; \
    fi
    if test -f .false_git; then \
        rm -rf .false_git .git; \
    fi
This is ugliness to paper over ugliness. Please do not!
My approach to this specific problem would be to add to dune the possibility
to use some configuration file (or environment variables) as input for the
substitutions, instead of directly querying the VCS. This configuration
could then be set up as part of the Debian packaging.
FYI: If all upstream wants is git metadata I like to introduce them to the
wonderful, but obscure, git `export-subst` feature. See git-attributes(1).

Works with forges, git-archive and everything.

Example:
https://github.com/YosysHQ/yosys/commit/222e7a2da345f01980d9261c40c5d50eced4f9ab
though this was later improved by others
https://github.com/YosysHQ/yosys/commit/9d15f1d6ac4a9ff2e1f87cda8c366659027fb76f
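
As a rough illustration of how a build can consume such a file: assume a hypothetical file .gitcommit that contains the single line $Format:%h$ and is marked "export-subst" in .gitattributes. In a plain checkout the marker is still there; in a "git archive" export git has replaced it with the abbreviated commit hash. A Python sketch that tells the two cases apart (the file name and function name are assumptions, not what any particular upstream uses):

```python
import re

# In a plain checkout the file still holds the literal "$Format:...$" marker;
# in a "git archive" export, git has replaced it with the formatted value.
UNEXPANDED = re.compile(r"^\$Format:.*\$$")


def read_export_subst(raw):
    """Return the expanded value, or None if git never expanded the marker."""
    value = raw.strip()
    if UNEXPANDED.match(value):
        return None
    return value
```

When this returns None the build is running from a checkout and can ask git directly; otherwise the value baked into the archive is used.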

If that's not enough can you point us to what this upstream is doing exactly?

--Daniel
Jeremy Stanley
2025-02-15 14:30:01 UTC
On 2025-02-15 12:33:16 +0100 (+0100), Daniel Gröber wrote:
[...]
Post by Daniel Gröber
FYI: If all upstream wants is git metadata I like to introduce them to the
wonderful, but obscure, git `export-subst` feature. See git-attributes(1).
Works with forges, git-archive and everything.
https://github.com/YosysHQ/yosys/commit/222e7a2da345f01980d9261c40c5d50eced4f9ab
though this was later improved by others
https://github.com/YosysHQ/yosys/commit/9d15f1d6ac4a9ff2e1f87cda8c366659027fb76f
If that's not enough can you point us to what this upstream is doing exactly?
I'm not that particular upstream, but with my upstream hat for other
projects on, there's a lot of data in a Git repository that our
source tarball build process relies on but isn't strictly files in
the checked-out worktree: names and addresses of all commit authors,
most recent tag in the checkout's history and number of commits
following it, presence of certain footer lines in commit messages
after the most recent tag, tags in the checkout's history matching a
specific pattern which come immediately after the introduction of
each file in a certain directory... these are things easily queried
from a (non-shallow) clone of the repository but which aren't simple
string substitutions.
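
To make one of those items concrete: "most recent tag and number of commits following it" is what "git describe" reports, and it needs the history to be present. The Python sketch below only parses that output; it is an illustration with made-up names, not the tooling these projects actually use.

```python
import re
from typing import NamedTuple, Optional


class Describe(NamedTuple):
    tag: str
    commits_since: int
    abbrev: Optional[str]


# "git describe" prints either "<tag>" (HEAD is exactly the tag) or
# "<tag>-<n>-g<hex>" (n commits after the tag, abbreviated commit id).
_LONG = re.compile(r"^(?P<tag>.+)-(?P<n>\d+)-g(?P<abbrev>[0-9a-f]+)$")


def parse_describe(output: str) -> Describe:
    """Split git-describe output into tag, commit count and abbreviated id."""
    text = output.strip()
    m = _LONG.match(text)
    if m is None:
        # No "-<n>-g<hex>" suffix: HEAD sits exactly on the tag.
        return Describe(text, 0, None)
    return Describe(m["tag"], int(m["n"]), m["abbrev"])
```

None of this can be computed from an exported file tree alone, which is the point being made above.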

There are several reasons for this complexity: First and foremost,
when these projects started Git was still fairly new on the scene,
and most distros preferred or even required source tarballs for
packaging. Second, the projects' maintainers were burned on multiple
occasions by mistakes where metadata duplicated from Git committed
into the file tree ended up out of sync or straddling release
points, so developed ways to avoid the duplication and risk of
divergence by extracting that data from Git at dist build time.
Third, because the projects needed to deal with heavy volumes of
development activity from many contributors in parallel, they relied
on a distributed parallel approval model with the fewest possible
coordination chokepoints, so needed to support independent features
merging in arbitrary order with things like release notes sorted out
automatically whenever a release got tagged.

I would argue that our source tarballs don't exactly "differ" from
what's in Git; they include content which isn't solely represented
by the worktree files in a corresponding checkout, but is still data
extracted from the corresponding Git repository state. Downstream
distros can choose to use our official signed release source
tarballs, or run the tarball build process themselves from a full
checkout of our Git repositories, but just naively dumping the file
tree from a git checkout or even using a shallow clone is inadequate
and we expressly do not support those workflows (if someone insists
on doing that, it's on them to make it work, and to check that
they're not omitting things the copyright license references such as
a generated authors file).

Unfortunately, package maintainers sometimes like to insist that
upstream projects' workflows are "wrong" because the choices they've
made differ from how other projects might choose to develop
software, but communities are unique and often face different
challenges that need their own solutions or aren't willing to
compromise by adopting partial solutions popular elsewhere just to
conform.
--
Jeremy Stanley
kpcyrd
2025-02-15 12:50:01 UTC
Post by Stéphane Glondu
I realize my previous email was a bit short: I was wondering if this
.tbz is still source code, because in the autotools world, package sources
come with configure scripts ready to run, but the good practice in
Debian is to regenerate those from configure.ac.
Well, we enter a philosophical debate that is not specific to OCaml and
probably should be discussed elsewhere... Adding debian-devel to get
more opinions.
Summary to other debian-devel readers: we are facing some upstreams that
publish "official" tarballs that differ from what is in their git. The
differences may include: variable substitutions, generated files... I
guess this is pretty common (cf. autotools). Moreover, the build system
behaves differently if it is called from git or not, or from extracted
official tarballs or not.
IMHO, traditionally, "source code" from Debian's point of view is whatever
upstream releases as "official" tarballs (i.e. elpi-2.0.7.tbz), which
may differ from what is in upstream git (i.e. v2.0.7.tar.gz). What makes
me think that is the special care that is taken in keeping upstream
tarballs pristine (with their signatures...).
[...]
What do you think about the topic?
My e-mail is very opinionated, I would really like to hear other opinions.
hello! ✨

disclaimer upfront, I know pretty much nothing about ocaml, this is
based on my experience with C/Rust/Go/etc.

I think the concept of "building the source code into source code" [sic]
that is common with autotools, is just the regular build in a trenchcoat
and should happen on Debian build servers. This is to avoid forcing a
gap between the VCS and Reproducible Builds that nobody feels
responsible for. Coincidentally this topic was also discussed in
#reproducible-builds irc yesterday.

With regards to signatures, quoting from an email I wrote briefly after
It's from the old mindset of code signing being the only way of
securely getting code from upstream. Recent events have shown (instead
of bothering upstream for signatures) it's much more important to have
clarity and transparency what's in the code that is compiled into
binaries and executed on our computers, instead of who we got it from.
The entire reproducible builds effort is based on the idea of the source
code in Debian being safe and sound to use.

[0]: https://lists.debian.org/debian-devel/2024/04/msg00125.html

I know Debian attempts to regenerate the autotools files, but there is
no way to tell if this actually worked, I vaguely remembered XZ was
specifically one of the cases where it didn't.

In other news, note there's currently a push within Arch Linux to move
away from upstream custom tarballs towards VCS snapshots:

https://gitlab.archlinux.org/archlinux/rfcs/-/merge_requests/46

Also because people found this interesting yesterday, Arch Linux and
Debian disagree on "what's the source code of curl 8.12.1":

Arch Linux:
https://whatsrc.org/artifact/sha256:146d2d673358b7927d9a3c74e22b6b0e7f9a1aee2a4307afbe6ac07f12764130
Debian:
https://whatsrc.org/artifact/sha256:599ff98cbab933a8b3640a084b12a5308a20795c192855ee454a8c1c16fa4dac

Diff between those two:

https://whatsrc.org/diff-right-trimmed/sha256:146d2d673358b7927d9a3c74e22b6b0e7f9a1aee2a4307afbe6ac07f12764130/sha256:599ff98cbab933a8b3640a084b12a5308a20795c192855ee454a8c1c16fa4dac

Even if we got some kind human to review the source code in its entirety for
us, which one should they review?
sha256:146d2d673358b7927d9a3c74e22b6b0e7f9a1aee2a4307afbe6ac07f12764130?
sha256:599ff98cbab933a8b3640a084b12a5308a20795c192855ee454a8c1c16fa4dac?
Both?
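
For anyone who wants to reproduce this kind of comparison locally on two unpacked source trees, a minimal Python sketch follows. It hashes every regular file and reports what is only on one side or differs. This is a generic approach with hypothetical function names, not a description of how whatsrc.org computes its artifact hashes.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_trees(left, right):
    """Return (only_in_left, only_in_right, differing) as sorted lists of paths."""
    a, b = tree_digests(left), tree_digests(right)
    only_left = sorted(a.keys() - b.keys())
    only_right = sorted(b.keys() - a.keys())
    differing = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    return only_left, only_right, differing
```

Pre-generated files typically show up in the "only in one tree" lists, substituted variables in the "differing" list.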

cheers,
kpcyrd
Julien Plissonneau Duquène
2025-02-15 16:00:01 UTC
Hi,
Post by kpcyrd
https://whatsrc.org/diff-right-trimmed/sha256:146d2d673358b7927d9a3c74e22b6b0e7f9a1aee2a4307afbe6ac07f12764130/sha256:599ff98cbab933a8b3640a084b12a5308a20795c192855ee454a8c1c16fa4dac
That divergence is reflected between the official upstream tarballs and
the tarballs that can be downloaded from their GitHub releases page.

There is a similar issue with Gradle where there are official "source
distributions" ZIPs [1] (generated by gradle as part of its build) and
not-really-that-much-less-official release tarballs [2] automatically
generated by GitHub. The divergence between both is shown at [3].

In the case of Gradle, as the official source ZIPs are lacking
documentation files that I think are important (e.g. LICENSE) I switched
to using GitHub tarballs, and will probably switch again to use uscan's
mode=git as it's simpler to configure and more reliable for large
projects that release often. That mode could end up being adopted by the
Java Team for all projects hosted on GitHub for that reason, and also
because it seems easier to convince upstream projects to sign their
release tags than to sign their release tarballs. In this specific case
however I also believe that the upstream project should fix their source
archives generation.

In the case of curl, I believe that for Debian using the official
tarball is fine. If I had to use their upstream git repo for Debian
packaging, I would probably write a small custom script that runs their
maketgz script and can be used in the watch file to regenerate
official-like "original" tarballs the same way as upstream.
Post by Stéphane Glondu
My approach to this specific problem would be to add to dune the
possibility to use some configuration file (or environment variables)
as input for the substitutions, instead of directly querying the VCS.
This configuration could then be set up as part of the Debian
packaging.
I think that this is a reasonable approach that would make the build a
lot less brittle, and that should be submitted upstream.

Cheers,


[1]: https://services.gradle.org/distributions/
[2]: https://github.com/gradle/gradle/tags
[3]:
https://salsa.debian.org/jpd/gradle/-/blob/287ae5c99790f266e242964321955f7c77f397df/debian/wip/delta-ghtar-gradlezip
--
Julien Plissonneau Duquène
Julien Puydt
2025-02-15 16:10:01 UTC
Hi,
Post by Stéphane Glondu
My approach to this specific problem would be to add to dune the
possibility to use some configuration file (or environment variables)
as input for the substitutions, instead of directly querying the
VCS. This configuration could then be set up as part of the Debian
packaging.
Now that you mention it, python-setuptools-scm has a
SETUPTOOLS_SCM_PRETEND_VERSION environment variable which does precisely
that.

If you search:
https://codesearch.debian.net/search?q=SETUPTOOLS_SCM_PRETEND_VERSION&literal=1&perpkg=1
and look at the debian/rules packages, you'll see there are quite a few
packages using this.
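
The pattern behind such an override variable is simple: look at the environment first and only query the VCS when nothing was provided. A Python sketch of that lookup order; the variable name PRETEND_VERSION and the function are hypothetical, and this is not setuptools-scm's actual implementation.

```python
import os
import subprocess


def resolve_version(environ=os.environ, override_var="PRETEND_VERSION"):
    """Prefer an explicit override; query git only when none is set."""
    override = environ.get(override_var)
    if override:
        return override
    # Fallback to the VCS. This raises if git is not installed or if the
    # current directory is not a git checkout with a reachable tag.
    result = subprocess.run(
        ["git", "describe", "--tags"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

With the override exported from the packaging, the build never needs a .git directory at all.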

I just reported it as a feature request on dune (the culprit OCaml
package builder):

https://github.com/ocaml/dune/issues/11484


Cheers,

J.Puydt
Marco d'Itri
2025-02-15 17:50:02 UTC
Post by Stéphane Glondu
On the other hand, insisting on using upstream VCS contents can lead to ugly
hacks in Debian packaging, such as what you are describing. I must admit I
usually use "official" tarballs to avoid these hacks (and maybe a little out
of laziness).
In my own packages I am happy to add as many ugly hacks as are needed to be
able to directly use the upstream repository as upstream, e.g. varnish.
--
ciao,
Marco
Sean Whitton
2025-02-16 02:50:01 UTC
Hello,
Post by Stéphane Glondu
Summary to other debian-devel readers: we are facing some upstreams that
publish "official" tarballs that differ from what is in their git. The
differences may include: variable substitutions, generated files... I guess
this is pretty common (cf. autotools). Moreover, the build system behaves
differently if it is called from git or not, or from extracted official
tarballs or not.
IMHO, traditionally, "source code" from Debian's point of view is whatever
upstream releases as "official" tarballs (i.e. elpi-2.0.7.tbz), which may
differ from what is in upstream git (i.e. v2.0.7.tar.gz). What makes me think
that is the special care that is taken in keeping upstream tarballs pristine
(with their signatures...).
Some may consider that this Debian notion of "source code" differs from the
GNU "preferred form of modification", which would rather be what is in
upstream git... or is it? In Debian, we "modify" upstreams by applying patches
on top of them, so I argue that using "official" tarballs is fine as long as
patches used in Debian packaging apply on upstream git as well.
Anyway, I do think the "GNU" source should be recoverable from the "Debian"
source. Technically, this is usually not the case with variable substitutions,
but IMHO it's acceptable to use the substituted sources most of the time.
On the other hand, insisting on using upstream VCS contents can lead to ugly
hacks in Debian packaging, such as what you are describing. I must admit I
usually use "official" tarballs to avoid these hacks (and maybe a little out
of laziness).
I think that basing our work on upstream Git makes our source packages
more useful, and more accurately reflects our commitment to providing
the preferred form of modification for everything in our archive.

If our work is based on upstream Git then users can clone source
packages from salsa (or, better, 'dgit clone' if the maintainer has used
'dgit push-source') and can use powerful tools like 'git blame' and 'git
bisect' to understand their bug.

With tarballs the granularity of these tools is so much less.
--
Sean Whitton
Colin Watson
2025-02-16 15:00:02 UTC
Post by Sean Whitton
I think that basing our work on upstream Git makes our source packages
more useful, and more accurately reflects our commitment to providing
the preferred form of modification for everything in our archive.
If our work is based on upstream Git then users can clone source
packages from salsa (or, better, 'dgit clone' if the maintainer has used
'dgit push-source') and can use powerful tools like 'git blame' and 'git
bisect' to understand their bug.
With tarballs the granularity of these tools is so much less.
This is a false dichotomy, though. It's perfectly possible to use both
in conjunction with each other, by importing a tarball on top of an
upstream git tag so that the differences between them are represented by
a git commit. There are various tools in Debian to help with this.
--
Colin Watson (he/him) [***@debian.org]
Julien Puydt
2025-02-16 16:10:01 UTC
Post by Colin Watson
This is a false dichotomy, though. It's perfectly possible to use both
in conjunction with each other, by importing a tarball on top of an
upstream git tag so that the differences between them are represented by
a git commit. There are various tools in Debian to help with this.
Actually, it's a bit more complex.

If you use autotools, you start with configure.ac and .in files. If
upstream prepared the tree (as they generally do), you also get a
libtool/configure, etc. From this, you can run ./configure && make &&
make install ; that will do substitutions in the .in files. And if you
want to start anew, you can use autoconf/autoheader/whatever to re-
create the configure/libtool script and then ./configure && make &&
make install. The .in files are at hand to start again, we have the
developer sources and more.

So at this point, no real dichotomy on what you use as source.

But in the case at hand, the 'dune' tool to configure&compile the
'elpi' software (and others), things work differently. If you go here:
https://github.com/LPCIC/elpi/tags and get the .tar.gz, you'll get an
extract of the git tree, which lacks the versioning information, but
has the %%VERSION_NUM%% (and others) ready for substitution. That's
what I want to use. So my problem was to force-feed the missing git
information to dune so it can actually make those substitutions.

Stéphane's suggestion was to use the .tbz taken here:
https://github.com/LPCIC/elpi/releases/tag/v2.0.7
where the substitutions have already been done. But contrary to the
autotools situation, there's no going back: the substitutions are not
just ready to be applied, they are made, done and gone! If I wanted to
re-version as 2.0.7-debian or whatever, that isn't a possibility:
%%VERSION_NUM%% doesn't appear anymore in the tree. The versioning
information is soldered all over the place.
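
One way to check which kind of tree you are holding is to scan it for placeholders that are still pending. A minimal Python sketch (the function name is made up for this example): a non-empty result means the substitutions have not been applied yet, while an empty one means they were already made, or were never used.

```python
import re
from collections import Counter
from pathlib import Path

PLACEHOLDER = re.compile(r"%%[A-Z][A-Z0-9_]*%%")


def count_placeholders(root):
    """Count %%NAME%% occurrences per placeholder across all readable text files."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        counts.update(PLACEHOLDER.findall(text))
    return counts
```

Running it on both tarballs makes the difference described above visible without reading the whole tree.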

So here there is a clear dichotomy: depending on the tarball, you don't
have the same.

I'm hopeful dune upstream will accept my proposition and provide a
force-feeding mechanism to use the real source tree painlessly, because
I really think considering this .tbz a source tarball is incorrect.

I have always considered the true way to know if you have the source of
a package is: imagine you're stuck somewhere with a Debian mirror and
no external link. Could you take over the development of the package at
hand?

Cheers,

J.Puydt
Simon Josefsson
2025-02-16 17:40:01 UTC
Post by Julien Puydt
If you use autotools, you start with configure.ac and .in files. If
upstream prepared the tree (as they generally do), you also get a
libtool/configure, etc. From this, you can run ./configure && make &&
make install ; that will do substitutions in the .in files. And if you
want to start anew, you can use autoconf/autoheader/whatever to re-
create the configure/libtool script and then ./configure && make &&
make install.
This is a long-standing misunderstanding in the Debian community. There
is no guarantee that autoconf/autoheader/whatever re-create all
generated autotools files. In fact, there are several examples of
situations where they are not re-generated (e.g., modified aclocal *.m4
files without bumping serial number) and this is intentional upstream
behaviour from autotools and there are no signs that this will change.

To be certain to not get pre-generated files, one approach is to not use
tarballs with pre-generated content at all, but to insist on packaging
autotools projects based on git content; few upstreams commit generated
files into git so this ought to be the safest approach. Look at
'inetutils' in Debian for an example. Another is to manually prepare a
'rm' list of files to remove in debian/rules, or to use debian/copyright
Files-Excluded; to remove all generated autotools files.
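
To help prepare such a list, here is a small Python sketch that walks an unpacked tree and reports files whose names are conventionally generated or installed by the autotools. The name list is illustrative and incomplete, and every hit is only a candidate that needs a human look before it goes into an 'rm' list or Files-Excluded.

```python
from pathlib import Path

# File names conventionally generated or installed by autoconf, automake,
# autoheader and libtoolize. This list is illustrative, not exhaustive, and a
# hit is only a candidate: some projects maintain such files by hand.
GENERATED_NAMES = frozenset({
    "configure", "aclocal.m4", "Makefile.in", "config.h.in",
    "ltmain.sh", "config.guess", "config.sub",
    "install-sh", "missing", "depcomp", "compile",
})


def generated_candidates(root):
    """Return sorted relative paths of files whose name is in GENERATED_NAMES."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.name in GENERATED_NAMES
    )
```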

/Simon
Julien Plissonneau Duquène
2025-02-16 19:00:01 UTC
Hi,
Post by Julien Puydt
I have always considered the true way to know if you have the source of
a package is: imagine you're stuck somewhere with a Debian mirror and
no external link. Could you take over the development of the package at
hand?
Written like that it looks dramatic, but in that project there are only
5 occurrences of %%VERSION_NUM%%. That's only 4 too many, and then it
would not be much different than many other project releases where a
script (not always provided), a bot (same) or plain old manual
intervention (not always fully documented) is used to adjust a hardcoded
version number in some file somewhere at release time. But it's true
that if you're stranded somewhere you may have some free time at hand to
implement your own layer of templating ^ ^.

Cheers,
--
Julien Plissonneau Duquène
Julien Puydt
2025-02-16 20:10:01 UTC
On Sunday 16 February 2025 at 19:50 +0100, Julien Plissonneau Duquène wrote:
Post by Julien Plissonneau Duquène
Hi,
Post by Julien Puydt
I have always considered the true way to know if you have the source of
a package is: imagine you're stuck somewhere with a Debian mirror and
no external link. Could you take over the development of the package at
hand?
Written like that it looks dramatic, but in that project there are only
5 occurrences of %%VERSION_NUM%%. That's only 4 too many, and then it
would not be much different than many other project releases where a
script (not always provided), a bot (same) or plain old manual
intervention (not always fully documented) is used to adjust a hardcoded
version number in some file somewhere at release time. But it's true
that if you're stranded somewhere you may have some free time at hand to
implement your own layer of templating ^ ^.
You miss parts of the picture :

- the previous version used %%VERSION_NUM%% in only three places, the
new one uses it more, so it broke my previous hack -- ;

- there are other things than the substitutions done by dune when
compiling the package, which do not break the build, but will break
some depending packages later on with strange and misleading errors.

My current hack creating a fake .git is in fact much more efficient and
less fragile.

As mentioned somewhere in the thread I proposed to dune upstream a
simple mechanism to bypass this git reliance issue, which will make
packaging much cleaner.

Cheers,

J.Puydt
Julien Plissonneau Duquène
2025-02-17 09:10:01 UTC
Post by Julien Puydt
- the previous version used %%VERSION_NUM%% in only three places, the
new one uses it more, so it broke my previous hack -- ;
As a matter of best practices, this should probably be defined in a
single place and not "hardcoded" multiple times with templating.
Post by Julien Puydt
- there are other things than the substitutions done by dune when
compiling the package, which do not break the build, but will break
some depending packages later on with strange and misleading errors.
Do you mean here that using the "official" tbz source tarballs for
builds outside of a git tree will result in these errors? If so, that's
a serious upstream build tool issue IMO.
Post by Julien Puydt
As mentioned somewhere in the thread I proposed to dune upstream a
simple mechanism to bypass this git reliance issue, which will make
packaging much cleaner.
That's probably the way to go here. I would also suggest modifying
dune-release so the git release tags end up with the substitutions
already applied, to make it possible to simply export them and build
them outside of a git tree.

Cheers,
--
Julien Plissonneau Duquène
Julien Puydt
2025-02-17 21:50:01 UTC
Hi,

On Monday 17 February 2025 at 10:07 +0100, Julien Plissonneau Duquène wrote:
Post by Julien Plissonneau Duquène
Post by Julien Puydt
- the previous version used %%VERSION_NUM%% in only three places, the
new one uses it more, so it broke my previous hack --  ;
As a matter of best practices, this should probably be defined in a
single place and not "hardcoded" multiple times with templating.
Yes. Most upstreams have little clue about what a best practice is, and
need to have it explained at length. That's part of the job of a DD...
Post by Julien Plissonneau Duquène
Post by Julien Puydt
- there are other things than the substitutions done by dune when
compiling the package, which do not break the build, but will break
some depending packages later on with strange and misleading
errors.
Do you mean here that using the "official" tbz source tarballs for
builds outside of a git tree will result in these errors? If so,
that's a serious upstream build tool issue IMO.
No, I mean the build tool takes the git repo with the .git/ directory,
and gives a tarball without .git/ where substitutions have been made
(those can be recognized because they look like %%FOO_BAR%%) *and* some
code added in various places (those can't be recognized beforehand).

Their .tbz is working nicely -- but it isn't source anymore!
Post by Julien Plissonneau Duquène
Post by Julien Puydt
As mentioned somewhere in the thread I proposed to dune upstream a
simple mechanism to bypass this git reliance issue, which will make
packaging much cleaner.
That's probably the way to go here. I would also suggest modifying
dune-release so the git release tags end up with the substitutions
already applied, to make it possible to simply export them and build
them outside of a git tree.
I think the way to go is the one I proposed in my upstream bug report 
( https://github.com/ocaml/dune/issues/11484 )
and which Stéphane Glondu proposed a PR for
( https://github.com/ocaml/dune/pull/11485 ). It hasn't been accepted
yet, but there's hope.

Cheers,

J.Puydt
Julien Plissonneau Duquène
2025-02-18 09:40:01 UTC
Hi,
Post by Julien Puydt
Their .tbz is working nicely -- but it isn't source anymore!
I think that we disagree on this one. Here the extent of source code and
build files changes between the .tbz and the git repository is
reasonably small and can be compared to what happens in many projects
where at release time a local release branch is created, variables and
metadata are adjusted, the sources/binaries/docs distributions are
built, and only the tag is pushed to the public repository. The issues I
see in this project are that these changes are not committed to the git
repository at all, and that using external templating to set a version
number in so many different places is an antipattern. If you wanted to
manually revert these changes to get the source tree in the same state
as the upstream development branch, it would only take you a few
minutes. If this really bothers you, you could also ship a diff in your
debian source package, automate the update of said diff with a custom
script run by uscan and document all that in d/README.source.

Here I would probably use the .tbz as I would consider it the preferred
form of modification when it comes to fixing things in this version of
the package (fewer steps and thus fewer risks that a later release of some
build tool behaves differently and breaks things).

Cheers,
--
Julien Plissonneau Duquène
Holger Levsen
2025-02-18 14:00:01 UTC
Post by Julien Puydt
Yes. Most upstreams have little clue about what a best practice is, and
need to have it explained at length.
TBH I basically stopped reading here, this assumption is so flawed, the
(upstream) world is way more diverse.
--
cheers,
Holger

⢀⣎⠟⠻⢶⣊⠀
⣟⠁⢠⠒⠀⣿⡁ holger@(debian|reproducible-builds|layer-acht).org
⢿⡄⠘⠷⠚⠋⠀ OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
⠈⠳⣄

I used to be scared for our grandchildren's future. Such optimism!
Julien Puydt
2025-02-16 15:40:01 UTC
If the upstream intends to distribute it with a tarball, that's the
"golden" package that downstream should base code upon.
Going around that decision means subjecting all of Debian to code
pulled from their repo outside of their distribution process.
I don't understand: it never was about packaging a random git commit.

Upstream works on code in a preferred form, and tags a definite commit
as being version 3.14159. This gets in some tarballs already in some
cases (github comes to mind) ; call that source-level-zero.
From this upstream runs some tool (generally triggered by their
tagging) ; often some autotools, 'dune' on the package which started
this thread, but rust/js/ruby/whatever also have their own. This tool
turns the git commit tree into some pre-compiled tree, put into another
tarball. Call this source-level-one.

The point I made about the elpi package is that we want to ship source-
zero, as that's what upstream works on. This is in line with Debian
shipping autotools-based packages but re-running the autotools before
they run configure again.
How much function is lost as a result is nothing compared to the
instability of packages that can result from distributing code that
was not meant for distribution.
Well, "use what upstream publishes" sounds nice until you look at the
landscape :

- many Python upstreams use pypi and consider that their pypi package
is what they publish ;

- many JavaScript upstreams use npm and consider that their npm package
is what they publish ;

- many Rust upstreams use cargo and consider that their cargo package is
what they publish ;

- many OCaml upstreams use opam and consider that their opam package is
what they publish ;

- many Coq upstreams use a sub-distribution of opam (the Coq Platform)
and consider this is what they publish ;

- etc.

There is a huge confusion in many upstreams' heads that putting their
software on people's computers is what "distributing" means. The fact
that their software is only semi-coherent with the system (only within
a certain programming language boundary) is a problem.

Debian is a whole-system distribution ; we make sure software works in
a coherent system-wide way, and basing our packages on code which has
already been pre-packaged for a subpar distribution (language-limited)
isn't a good option.

Cheers,

J.Puydt
Sean Whitton
2025-02-17 00:30:01 UTC
Hello,
The potential for additional function is not relevant.
If the upstream intends to distribute it with a tarball, that's the "golden"
package that downstream should base code upon.
The Debian project officially disagrees with you.

The preferred form for modification, which is what NEW cares about, is
determined by upstream's actual practices, not by their fiat.

We frequently reject packages from NEW because we have minified files;
we add the source to debian/missing-sources/.
--
Sean Whitton
Jeremy Stanley
2025-02-17 15:30:01 UTC
Post by Sean Whitton
The potential for additional function is not relevant.
If the upstream intends to distribute it with a tarball, that's
the "golden" package that downstream should base code upon.
The Debian project officially disagrees with you.
The preferred form for modification, which is what NEW cares about, is
determined by upstream's actual practices, not by their fiat.
We frequently reject packages from NEW because we have minified files;
we add the source to debian/missing-sources/.
Unfortunately Debian is also very conflicted on this point.

Debian has, for legal and logistical reasons, decided that it cannot
distribute upstream Git repositories as its source packages, and
instead chooses to try to condense upstream's preferred form for
modification (back) into source tarballs. In some cases, this
condensing loses data that upstream considers important, even at
times things referred to in copyright licenses. The irony here is
that some of those upstreams do publish source tarballs where that
data is extracted from their Git repositories so it can be included
correctly, but package maintainers need to be careful to run the
same source tarball build process for the basis of the Debian source
package in those cases and not just pretend that the _files_ tracked
in Git are the same as upstream's preferred form for modification.
--
Jeremy Stanley
Simon Richter
2025-02-18 09:10:02 UTC
Hi,
Post by Jeremy Stanley
Debian has, for legal and logistical reasons, decided that it cannot
distribute upstream Git repositories as its source packages, and
instead chooses to try to condense upstream's preferred form for
modification (back) into source tarballs.
If we "cannot", then it is not something we "choose".

We have several upstreams whose git repositories contain files that may
be legal for upstream to distribute, but that do not fulfill our,
stricter, requirements, or where we do not want to ship the files as
they are for technical reasons, such as when upstream is vendoring old
versions of libraries.

We have a toolchain that can handle file exclusions when importing
source archives from upstream, and these tools output a compressed
source tarball.

In theory, they could output a git bundle with the offending files
removed, but this would not be useful: simply removing the objects makes
it impossible to import the bundle, and changing the objects changes the
hashes, so neither can fulfill the role of "preferred form for
modification."
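
Why changing an object changes its hash can be seen from how git names
a blob: the ID is a hash over a short header plus the file content. A
minimal Python sketch, assuming the default SHA-1 object format
(repositories using the SHA-256 format differ); the function name
git_blob_id is made up for illustration:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID git assigns to a blob with this content
    (SHA-1 object format assumed)."""
    # Git hashes "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_id(b"vendored library, full copy\n")
stripped = git_blob_id(b"")  # same path, content replaced by nothing
# The two IDs differ, so a bundle carrying the stripped file no longer
# matches the IDs recorded in upstream's trees and commits.
ids_differ = original != stripped
```

Since trees record the IDs of their blobs and commits record the ID of
their tree, replacing one file changes every ID above it, up to the
tagged commit.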

What I'd like to see at some point is "compressed bundle containing a
single commit or tag" as the orig archive. Verifying this against
upstream is a little harder than for upstream released tarballs (where I
can just check if the file in Debian is bitwise identical to the one
released and possibly signed by upstream), but that can be fixed with
appropriate tooling.
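
For comparison, the check for a released tarball really is simple. A
minimal Python sketch, assuming both files are available locally; the
function names are hypothetical, and verifying upstream's signature is
a separate step not shown here:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks so that
    large tarballs do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bitwise_identical(debian_orig, upstream_release):
    """True if the two files have the same SHA-256 digest."""
    return sha256_of(debian_orig) == sha256_of(upstream_release)
```

A bundle-based orig archive would need an equivalent check that
compares the commit or tag ID rather than the bytes of the container
file.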

This would not work, however, for upstreams where we exclude files: for
this, we'd need to extend git to allow us to both create and import
incomplete bundles. That is a lot harder, but would be required for this
to be universally useful.

(there is also some correlation between "upstream's git does not contain
any unredistributable files or vendored dependencies" and "upstream
ships a usable source tarball", so precisely the cases where git is the
only upstream we have are the cases where that is least useful to us)

Simon
Sean Whitton
2025-02-18 08:00:01 UTC
Hello,
"Actual practices" and "by fiat" are no different here.
The former refers to what they actually work with, the latter refers to
what they claim is the preferred form of modification. These can come
apart.
They do as they like with their code.
Yes, just as we may :)
--
Sean Whitton