[Rpm-maint] [PATCH] Add RPMTAG_IDENTITY calculation as tag extension
Panu Matilainen
pmatilai at redhat.com
Fri Apr 6 05:52:15 UTC 2018
On 04/05/2018 03:42 PM, Vladimir D. Seleznev wrote:
> On Thu, Apr 05, 2018 at 11:41:33AM +0300, Panu Matilainen wrote:
>> On 04/03/2018 10:31 PM, Vladimir D. Seleznev wrote:
>>> RPMTAG_IDENTITY is calculated as a digest of the part of the package
>>> header that excludes tag entries irrelevant to the package build.
>>>
>>> Mathematically, the RPMTAG_IDENTITY value is a function of two
>>> variables: a package header and an rpm utility, thus this value can
>>> differ for the same package with different versions of rpm.
>>>
>>
>> Before proceeding with further work on this, we need to define what it
>> is that we're trying to identify. The above definition is very
>> ambiguous, and it's impossible to properly review + discuss the patch
>> when my idea of package identity might be entirely different from
>> somebody else's idea; that'll only cause unnecessary work and frustration.
>
> Agreed, that commit message isn't clear.
>
>> Starting with, what is a "package"? Are we talking about the source
>> package, or binary packages?
>
> Originally it was about binary packages, but is there really a difference?
> Source packages are built as well as binary ones, and something can
> change after a rebuild.
Source *packages* are built too, yes, but there's a vast difference
between the reproducibility of a src.rpm and a binary rpm.
However, while reviewing the patch yesterday, I realized I've been
increasingly thinking about *source* identity (note the lack of
"package"), which is something quite different: you'd calculate a digest
over the unparsed spec + all the sources and patches etc. that the spec
refers to [*] and save it in the header of binaries and sources on build.
This would let you identify all the packages that have been built from
the same source, i.e. whether the package was built on, e.g., Fedora or
RHEL (it's fairly common to share specs between them) or anywhere else,
it'd have the same source id.
[*] Obviously you need to parse the spec to get those references, and
it's possible to create specs where they differ between arches, but
sane specs use the same sources + patches across all arches.
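To make the source-identity idea concrete, here is a minimal Python
sketch. It assumes the spec has already been parsed and its Source/Patch
references resolved to file contents (the `sources` mapping below). The
function name `source_identity`, the choice of SHA-256 and the
length-prefixed framing are assumptions for illustration, not anything
rpm implements.

```python
import hashlib
from typing import Dict


def _feed(h: "hashlib._Hash", data: bytes) -> None:
    # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
    h.update(len(data).to_bytes(8, "big"))
    h.update(data)


def source_identity(spec_text: bytes, sources: Dict[str, bytes]) -> str:
    """Hypothetical source id: a digest over the unparsed spec plus every
    source and patch it refers to. `sources` maps file name -> contents;
    extracting those references from the spec is assumed done elsewhere."""
    h = hashlib.sha256()
    _feed(h, spec_text)
    # Sort by file name so the result does not depend on mapping order.
    for name in sorted(sources):
        _feed(h, name.encode("utf-8"))
        _feed(h, sources[name])
    return h.hexdigest()
```

Because neither the build host nor the distribution is an input, building
from the same spec and sources on Fedora or RHEL would yield the same id
under this sketch.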
>
>> If it's binaries, then we're always ultimately talking about a *build*,
>> and a line needs to be drawn somewhere.
>
> OK.
>
>> There are any number of ways to draw such a line, so it needs to be
>> explicitly stated. One example of such a line could be something like
>> "package id must match between packages built on different instances
>> of the same operating system, version and architecture". That clearly
>> is NOT the line that this version of the patch tries to draw, but then
>> it's not at all clear to me what that line is supposed to be.
>
> I think the line should be drawn from the other side: if the package
> identity matches between packages built in the same build environment,
> then the build is reproducible.
>
> A possible new version of the commit message is below:
>
> Add RPMTAG_IDENTITY calculation as tag extension
>
> RPMTAG_IDENTITY is calculated as a digest of the values of significant
> package header tag entries and represents package build characteristics.
> The main purpose of package identity is reproducible build verification:
> if the package identity matches between packages built in the same build
> environment, then the package build is reproducible for this
> environment.
Right, reproducibility is one such line, and that'd be a much better
description.
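A minimal Python sketch of that reproducibility check, assuming the
header is modeled as a plain mapping of tag names to raw value bytes.
The helper names `build_identity` and `is_reproducible`, and the excluded
tag set, are hypothetical; the patch under discussion may select tags
differently.

```python
import hashlib
from typing import Dict, Iterable

# Assumed (illustrative) set of tags that vary between otherwise identical
# builds; the actual patch may exclude a different set.
DEFAULT_EXCLUDED = frozenset({"BUILDTIME", "BUILDHOST", "COOKIE"})


def build_identity(header: Dict[str, bytes],
                   excluded: Iterable[str] = DEFAULT_EXCLUDED) -> str:
    """Digest of all header tags except the excluded ones, in tag-name order."""
    skip = set(excluded)
    h = hashlib.sha256()
    for tag in sorted(header):
        if tag in skip:
            continue
        # Length-prefix tag name and value so field boundaries are unambiguous.
        for field in (tag.encode("utf-8"), header[tag]):
            h.update(len(field).to_bytes(8, "big"))
            h.update(field)
    return h.hexdigest()


def is_reproducible(first: Dict[str, bytes], second: Dict[str, bytes]) -> bool:
    # Two builds from the same environment count as reproducible
    # when their identities match.
    return build_identity(first) == build_identity(second)
```

Since the excluded set is baked into the tool, changing it changes every
identity, which is the "function of two variables" point from the
original commit message.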
I do think that RPMTAG_IDENTITY is an overly broad name for such a narrow
purpose, though; note how it led me to think about source-level identity
instead. Something along the lines of "build id" maybe, but we don't want
to mix it up with the debuginfo build-id. No need to get hung up on it
right now though, just something to think about.
- Panu -