[Rpm-maint] [rpm-software-management/rpm] A specfile parser without Turing-Complete side-effects (Discussion #3906)

Florian Weimer notifications at github.com
Tue Sep 9 09:15:24 UTC 2025


> The primary motivation is auditability—our processes should reproducibly convert the input sources into binary artifacts. The ultimate source of truth for sources is dist-git (git + tarball cache). We must ensure that nothing outside of dist-git influences the resulting artifacts (abstracting away the kernel and hardware for now), and it's enough to, e.g., analyze sources that are in dist-git checkout + corresponding lookaside cache.

But treating dist-git as the source of truth is a choice. Why would we make this choice?

Some distributions have outsourced dist-git (or are planning to do so). TLS connections to dist-git are terminated by a party that is not even under contract with them. Authentication of Git repository contents relies on online connectivity. That makes dist-git a difficult choice for a persistent archive mechanism.

> Then, from an auditability perspective, a Source RPM does not help. It is essentially a binary blob—an archive—generated from a given spec file and containing the spec file itself, patches, sources, plus distribution/architecture specific metadata. Importantly, any file or metadata (including the spec file itself) added to the Source RPM can be modified on the fly by Turing-complete macros. This means there is no declarative path from dist-git to a Source RPM if arbitrary code execution cannot be avoided.

But that source RPM is then used in a disconnected environment to build another source RPM, which is then used to build the actual binary packages. So the source RPM still contains evidence of what is going on.

Contrast this with the dist-git service, which can simply return different Git repository contents at different times (with some effort, even for the same commit hash).

> The issue with rpmautospec (and similar tools) is that they execute Turing-complete RPM macros for every commit up to the most recent Version tag change. Fortunately, rpmautospec only depends on parsing the Epoch/Version, which we can **usually** read without triggering those Turing-complete side effects.

About 5% of Fedora packages use macros in `Version:`.
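
To make the distinction concrete, here is a minimal sketch of a check that reads `Version:` from the spec text without expanding anything and refuses values that contain a macro. The function name `literal_version` is made up for illustration, and the scan is deliberately naive: it takes the first `Version:` line it finds and knows nothing about conditionals or other spec constructs, so it is not a substitute for a real parser.

```python
import re

# First "Version:" tag line in the spec text. Matching the tag name
# case-insensitively is an assumption of this sketch.
VERSION_RE = re.compile(
    r"^[ \t]*Version[ \t]*:[ \t]*(\S.*?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def literal_version(spec_text):
    """Return the Version value if it is a plain literal, else None.

    Hypothetical helper: no macro expansion is performed, so no code
    from the spec file is ever executed.
    """
    match = VERSION_RE.search(spec_text)
    if match is None:
        return None
    value = match.group(1)
    # Any '%' means the value can only be known after macro expansion.
    if "%" in value:
        return None
    return value
```

A spec with `Version: 1.2.3` yields the string `1.2.3`, while `Version: %{upstream_version}` yields `None`, which is the case a stricter policy would have to reject or handle separately.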

Looking at past commits and executing their contents is certainly surprising behavior. It's entirely possible to have something like rpmautospec without that property, by flipping control around: The not-so-hypothetical rpmautospec replacement needs to determine the package version from the Git commit log (excluding blob contents) and present this information to the spec file—as a macro. Then the dependency on past content goes away (instead there's a dependency on past commit messages).
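
A rough sketch of that inversion, under stated assumptions: the version is recorded in a commit message trailer named `Package-Version:` and handed to the spec file through macros named `pkg_version` and `pkg_release`. The trailer, both macro names, and both function names are invented for this example; only the `--define` option of `rpmbuild` is real. The release is taken to be the number of commits since and including the last version change, which is one possible policy among several.

```python
import re

# Hypothetical trailer convention: "Package-Version: <version>" on its own line.
TRAILER_RE = re.compile(r"^Package-Version:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def version_from_log(messages):
    """Derive (version, release) from commit messages, newest first.

    Only commit messages are inspected; no blob contents are read and
    nothing from past spec files is parsed or executed.
    """
    for commits_since, message in enumerate(messages):
        match = TRAILER_RE.search(message)
        if match:
            # Release counts commits since and including the version bump.
            return match.group(1), commits_since + 1
    return None


def rpmbuild_defines(version, release):
    """Build rpmbuild arguments that present the result to the spec as macros."""
    return [
        "--define", f"pkg_version {version}",
        "--define", f"pkg_release {release}",
    ]
```

The spec file would then say `Version: %{pkg_version}` and never needs to look at its own history. The messages could come from something like `git log --format=%B`, split per commit.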

> So yes, at the end of the day, we need clear distribution policies for these two fields. The new RPM mode being requested (which I am not saying must be enforced or made the default) should at least allow the distributions (that care) to enforce such a policy.

I really don't want to see this as a default, because it would push actual distribution development out of dist-git by making work there too cumbersome. You would then just see opaque tarball hashes changing in dist-git, effectively reducing transparency.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/3906#discussioncomment-14349415

Message ID: <rpm-software-management/rpm/repo-discussions/3906/comments/14349415 at github.com>