[Rpm-maint] [rpm-software-management/rpm] RPM with Copy on Write (#1470)
malmond77
notifications at github.com
Tue Dec 29 20:54:14 UTC 2020
This is part of https://fedoraproject.org/wiki/Changes/RPMCoW
The majority of changes are in two new programs:
# rpm2extents
Modelled as a 'stream processor'. It reads a regular .rpm file on stdin,
and produces a modified .rpm file on stdout. The lead, signature and
headers are preserved 1:1 so that normal metadata inspection and
signature verification work as expected. Only the 'payload' is
modified.
The primary motivation for this tool is to re-organize the payload as a
sequence of raw file extents (hence the name). The files are organized
by their digest identity instead of path/filename. If any digest is
repeated, then the file is skipped/de-duped. Only regular files are
represented. All other entries like directories, symlinks, devices are
fully described in the headers and are omitted.
The files are padded so they start on `sysconf(_SC_PAGESIZE)` boundaries
to permit 'reflink' syscalls to work in the `reflink` plugin.
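As a rough illustration of the padding rule, the sketch below computes how many padding bytes are needed before the next file so that it starts on a page boundary. The helper names `align_up` and `pad_to_page` are hypothetical and are not taken from `rpm2extents.c`; the 4096 fallback is also an assumption.

```c
#include <stdint.h>
#include <unistd.h>

/* Round `pos` up to the next multiple of `align` (assumes align > 0). */
static uint64_t align_up(uint64_t pos, uint64_t align)
{
    uint64_t rem = pos % align;
    return rem ? pos + (align - rem) : pos;
}

/* Number of padding bytes to emit so the next file starts page-aligned. */
static uint64_t pad_to_page(uint64_t pos)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096; /* assumption: fall back to a common page size */
    return align_up(pos, (uint64_t)page) - pos;
}
```

A position that is already aligned needs no padding, so back-to-back page-sized files waste nothing.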
At the end of the file is a footer with 3 sections:
1. List of calculated digests of the input stream. This is used in
`librepo` because the file *written* is a derivative, and not the
same as the repo metadata describes. `rpm2extents` takes one or more
positional arguments that describe which digest algorithms are
desired. This is often just `SHA256`. This program is only measuring
and recording the digest - it does not express an opinion on whether
the file is correct. Because the APIs of most compression libraries
read the source file directly, the whole-file digest is measured
using a subprocess and pipes. I don't love it, but it works.
2. Sorted list of file content digest + offset pairs. This is used in
the plugin with a trivial binary search to locate the start of file
content. The size is not needed because it's part of normal headers.
3. (offset of 1., offset of 2., 8 byte MAGIC value) triple
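The lookup described in section 2 could look roughly like the sketch below, which uses the standard `bsearch()` over a sorted array. The `struct extent_entry` layout, the fixed 32-byte digest length (i.e. SHA256) and the function names are assumptions made for illustration; the actual on-disk layout is defined by the patch, not by this example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DIGEST_LEN 32 /* assumption: SHA256 file digests */

/* Hypothetical in-memory form of one (digest, offset) pair. */
struct extent_entry {
    unsigned char digest[DIGEST_LEN];
    uint64_t offset; /* start of the file content within the rpm */
};

/* bsearch() comparator: key is a raw digest, elem is a table entry. */
static int cmp_digest(const void *key, const void *elem)
{
    const struct extent_entry *e = elem;
    return memcmp(key, e->digest, DIGEST_LEN);
}

/* Returns 0 and sets *offset on a hit, -1 if the digest is absent.
 * The table must be sorted by digest in memcmp() order. */
static int lookup_offset(const struct extent_entry *table, size_t n,
                         const unsigned char *digest, uint64_t *offset)
{
    const struct extent_entry *hit =
        bsearch(digest, table, n, sizeof(*table), cmp_digest);
    if (hit == NULL)
        return -1;
    *offset = hit->offset;
    return 0;
}
```

Only the offset is stored per entry; as noted above, the length comes from the normal headers.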
# reflink plugin
Looks for the 8 byte magic value at the end of the rpm file. If present,
it alters the `RPMTAG_PAYLOADFORMAT` in memory to `clon`, and reads in
the digest->offset table.
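A minimal sketch of the footer check, assuming the trailing triple is stored as three 8-byte values in host byte order. `EXAMPLE_MAGIC` is a placeholder and not the real magic value, and `struct footer` and `read_footer` are hypothetical names rather than code from `plugins/reflink.c`.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Placeholder only: the real 8 byte magic is defined by the patch. */
#define EXAMPLE_MAGIC 0x0123456789abcdefULL

/* Assumed layout of the trailing triple. */
struct footer {
    uint64_t digests_offset; /* offset of section 1 */
    uint64_t table_offset;   /* offset of section 2 */
    uint64_t magic;
};

/* Returns 1 if a footer with the expected magic is present,
 * 0 if it is not, and -1 on an I/O error. */
static int read_footer(int fd, struct footer *out)
{
    off_t end = lseek(fd, 0, SEEK_END);
    if (end < 0)
        return -1;
    if ((uint64_t)end < sizeof(*out))
        return 0; /* too short to hold a footer */

    ssize_t n = pread(fd, out, sizeof(*out), end - (off_t)sizeof(*out));
    if (n < 0)
        return -1;
    if ((size_t)n != sizeof(*out))
        return 0;
    return out->magic == EXAMPLE_MAGIC;
}
```

A file without the magic is simply treated as a regular rpm, so the plugin can stay out of the way.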
`rpmPackageFilesInstall()` in `fsm.c` is
modified to alter the enumeration strategy from
`rpmfiNewArchiveReader()` to `rpmfilesIter()` when the payload format is
not `cpio`. This is needed because there is no cpio archive to enumerate.
In the same function, if `rpmpluginsCallFsmFilePre()` returns
`RPMRC_PLUGIN_CONTENTS` then `fsmMkfile()` is skipped, as it is assumed
the plugin did the work.
The majority of the work is in `reflink_fsm_file_pre()` - the per file
hook for RPM plugins. If the file enumerated in
`rpmPackageFilesInstall()` is a regular file, this function will look up
the offset in the digest->offset table and will try to reflink it,
falling back to a regular copy if that fails. If reflinking succeeds, we
will have reflinked a whole number of pages, so we truncate the file to
the expected size. Therefore installing most files does involve two
writes: the reflink of the full size, then a fork/copy on write for the
last page worth.
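The reflink-then-truncate sequence can be sketched as follows, using the Linux `FICLONERANGE` ioctl with a plain `pread`/`pwrite` copy as the fallback. This is a standalone approximation under stated assumptions, not the code in `plugins/reflink.c`: the helper names `copy_range` and `place_file` are hypothetical, and the real plugin may use a different call or different error handling.

```c
#include <linux/fs.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback: copy `len` bytes from src at src_off to the start of dst. */
static int copy_range(int src, uint64_t src_off, int dst, uint64_t len)
{
    char buf[65536];
    uint64_t done = 0;

    while (done < len) {
        size_t want = len - done < sizeof(buf) ? (size_t)(len - done)
                                               : sizeof(buf);
        ssize_t n = pread(src, buf, want, (off_t)(src_off + done));
        if (n <= 0)
            return -1;
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = pwrite(dst, buf + off, (size_t)(n - off),
                               (off_t)(done + (uint64_t)off));
            if (w <= 0)
                return -1;
            off += w;
        }
        done += (uint64_t)n;
    }
    return 0;
}

/* Try to reflink a whole number of pages, then trim to the real size.
 * Falls back to a regular copy if the clone is refused. */
static int place_file(int src, uint64_t src_off, int dst, uint64_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096; /* assumption: fall back to a common page size */
    uint64_t p = (uint64_t)page;

    struct file_clone_range fcr = {
        .src_fd = src,
        .src_offset = src_off,
        .src_length = (size + p - 1) / p * p, /* whole pages */
        .dest_offset = 0,
    };

    if (size == 0 || ioctl(dst, FICLONERANGE, &fcr) != 0) {
        /* No reflink support here (or nothing to clone): plain copy. */
        if (copy_range(src, src_off, dst, size) != 0)
            return -1;
    }
    /* Drop the padding that came along with the last page. */
    return ftruncate(dst, (off_t)size);
}
```

On filesystems without reflink support the ioctl fails and the copy path produces the same final file contents.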
If the file passed to `reflink_fsm_file_pre()` is anything other than a
regular file, it returns `RPMRC_OK` so the normal mechanics of
`rpmPackageFilesInstall()` are used. That handles directories, symlinks
and other non-regular file types.
# New API for internal use
1. `rpmReadPackageRaw()` is used within `rpm2extents` to read all the
headers without trying to validate signatures. This eliminates the
runtime dependency on rpmdb.
2. `rpmteFd()` exposes the Fd behind the rpmte, so plugins can interact
with the rpm itself.
3. `RPMRC_PLUGIN_CONTENTS` in `rpmRC_e` for use in
`rpmpluginsCallFsmFilePre()` specifically.
4. `pgpStringVal()` is used to help parse the command line in
`rpm2extents` - the positional arguments are strings, and this
converts the values back to the values in the table.
Nothing has been removed, and none of the changes are intended to be
used externally, so I don't think a soname bump is warranted here.
You can view, comment on, or merge this pull request online at:
https://github.com/rpm-software-management/rpm/pull/1470
-- Commit Summary --
* RPM with Copy on Write
-- File Changes --
M Makefile.am (6)
M lib/depends.c (2)
M lib/fsm.c (50)
M lib/package.c (40)
M lib/rpmlib.h (9)
M lib/rpmplugins.c (21)
M lib/rpmte.c (5)
M lib/rpmte.h (2)
M lib/rpmtypes.h (3)
M macros.in (1)
M plugins/Makefile.am (4)
A plugins/reflink.c (340)
A rpm2extents.c (519)
M rpmio/rpmpgp.c (10)
M rpmio/rpmpgp.h (9)
-- Patch Links --
https://github.com/rpm-software-management/rpm/pull/1470.patch
https://github.com/rpm-software-management/rpm/pull/1470.diff