[Rpm-maint] [rpm-software-management/rpm] API improvement to accommodate for RPM CoW (PR#1470) (Discussion #2057)
David Disseldorp
notifications at github.com
Wed Dec 7 20:35:09 UTC 2022
> The idea of aligning cpio metadata is very interesting. I can see how it'd help initramfs building speed tremendously.
>
> As I understand it, RPM is pretty different: the main difference is that we're trying (fairly hard) not to change the normal format of rpm as found on mirrors for now. There are some very interesting ideas on how to change the upstream format, but in doing so, we'd render all existing servers unable to read the format.
To clarify, aligning cpio data segments for *newly built* rpms shouldn't necessarily require any change in format. They'd continue to function the same as earlier cpio payload rpms, albeit with some extra zero-padding.
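As a rough illustration of the zero-padding, here is a minimal sketch that computes how many extra NUL bytes would push a file's data segment onto a block boundary in a "newc" cpio archive. It assumes the padding is placed in the filename field (counted in `namesize`), which is one way to do this without changing the format. The function names and the 4 KiB default are hypothetical, not taken from rpm.

```python
NEWC_HEADER_LEN = 110  # fixed size of an ASCII "newc" cpio header


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def name_padding_for_aligned_data(header_offset: int, name: str,
                                  data_align: int = 4096) -> int:
    """Return the number of extra NUL bytes to append to the name field.

    header_offset is the archive offset of this entry's header (always a
    multiple of 4 in newc). data_align is assumed to be a multiple of 4.
    """
    name_len = len(name.encode()) + 1  # newc names include a NUL terminator
    # Without extra padding, data starts after header + name, rounded up to 4.
    data_start = align_up(header_offset + NEWC_HEADER_LEN + name_len, 4)
    return align_up(data_start, data_align) - data_start
```

For an entry named `usr/bin/foo` at archive offset 0, the data would normally start at offset 124, so 3972 padding bytes would be needed to reach offset 4096. That worst case is close to one block per file, which is the cost of this approach.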
> If we could tolerate the breakage: I'd love to experiment with `BTRFS_IOC_ENCODED_WRITE` which would reduce writes down and eliminate explicit decompression. For clients or filesystems without CoW support: RPM could decompress and write the normal file. I was hoping encoded writes would eliminate the complex path with curl -> librepo -> rpm2extents. I'm not sure you could get data from the network and write encoded data to disk in one pass like we're doing now. Do you have any ideas on how to resolve that challenge?
I'm not too familiar with the rpm on-disk format, but I'd hoped that `BTRFS_IOC_ENCODED_WRITE` could be used without a change to the format, by having the rpm header parsed during download to determine whether the compressed payload could be written as-is. With a cpio payload it'd then be a matter of copy_file_range()ing the (optimally aligned) compressed file data segments into the destination during installation.
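The copy step could look roughly like the sketch below, which uses Python's `os.copy_file_range()` wrapper for brevity. The helper name `copy_segment` and the read/write fallback are my own assumptions for illustration. Whether the filesystem actually shares extents instead of copying bytes depends on the filesystem and on the alignment of the source range.

```python
import os


def copy_segment(src_fd: int, dst_fd: int, src_off: int, length: int) -> int:
    """Copy length bytes from src_fd at src_off to the start of dst_fd.

    Hypothetical helper: tries copy_file_range() first and falls back to
    pread/pwrite where the call is unavailable or rejected.
    """
    copied = 0
    while copied < length:
        try:
            n = os.copy_file_range(src_fd, dst_fd, length - copied,
                                   offset_src=src_off + copied,
                                   offset_dst=copied)
        except (OSError, AttributeError):
            # Fallback for kernels, filesystems or platforms without support.
            chunk = os.pread(src_fd, length - copied, src_off + copied)
            n = os.pwrite(dst_fd, chunk, copied)
        if n == 0:
            raise EOFError("source ended before the segment was copied")
        copied += n
    return copied
```

The caller would need to know `src_off` and `length` for each file, which in this scenario would come from walking the cpio headers in the payload.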
`BTRFS_IOC_ENCODED_WRITE` appears very restrictive at this stage though:
- it requires `CAP_SYS_ADMIN`, so probably isn't a viable option for containers, etc.
- ioctl calls need to specify both unencoded and encoded offset+length, meaning that we'd still need to parse rpm payload compression metadata
- the ioctl unencoded length can't exceed 128 KiB
- for zstd encoded I/Os, the ioctl data must be encoded "as a single zstd frame with the windowLog compression parameter set to no more than 17"
  - On openSUSE Tumbleweed I see some rpms currently using zstd compression level 19. IIUC, Fedora uses the same zstd level
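To make the last two size restrictions concrete, here is a minimal sketch that reads the window size from a zstd frame header (layout per RFC 8878) and compares it with the limits listed above. The function names are hypothetical. Treating "windowLog no more than 17" as "window size no more than 128 KiB" is my own conservative reading, not something I've checked against the kernel code.

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # 0xFD2FB528 stored little-endian
MAX_UNENCODED_LEN = 128 * 1024    # unencoded length limit from the list above
MAX_WINDOW_LOG = 17               # windowLog limit from the list above


def zstd_window_size(frame: bytes) -> int:
    """Return the window size declared by a zstd frame header."""
    if len(frame) < 6 or frame[:4] != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")
    desc = frame[4]                    # Frame_Header_Descriptor
    single_segment = bool(desc & 0x20)
    if not single_segment:
        wd = frame[5]                  # Window_Descriptor
        window_base = 1 << (10 + (wd >> 3))
        return window_base + (window_base // 8) * (wd & 0x07)
    # Single-segment frames omit the window descriptor; the window is the
    # frame content size, which follows the optional dictionary ID.
    pos = 5 + (0, 1, 2, 4)[desc & 0x03]
    fcs_len = (1, 2, 4, 8)[desc >> 6]
    fcs = int.from_bytes(frame[pos:pos + fcs_len], "little")
    return fcs + 256 if fcs_len == 2 else fcs


def may_fit_encoded_write(frame: bytes, unencoded_len: int) -> bool:
    """Rough eligibility check against the two size limits above."""
    return (unencoded_len <= MAX_UNENCODED_LEN
            and zstd_window_size(frame) <= 1 << MAX_WINDOW_LOG)
```

This only inspects the header. It doesn't verify that the buffer holds exactly one frame, and it says nothing about the `CAP_SYS_ADMIN` requirement.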
> Adding cpio metadata, along with a "null" compression type could help eliminate the change in `fsm.c` on how the payload is iterated. Note that `rpm2extents` does not (and cannot) touch headers without invalidating signatures, so the change in compression type is inferred and handled in the plugin.
>
> Lastly, there's another optimization that would be lost in adopting cpio formatting: content de-duplication. I'm not sure how important this is tho in the big picture, so it might be a worthwhile tradeoff.
Indeed. FWIW, I think your extent based approach offers a lot of worthwhile benefits, but just wanted to point out that something similarly CoW friendly (although less efficient) is possible without necessarily requiring invasive changes :-)
> Thanks for the feedback! Matthew.
Thanks for the response!
--
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2057#discussioncomment-4337009