[Rpm-maint] [rpm-software-management/rpm] Examine Compressed Headers (Issue #2220)

Daniel Alley notifications at github.com
Wed Dec 21 06:33:20 UTC 2022


Here's a script I threw together in 30 minutes to get a rough estimate of how much space compressing the RPM headers would save.

```Python
#!/usr/bin/env python3

import os
import sys
import createrepo_c as cr

import lz4.frame
import zstd

results = {}

with os.scandir(sys.argv[1]) as entries:
    for entry in entries:
        if not entry.is_file() or not entry.path.endswith(".rpm"):
            continue

        pkg = cr.package_from_rpm(
            entry.path,
            location_href=str(entry.path),
            checksum_type=cr.SHA256,
        )
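        # createrepo_c reports the byte range of the header within the .rpm
        # file; read exactly that slice so only the header gets compressed.
        header_size_bytes = pkg.rpm_header_end - pkg.rpm_header_start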
        with open(entry.path, "rb") as f:
            f.seek(pkg.rpm_header_start)
            header = f.read(header_size_bytes)

        zstd_header = zstd.ZSTD_compress(header)
        lz4_header = lz4.frame.compress(header)

        results[str(entry.path)] = {
            "header_size": header_size_bytes,
            "package_size": pkg.size_package,
            "archive_size": pkg.size_archive,
            "header_size_zstd": len(zstd_header),
            "header_size_lz4": len(lz4_header),
        }

total_size_headers = 0
total_size_packages = 0
total_size_archives = 0
total_size_lz4 = 0
total_size_zstd = 0

print("Results for {} packages".format(len(results)))

for package, data in results.items():
    total_size_headers += data["header_size"]
    total_size_packages += data["package_size"]
    total_size_archives += data["archive_size"]
    total_size_lz4 += data["header_size_lz4"]
    total_size_zstd += data["header_size_zstd"]

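# NOTE: all three figures are ratios of summed sizes (total / total), not a
# mean of per-package ratios. Despite the "savings" wording, the LZ4 and ZSTD
# lines print the *compressed* header size as a share of total package size,
# so lower is better and the saving is the gap to the first figure.
print("Average header size as proportion of package total: {:.2f}%".format(total_size_headers / total_size_packages * 100))
print("Average header savings for LZ4 compressed headers: {:.2f}%".format(total_size_lz4 / total_size_packages * 100))
print("Average header savings for ZSTD compressed headers: {:.2f}%".format(total_size_zstd / total_size_packages * 100))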

```

Run it like so:

```
[dalley@thinkpad devel]$ python compressed_header_test.py ~/devel/repos/fixture/
Results for 35 packages
Average header size as proportion of package total: 64.90%
Average header savings for LZ4 compressed headers: 46.56%
Average header savings for ZSTD compressed headers: 33.52%
```
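
The "savings" lines above are computed as compressed header size divided by total package size, so the actual saving has to be derived from them. A small sketch of that arithmetic, using only the three percentages printed above (the function names are made up for illustration):

```python
def header_reduction(header_pct, compressed_pct):
    """Percent by which the header itself shrinks when compressed."""
    return (1 - compressed_pct / header_pct) * 100


def package_reduction(header_pct, compressed_pct):
    """Percentage points of total package size saved, assuming everything
    outside the header stays the same size."""
    return header_pct - compressed_pct


HEADER_PCT = 64.90  # "header size as proportion of package total" from the run

for name, compressed_pct in (("LZ4", 46.56), ("ZSTD", 33.52)):
    print("{}: header shrinks by {:.2f}%, package by {:.2f} points".format(
        name,
        header_reduction(HEADER_PCT, compressed_pct),
        package_reduction(HEADER_PCT, compressed_pct),
    ))
```

For this fixture set that works out to headers shrinking by roughly 28% with LZ4 and roughly 48% with ZSTD.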

(These sample results should be ignored entirely: the packages are effectively empty hello-world fixtures, nowhere near real-world.)
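
For anyone who wants to sanity-check the header byte range without createrepo_c, here is a rough sketch of how it can be derived from the RPM v4 file layout: a 96-byte lead, then a signature section, then the main header, where each of the latter two starts with a 16-byte intro (magic, reserved bytes, index count, data size). That this matches `rpm_header_start` / `rpm_header_end` exactly is an assumption, and `main_header_range` is a made-up helper name, not part of any library.

```python
import struct

LEAD_SIZE = 96                      # fixed-size legacy "lead" at the start of the file
HEADER_MAGIC = b"\x8e\xad\xe8\x01"  # 3 magic bytes followed by the version byte
INTRO_SIZE = 16                     # magic+version (4), reserved (4), index count (4), data size (4)
INDEX_ENTRY_SIZE = 16               # each index entry: tag, type, offset, count


def _section_size(buf, offset):
    """Total size in bytes of the header structure that starts at offset."""
    if buf[offset:offset + 4] != HEADER_MAGIC:
        raise ValueError("no RPM header magic at offset {}".format(offset))
    # Index count and data size are big-endian 32-bit integers.
    index_count, data_size = struct.unpack_from(">II", buf, offset + 8)
    return INTRO_SIZE + index_count * INDEX_ENTRY_SIZE + data_size


def main_header_range(buf):
    """Return (start, end) byte offsets of the main header (hypothetical helper)."""
    start = LEAD_SIZE + _section_size(buf, LEAD_SIZE)
    start += -start % 8             # signature section is padded to an 8-byte boundary
    end = start + _section_size(buf, start)
    return start, end
```

Reading the first few hundred kilobytes of a package into `buf` and slicing `buf[start:end]` should then give the same bytes the script above compresses, if the assumption about createrepo_c holds.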

-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/issues/2220#issuecomment-1360910971