[Rpm-ecosystem] Proposed zchunk file format

Jonathan Dieter jdieter at gmail.com
Sat Mar 3 12:32:26 UTC 2018


On Fri, 2018-03-02 at 12:44 +0000, Michael Schroeder wrote:
> On Fri, Mar 02, 2018 at 02:33:09PM +0200, Jonathan Dieter wrote:
> > No, I didn't expect it to have much effect.  Since openSUSE's xml
> > files are (presumably) ordered so new packages come last, do you have
> > any old primary.xml files lying around that I can test?
> > 
> > If not, I'll grab them from the next few updates.
> 
> They are ordered for the update channels of Leap, but Tumbleweed
> is a rolling release distro and thus not ordered. (This also means
> that delta repo downloads currently don't work that well for
> Tumbleweed, so I'm eager to find something better).
> 
> How about using the Fedora metadata but reordering the entries with
> the buildtime as sort key?

That works.  Here are the numbers.  They are closer, but only by a few
percentage points, which surprised me.  Zsync does beat zchunk in a few
cases, but they're all when the delta is very small (< 50k).  Any time
the delta is larger than 100k, zchunk wins by a minimum of 20%.

Interestingly, zchunk's numbers also generally got better when the
metadata was sorted by build date, but I think that's because my
current "chunk by srpm" algorithm only puts two packages with the same
srpm in the same chunk if they're next to each other.  When sorted by
build date, they're guaranteed to be next to each other, while if
sorted by name, some packages might be far away from each other (e.g.
dbus and python3-dbus won't be next to each other if sorted by name).
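
To illustrate the adjacency effect described above, here is a minimal
sketch in Python.  It is not the actual zchunk code: the package names,
srpm names, build times and the chunk_adjacent_by_srpm() helper are all
made up for the example.  It only groups *neighbouring* entries that
share an srpm, which is the behaviour I described.

```python
from itertools import groupby

# Hypothetical sample records: (package name, source rpm, build time).
PACKAGES = [
    ("foo", "foo-1.0-1.src.rpm", 100),
    ("libzzz", "libzzz-2.0-1.src.rpm", 200),
    ("python3-foo", "foo-1.0-1.src.rpm", 100),
]


def chunk_adjacent_by_srpm(packages):
    """Group consecutive packages that share the same srpm into one chunk.

    groupby() only merges neighbouring items, so two packages from the
    same srpm end up in separate chunks if another package sits between
    them.
    """
    return [
        [name for name, _srpm, _buildtime in group]
        for _srpm, group in groupby(packages, key=lambda pkg: pkg[1])
    ]


# Sorted by name: libzzz separates foo from python3-foo.
by_name = sorted(PACKAGES, key=lambda pkg: pkg[0])
print(chunk_adjacent_by_srpm(by_name))

# Sorted by build time: both foo packages were built together, so they
# are neighbours and share a chunk.
by_buildtime = sorted(PACKAGES, key=lambda pkg: pkg[2])
print(chunk_adjacent_by_srpm(by_buildtime))
```

In this toy input, sorting by name gives three chunks while sorting by
build time gives two, because the two packages built from the same srpm
become neighbours.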

zsync - sorted by build date (delta size in bytes)
1->2 - 1457710
2->3 - 1051405
3->4 - 489221
4->5 - 33851
5->6 - 41331
6->7 - 1607445
7->8 - 26625
1->4 - 2206614
3->6 - 544855
6->8 - 1612897

zchunk - sorted by build date - chunked by srpm (delta size in bytes)
1->2 - 1108238 - 24% smaller
2->3 - 768845 - 27% smaller
3->4 - 340866 - 30% smaller
4->5 - 36576 - 8% larger
5->6 - 41412 - < 1% larger
6->7 - 1208562 - 25% smaller
7->8 - 12083 - 55% smaller
1->4 - 1714803 - 22% smaller
3->6 - 370844 - 32% smaller
6->8 - 1214039 - 25% smaller
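
For anyone who wants to re-check the percentages, the following Python
sketch recomputes them from the two lists above, taking the zsync size
as the baseline for each update.  The relative_change() helper and the
dictionary names are just for illustration.

```python
# Delta sizes copied from the two lists above.
ZSYNC = {
    "1->2": 1457710, "2->3": 1051405, "3->4": 489221, "4->5": 33851,
    "5->6": 41331, "6->7": 1607445, "7->8": 26625, "1->4": 2206614,
    "3->6": 544855, "6->8": 1612897,
}
ZCHUNK = {
    "1->2": 1108238, "2->3": 768845, "3->4": 340866, "4->5": 36576,
    "5->6": 41412, "6->7": 1208562, "7->8": 12083, "1->4": 1714803,
    "3->6": 370844, "6->8": 1214039,
}


def relative_change(zsync_size, zchunk_size):
    """Return the zchunk size change relative to zsync, in percent.

    Negative values mean the zchunk delta is smaller than the zsync one.
    """
    return (zchunk_size - zsync_size) / zsync_size * 100


for step, zsync_size in ZSYNC.items():
    change = relative_change(zsync_size, ZCHUNK[step])
    direction = "smaller" if change < 0 else "larger"
    print(f"{step}: {abs(change):.1f}% {direction}")
```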

