[Rpm-maint] [RFC][PATCH] Large file support

James Antill james at fedoraproject.org
Thu Sep 10 15:28:03 UTC 2009

On Wed, 2009-09-09 at 16:42 +0200, Florian Festi wrote:
> Hi!
> I had a look at the 4GB per packaged file limit. The current cpio format 
> [1] uses 8 bytes to encode 32 bit integers as hexadecimal ASCII strings. 
> So there is no way of fixing this problem while staying compatible with 
> the cpio format (and keep rpm2cpio working).
> Having had a look at the tar formats, I do not believe that switching to 
> tar is a real option. The format is just horrible (GNU tar needs over 200 
> lines to read an integer out of a header field) and full of hacks to 
> remain backward compatible (header in header + extensions). This would 
> not be all that bad if there were a nice little library we could link 
> against...

 Some of this is just GNU tar code, but it's true that the cpio format
is much nicer. And personally I think it'd have to be an awesome win to
move to a completely new format.
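
 For reference, a minimal sketch of the limit Florian describes, assuming
the "newc" layout in which each numeric header field is 8 ASCII hex
digits. The helper names (encode_field, decode_field) are made up for
illustration and are not rpm or cpio functions.

```python
# Sketch only: models one numeric field of a "newc" cpio header,
# assuming each such field is exactly 8 ASCII hexadecimal digits.
FIELD_WIDTH = 8  # characters per numeric field

# Largest value that fits: 8 hex digits carry 32 bits.
MAX_FIELD_VALUE = 16 ** FIELD_WIDTH - 1


def encode_field(value: int) -> bytes:
    """Encode an integer as an 8-character hex field (hypothetical helper)."""
    if not 0 <= value <= MAX_FIELD_VALUE:
        raise OverflowError(f"{value} does not fit in {FIELD_WIDTH} hex digits")
    return format(value, "08x").encode("ascii")


def decode_field(raw: bytes) -> int:
    """Decode an 8-character hex field back to an integer."""
    if len(raw) != FIELD_WIDTH:
        raise ValueError("field must be exactly 8 characters")
    return int(raw.decode("ascii"), 16)
```

 A file of exactly 4 GiB (2**32 bytes) is already one past what
c_filesize can express, which is where the per-file limit comes from.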

> My favorite solution would be to use no payload format at all and just 
> rely on the meta data we ship in the header anyway. While this would 
> surely be possible it requires redoing the hard link handling (as hard 
> links are treated specially in the payload - like shipping the content 
> just once) and modifying the next upper layer within rpm (fsm.c) which 
> is probably the most horrible place in the whole code base. Volunteers 
> welcome!

 I assume rpm could then ship something simple that would do what
everyone uses rpm2cpio for (unpack stuff to disk)? This is a better
option than moving to tar/XAR/zip/whatever, IMO, but still seems like a
lot of work for not much gain.

> A much simpler alternative would be to use a slightly modified cpio 
> format. With a new magic for large files we could just put a binary 
> integer into the c_filesize field (or all integer fields). Another 
> solution could be to keep the hexadecimal encoding and just double the 
> c_filesize or even some more integer fields.
> Both options would render the payload incompatible with cpio if there 
> are large files (and only then).
> I did not yet ask cpio upstream or our cpio package maintainer about 
> accepting patches to at least read such archives...
> Attached patch uses a binary integer for large file sizes. Patch is 
> untested and assumes that everything else that deals with file sizes 
> is already 64-bit safe.
> Comments? Ideas? Panic?

 I think a "new" cpio format that allows the extension of (quick look at
cpiohdr.h) ino, uid, gid, mtime and filesize ... would probably be
accepted everywhere fairly quickly.
 Speak to the Fedora cpio maintainer and upstream (although I wouldn't
expect a quick response from upstream, personally), as I'm less sure
whether they'd want to just double all the values or be clever in some
way to keep the header compact.
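
 To make the two options from the quoted mail concrete, here is a rough
sketch of both. It is not the attached patch. LARGE_MAGIC is a
placeholder invented for this example, and the big-endian byte order in
the binary variant is my assumption, since neither is settled above.

```python
import struct

NEWC_MAGIC = b"070701"   # standard "newc" magic
LARGE_MAGIC = b"0707LF"  # hypothetical magic, invented for this sketch


def _check(size):
    if not 0 <= size < 2 ** 64:
        raise OverflowError("file size must fit in 64 bits")


def encode_filesize_hex(size):
    """Variant A: keep hex encoding, widen c_filesize to 16 digits."""
    _check(size)
    if size < 2 ** 32:
        # Small files keep the plain newc encoding and stay cpio-compatible.
        return NEWC_MAGIC, format(size, "08x").encode("ascii")
    return LARGE_MAGIC, format(size, "016x").encode("ascii")


def encode_filesize_binary(size):
    """Variant B: 64-bit binary integer in the existing 8-byte slot."""
    _check(size)
    if size < 2 ** 32:
        return NEWC_MAGIC, format(size, "08x").encode("ascii")
    # Assumption: big-endian; the header size stays unchanged.
    return LARGE_MAGIC, struct.pack(">Q", size)


def decode_filesize_hex(magic, field):
    """Decode a Variant A field; both magics hold ASCII hex digits."""
    if magic not in (NEWC_MAGIC, LARGE_MAGIC):
        raise ValueError("unknown magic")
    return int(field.decode("ascii"), 16)


def decode_filesize_binary(magic, field):
    """Decode a Variant B field; the magic selects the interpretation."""
    if magic == NEWC_MAGIC:
        return int(field.decode("ascii"), 16)
    if magic == LARGE_MAGIC:
        return struct.unpack(">Q", field)[0]
    raise ValueError("unknown magic")
```

 Either way the new magic only shows up when a file of 4 GiB or more is
present, so payloads without large files remain readable by an unpatched
cpio. The binary variant keeps the header the same size, while the hex
variant grows it by 8 bytes per widened field.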

James Antill <james at fedoraproject.org>
