[Rpm-maint] Rpm Database musings

Jan Zeleny jzeleny at redhat.com
Mon Mar 4 08:29:00 UTC 2013


On Sun, 3 March 2013 17:46:10, Panu Matilainen wrote:
> On 03/01/2013 06:32 PM, Michael Schroeder wrote:
> > Hi Panu et al,
> > 
> > here are some numbers/musings about changing the database
> > implementation to just one single packages file:
> > 
> > - I assume that we still want to store all the headers (in some
> >    format) anyway.
> 
> Nod, I think the headers need to stay; the exact format is another,
> open question.
> 
> > - I checked all the headers of the i586/noarch packages from FC18
> >    to get some understanding of how big they are and if it makes
> >    sense to compress them. Here's the result (sizes in bytes):
> >      scanned: 28423 rpms
> >      uncompressed: sum: 777290960, avg: 27348, median: 10600
> >      lzo:          sum: 305711769, avg: 10756, median:  4805
> >      gzip:         sum: 255995670, avg:  9007, median:  4154
> >      xz:           sum: 215564872, avg:  7585, median:  3728
> > 
> >    (the median is quite different from the avg, which means that
> >    some packages are quite big.)
> > 
> >    As you can see, compression about halves the size of the headers.
> >    LZO seems to be "good enough" and has the advantage that it's
> >    really fast.
> > 
> > - That means, if I have 2000 packages installed on my system
> >    (which is about the real number), the concatenated headers will
> >    use 20 MByte (using the median), 10 MByte when using LZO
> >    compression, 7.5 with xz.
> > 
> > - So if we want to drop all index files and just scan the
> >    packages database, we would need (assuming disk I/O throughput
> >    of 50 MByte/s) about 0.2 seconds (for the 10 MByte LZO case) to
> >    create the in-memory index data. Which maybe is too much, I dunno.
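
As a rough check of the estimates quoted above, here is a minimal Python
sketch. It uses only the median header sizes from the table and the
assumed 50 MByte/s throughput; the name `estimate` and the convention
1 MByte = 10^6 bytes are assumptions made for illustration, not anything
taken from rpm.

```python
# Median header sizes in bytes, copied from the FC18 scan quoted above.
MEDIAN_BYTES = {"uncompressed": 10600, "lzo": 4805, "gzip": 4154, "xz": 3728}


def estimate(packages=2000, throughput_mb_s=50.0):
    """Return {format: (total_mbytes, scan_seconds)} for one linear read.

    Assumes 1 MByte = 10**6 bytes and ignores decompression time,
    so this is only a lower bound on the real scan cost.
    """
    result = {}
    for name, median in MEDIAN_BYTES.items():
        total_mb = packages * median / 1e6
        result[name] = (total_mb, total_mb / throughput_mb_s)
    return result


if __name__ == "__main__":
    for name, (mb, secs) in estimate().items():
        print(f"{name:>12}: {mb:5.1f} MByte, {secs:.2f} s")
```

With these inputs the LZO case comes out at roughly 9.6 MByte and 0.19
seconds, which matches the "about 0.2 seconds" figure; the uncompressed
case would take about twice as long. Using the averages instead of the
medians makes every total roughly 2 to 2.6 times larger.
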
> 
> Right, in this context compression does indeed seem quite attractive.
> When we talked about this at devconf, I was thinking about the way
> rpm itself currently keeps (re)loading the headers from Packages, and
> adding repeated decompression to the other costs of header loading
> didn't seem like a way to make it faster. But for roughly halving the
> amount of I/O needed for scanning through it exactly once (which is of
> course the way libsolv operates) it's quite a different thing.

Which raises the question - can we make RPM behave this way as well? ;-)

> 0.2s is not a whole lot, for many operations absolutely nothing really,
> but I'd think some kind of cache would be in order to avoid having to
> read through all of Packages just for those simple 'rpm -qf /foo' kind
> of queries. For example, store the in-memory index structures in a
> memory-mapped cache file. The cache could perhaps be write-once and
> read-only for other uses so there's no need for locking within the
> cache: e.g. recreate it from scratch at the end of transactions and
> atomically replace the old one so the cache itself is always coherent.
> Or something... this isn't that far from libsolv's .solv files.
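
The write-once, atomically replaced cache described above could be
sketched roughly like this in Python. Everything here is an assumption
for illustration: the JSON layout, the file-to-package index and the
names `write_cache` and `read_cache` are hypothetical and are not rpm or
libsolv APIs. The only point is the pattern: build the new cache in a
temporary file, then rename it over the old one so readers never see a
half-written file.

```python
import json
import mmap
import os
import tempfile


def write_cache(index, cache_path):
    """Write `index` to a temp file, then atomically replace the cache."""
    directory = os.path.dirname(os.path.abspath(cache_path))
    # The temp file must live on the same filesystem for an atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cache-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(index, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        os.replace(tmp_path, cache_path)  # atomic on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_cache(cache_path):
    """Map the cache read-only and decode it; takes no locks."""
    with open(cache_path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Copying out of the mapping keeps the sketch simple; a real
            # cache would use a layout that can be queried in place.
            return json.loads(view[:])


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "index.cache")
    write_cache({"/bin/sh": "bash"}, path)  # hypothetical file -> package map
    print(read_cache(path))
```

A reader that already has the old file mapped keeps seeing the old,
complete contents after the rename, which is what makes the lock-free
read side plausible.
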
> 
> Speaking of which... a funny little idea I got at the end of
> devconf: regardless of future rpmdb format changes, it should now be
> possible to write an rpm plugin that creates + updates a .solv file for
> the rpmdb, so you should never have to actually read through the entire
> rpmdb in libsolv and its users like libzypp, dnf etc.

This sounds really cool.

Thanks
Jan