[Rpm-maint] Rpm Database musings

Michael Schroeder mls at suse.de
Fri Mar 8 14:37:12 UTC 2013

On Thu, Mar 07, 2013 at 10:28:41PM +0200, Panu Matilainen wrote:
> Right now I'm more interested in what the overall design of this all might 
> look like. Like said, I'd like to see the cache be a "read-only media" so 
> there are zero locking needed for queries that only need data from the 
> cache. It'll undoubtedly penalize writers (ie transactions) as the entire 
> cache probably needs to be regenerated even if just one package is 
> installed/removed, but then we're not in the "millions of transactions per 
> second" database business at all, in rpm's case painless (say, without 
> having a library steal your signals and blow up in all directions if you 
> miss a single iterator free, etc) and fast reads are what really counts I 
> think.

OTOH, using a simple reader/writer flock doesn't cost much.
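
For illustration only, a reader/writer lock of this kind could be sketched with flock(2) roughly as follows. This is a Python sketch, not code from rpm or from the attachment; the helper name `db_lock` and the idea of a separate lock-file path are assumptions made for the example.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def db_lock(path, exclusive=False):
    """Hold a shared (reader) or exclusive (writer) flock on `path`.

    `db_lock` and `path` are hypothetical names for this sketch.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Many readers may hold LOCK_SH at once; LOCK_EX excludes everyone.
        fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield fd
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

Queries would wrap their reads in `with db_lock(path):` and transactions in `with db_lock(path, exclusive=True):`; the cost per operation is one open, two flock calls and a close.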

> One possibility for (supposedly :) hassle-free Packages replacement might 
> be just storing the headers as individual files (named by their "instance" 
> or sha1 hash or something) in a directory of their own. That'd eliminate 
> the need for complex book-keeping for free slots and stuff which is pretty 
> much required for a database-style single file, plus adding and removing 
> things should be very low-cost. Dunno about looping through them all, 
> compared to eg Packages, but then for the added cost of separate open+close 
> etc for each there are no middle-layers adding cost of their own. I'm sure 
> there are downsides too, such as making the whole thing more exposed and 
> easier for outside abuse than manipulating a BDB database file but dunno if 
> it matters in reality - root is required to mess with it, and root can 
> already rm -f /var/lib/rpm/Packages or replace it with whatever and so on.

I kind of like to have all the data in one file.

Anyway, attached is a little Packages database implementation I did yesterday
and today. The code is very careful not to destroy things if the database
is corrupt, i.e. it makes sure that it does not overwrite data.

The basic design is like this:

First there's a little header to hold the generation count needed for
the locking.

Then comes the slot space. Each slot consists of:
- a magic word
- the pkgidx of the header
- an offset
- a length
Empty slots have offset=length=pkgidx=0.
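
As a rough sketch of the slot layout just described (in Python, not the attached implementation): the field widths, byte order and magic value below are assumptions, since the mail only names the four fields.

```python
import struct

SLOT_MAGIC = 0x746F6C53   # hypothetical magic value
SLOT_FMT = "<IIII"        # assumed: four little-endian 32-bit words
SLOT_SIZE = struct.calcsize(SLOT_FMT)


def pack_slot(pkgidx, offset, length):
    """Serialize one slot: magic word, pkgidx, blob offset, blob length."""
    return struct.pack(SLOT_FMT, SLOT_MAGIC, pkgidx, offset, length)


def unpack_slot(buf):
    """Parse one slot and reject it if the magic word is wrong."""
    magic, pkgidx, offset, length = struct.unpack(SLOT_FMT, buf)
    if magic != SLOT_MAGIC:
        raise ValueError("bad slot magic")
    return pkgidx, offset, length


def is_empty(slot):
    """Empty slots have offset=length=pkgidx=0."""
    return slot == (0, 0, 0)
```

With fixed-size slots, finding a free one is a linear scan over the slot space, and a slot whose magic word does not match can be treated as corruption rather than silently reused.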

Then come the data blobs containing the headers. Each blob consists of:
- a start magic word
- the pkgidx of the header
- a timestamp when the header was written into the database
- the length of the header in bytes
- the header data
- the md5sum over the data starting with the start magic word
- the length of the header in bytes again
- an end magic word

It's designed so that you can easily recover headers from a corrupt
database.
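
To show why this blob framing makes recovery possible, here is a minimal Python sketch that writes blobs in the order listed above and then scans raw bytes for intact ones without using the slot space. It is not the attached code: the 32-bit little-endian fields, the magic values and the function names are all assumptions for the example.

```python
import hashlib
import struct
import time

BLOB_START = 0x53424C42          # hypothetical start magic
BLOB_END = 0x45424C42            # hypothetical end magic
HEAD = struct.Struct("<IIII")    # start magic, pkgidx, timestamp, length
TAIL = struct.Struct("<II")      # length again, end magic
MD5_LEN = 16


def pack_blob(pkgidx, data, timestamp=None):
    """Build one blob; the md5 covers everything from the start magic on."""
    ts = int(time.time()) if timestamp is None else timestamp
    body = HEAD.pack(BLOB_START, pkgidx, ts, len(data)) + data
    return body + hashlib.md5(body).digest() + TAIL.pack(len(data), BLOB_END)


def parse_blob(raw, pos):
    """Return (pkgidx, data) if an intact blob starts at pos, else None."""
    if pos + HEAD.size > len(raw):
        return None
    magic, pkgidx, _ts, length = HEAD.unpack_from(raw, pos)
    end = pos + HEAD.size + length
    if magic != BLOB_START or end + MD5_LEN + TAIL.size > len(raw):
        return None
    length2, end_magic = TAIL.unpack_from(raw, end + MD5_LEN)
    if length2 != length or end_magic != BLOB_END:
        return None
    if hashlib.md5(raw[pos:end]).digest() != raw[end:end + MD5_LEN]:
        return None
    return pkgidx, raw[pos + HEAD.size:end]


def recover_blobs(raw):
    """Scan for the start magic and yield every blob that verifies."""
    start = struct.pack("<I", BLOB_START)
    pos = raw.find(start)
    while pos != -1:
        blob = parse_blob(raw, pos)
        if blob is not None:
            yield blob
        pos = raw.find(start, pos + 1)
```

The duplicated length and the end magic give cheap sanity checks before the md5 is computed, and the md5 rejects false hits where the start magic happens to appear inside header data.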

Performance seems to be not bad, most of the time seems to be
spent in qsort and md5 calculation. The qsort part can easily be
removed, we may want to switch to something faster than the
md5sum, though, for example an adler32 checksum. Or maybe
have both adler32 and md5sum and just check the adler32 for
reading and use the md5sum for database recovery.
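
The two-checksum idea could look roughly like this in Python, using zlib's adler32 and hashlib's md5. This is only a sketch of the proposal, not something the attached code does; the function names are made up for the example.

```python
import hashlib
import zlib


def checksums(data):
    """Compute both checksums when a header is written."""
    return zlib.adler32(data) & 0xFFFFFFFF, hashlib.md5(data).digest()


def verify_fast(data, adler):
    """Cheap check for the normal read path: adler32 only."""
    return (zlib.adler32(data) & 0xFFFFFFFF) == adler


def verify_full(data, digest):
    """Stronger check for database recovery: md5."""
    return hashlib.md5(data).digest() == digest
```

The write path pays for both checksums once, while ordinary reads only pay for the adler32.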

BUT: what's killing it is the fdatasync. Without any syncing
and cache dropping my little test program reports the following
numbers (2099 headers):

    writing took 1050 ms
    reading took 375 ms
    upgrade took 5157 ms

When I enable cache dropping I get:

    writing took 1288 ms
    reading took 3702 ms
    upgrade took 9283 ms

As you can see the reading performance suffers because all headers
have to be read from my slow disk.

LZO helps here; without LZO I get:

    writing took 1418 ms
    reading took 6119 ms
    upgrade took 13203 ms

When I enable fdatasync, it gets much worse (LZO is used):

    writing took 83794 ms
    reading took 3712 ms
    upgrade took 158699 ms

Uh oh, 83 seconds for writing? That's 40 ms for one single header,
which maybe is acceptable. It's still a bit painful, though.
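
The per-header figure follows directly from the numbers above (2099 headers); a small Python sketch of the arithmetic, with a hypothetical helper name:

```python
HEADERS = 2099  # number of headers in the test run reported above


def per_header_ms(total_ms, count=HEADERS):
    """Average time per header in milliseconds."""
    return total_ms / count


with_sync = per_header_ms(83794)     # writing with fdatasync (LZO used)
without_sync = per_header_ms(1288)   # writing with cache dropping, no sync
sync_overhead = with_sync - without_sync
```

So nearly all of the roughly 40 ms per header is the sync itself; the write without fdatasync costs well under a millisecond per header.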


Michael Schroeder                                   mls at suse.de
SUSE LINUX Products GmbH,  GF Jeff Hawn, HRB 16746 AG Nuernberg
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rpmpkg.tar.bz2
Type: application/x-bzip-compressed-tar
Size: 8981 bytes
Desc: not available
URL: <http://lists.rpm.org/pipermail/rpm-maint/attachments/20130308/839b8337/attachment.bin>
