[Rpm-maint] Rpm Database musings

Panu Matilainen pmatilai at laiskiainen.org
Thu Mar 14 08:55:07 UTC 2013


On 03/13/2013 03:19 PM, Michael Schroeder wrote:
> On Fri, Mar 08, 2013 at 03:37:12PM +0100, Michael Schroeder wrote:
>> I kind of like to have all the data in one file.
>>
>> Anyway, attached is a little Packages database implementation I did yesterday
>> and today.
>
> Attached is the current version of my little experiments. The main
> changes are:
>
> - I switched to adler32 instead of md5sum
> - I added a little index database implementation, rpmidx.[ch]

Oh, awesome. I was quietly hoping you might do a proof-of-concept index 
(database) implementation too, and here we are :) Haven't looked deeply 
into it yet, but in any case with an actual alternative implementation 
it'll be much easier to work towards a backend abstraction in the rpmdb 
layer, and actually be able to test it.

>
> The index database is using mmap to map the database into memory.
> It uses the main rpmpkg database for locking.
>
> Performance and database sizes seem to be promising.
>
> Things I'm not happy about:
>
> - resizing currently works by rebuilding a new database and
>    calling rename(). I can change this to be inplace, though,
>    it just makes the code a little bit slower because I don't
>    want to simply overwrite the old data. I basically want an
>    "atomic" switch to the new data.
>
> - The generation count in idxdb is currently not used. My goal
>    is to detect crashed database updates somehow.
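
For reference, the rebuild-then-rename() approach described in the quoted
text can be sketched roughly as below. This is only an illustration, not
the code from the attachment: replace_file_atomically() and its parameters
are hypothetical names, and it assumes the temporary file lives on the same
filesystem as the target, so that POSIX rename() replaces the old file
atomically.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the new contents to tmppath, flush them to
 * disk, then rename() over path. Readers see either the old file or the
 * new one, never a half-written mix. Returns 0 on success, -1 on error. */
static int replace_file_atomically(const char *path, const char *tmppath,
                                   const void *data, size_t len)
{
    const char *p = data;
    int fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the write */
            close(fd);
            unlink(tmppath);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Make sure the data is on disk before the switch becomes visible. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmppath);
        return -1;
    }

    /* The "atomic" switch to the new data. */
    if (rename(tmppath, path) != 0) {
        unlink(tmppath);
        return -1;
    }
    return 0;
}
```

An in-place resize would have to give up this property or reimplement it
by hand, which is presumably where the extra cost comes from.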

Yup, detecting and automatically regenerating out-of-sync indexes is 
pretty much a must (yet something we currently don't have either, sigh).
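
One way a generation count could be used for this, purely as a sketch:
the struct layouts and function names below are made up for illustration
and are not taken from rpmpkg/rpmidx. The assumption is that the Packages
header and each index header both carry a counter, the Packages counter is
bumped before an update starts, and the index is stamped with the same
value only once its own update has completed.

```c
#include <stdint.h>

/* Hypothetical header layouts, illustration only. */
struct pkgdb_hdr { uint32_t generation; };
struct idxdb_hdr { uint32_t generation; };

/* Bump the Packages generation before touching anything else. */
static void begin_update(struct pkgdb_hdr *pkg)
{
    pkg->generation++;
}

/* Stamp the index only after its update has fully completed. */
static void finish_update(const struct pkgdb_hdr *pkg, struct idxdb_hdr *idx)
{
    idx->generation = pkg->generation;
}

/* A mismatch means an update crashed (or never reached this index)
 * and the index should be regenerated from Packages. */
static int idx_is_stale(const struct pkgdb_hdr *pkg,
                        const struct idxdb_hdr *idx)
{
    return pkg->generation != idx->generation;
}
```

A crash between begin_update() and finish_update() leaves the two counters
different, so the next open can notice it and rebuild the index.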

Some other "issues" in the current implementation AFAICS:
- The ability to grab all keys of an index is missing, which would be 
needed for the newish index iterator API. I always had the feeling that 
API might come back to bite us at some point...
- Index keys are limited to strings whereas we currently have others 
too, but then all the actually interesting indexes have string keys, and 
we might well be able just to eliminate the others (or convert the data 
into strings)

BTW shouldn't those h2be() and be2h() calls be htonl() and ntohl() 
instead? The idea seems to be keeping the database and indexes in 
big-endian, i.e. network byte order (which is good IMO), but currently it's 
unconditionally byteswapping, so a big-endian system would have the 
databases in little-endian format and a little-endian system in 
big-endian. Or am I totally missing something here?
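
To illustrate what I mean, a minimal sketch (put_be32() and get_be32() are
hypothetical names, not what the patch uses): htonl() and ntohl() only swap
on little-endian hosts and are no-ops on big-endian ones, so the on-disk
format ends up big-endian everywhere.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a host-order 32-bit value in big-endian
 * (network) byte order, regardless of the host's endianness. */
static void put_be32(unsigned char *buf, uint32_t host)
{
    uint32_t net = htonl(host);
    memcpy(buf, &net, sizeof(net));
}

/* Hypothetical helper: load a big-endian 32-bit value into host order. */
static uint32_t get_be32(const unsigned char *buf)
{
    uint32_t net;
    memcpy(&net, buf, sizeof(net));
    return ntohl(net);
}
```

With this, put_be32(buf, 0x01020304) leaves the bytes 01 02 03 04 in the 
buffer on any host, which an unconditional byteswap does not.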

	- Panu -

