[Rpm-maint] Rpm Database musings

Panu Matilainen pmatilai at laiskiainen.org
Mon Mar 11 16:28:31 UTC 2013


On 03/11/2013 02:14 PM, Michael Schroeder wrote:
> On Fri, Mar 08, 2013 at 09:21:33PM +0200, Panu Matilainen wrote:
>> It has its advantages of course. Having headers spread in different files
>> would probably make some things easier but also slower, so you'd really
>> want to avoid having to go to the headers. I did a quick test-case in
>> python yesterday: reading through all the ~2160 headers in my rpmdb with
>> the current libdb implementation (with no signature checking) takes about
>> 0.11s, loading them from separate files takes about 0.15s. Small numbers
>> but in percentages that's quite a lot.
>
> Is that with dropped caches (echo 3 > /proc/sys/vm/drop_caches)?

Heh, no :) That was with hot caches. Which of course is not the typical 
situation unless you happen to hack package management software for a 
living...

With dropped caches it's about 11.5s for the libdb implementation and
circa 15.5s for the separate files. So the relative performance is the
same, only now the numbers aren't that small anymore.
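
As a quick check of the "relative performance is the same" claim, here is
the arithmetic on the four timings quoted above. The dictionary layout and
the slowdown() helper are only illustrative names, not part of any rpm
tooling.

```python
# Timings quoted in this thread, in seconds.
timings = {
    "hot caches": {"libdb": 0.11, "separate files": 0.15},
    "dropped caches": {"libdb": 11.5, "separate files": 15.5},
}


def slowdown(libdb, separate):
    """Return the separate-files time as a multiple of the libdb time."""
    return separate / libdb


ratios = {
    name: slowdown(t["libdb"], t["separate files"])
    for name, t in timings.items()
}

for name, ratio in ratios.items():
    print(f"{name}: separate files take {ratio:.2f}x the libdb time")
```

Both cases come out at roughly a third slower for the separate files.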

>
>>> Anyway, attached is a little Packages database implementation I did yesterday
>>> and today. The code is very careful not to destroy things if the database
>>> is corrupt, i.e. it makes sure that it does not overwrite data.
>>
>> Wow, that didn't take long. One might get the idea that you're even more
>> eager to get rid of BDB than I am :D Can't blame you for that...
>
> Well, I did it because A) it was a fun little hack and B) it's good
> to have something to verify our ideas.

Yup, it's highly useful to have something concrete as a starting point.
I've already refactored the rpmdb code a fair bit towards separating the
backend implementation from the "rpmdb" level. That has been on my TODO
for ages and I've occasionally nibbled around the edges, but with a more
concrete target now, it might actually happen for real.

>> We could perhaps take some advantage of knowing how rpm does
>> transactions: erases always come after installs, so on upgrades there are
>> never free slots originating from the same transaction. So we could just do
>> lazy deletion: just flag the removed headers for erasure but don't actually
>> bother deleting and zeroing them, the next transaction that occurs will do
>> that. Should reduce the amount of data needing fdatasync() as well.
>
> Yes, that could work. OTOH it makes crash recovery a bit harder.
>
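
To make the lazy-deletion idea concrete, here is a toy in-memory sketch.
It is not taken from the attached implementation: SlotTable, the state
names and the method names are all hypothetical, and a real backend would
be working on file offsets and fdatasync() rather than Python lists.

```python
FREE, USED, PENDING_ERASE = "free", "used", "pending-erase"


class SlotTable:
    """Toy stand-in for the header slots of a Packages file (hypothetical)."""

    def __init__(self, nslots):
        self.state = [FREE] * nslots
        self.data = [None] * nslots

    def begin_transaction(self):
        # Reclaim slots flagged by an earlier transaction. In a real
        # backend this is where the zeroing and syncing cost would be paid.
        reclaimed = 0
        for i, st in enumerate(self.state):
            if st == PENDING_ERASE:
                self.state[i] = FREE
                self.data[i] = None
                reclaimed += 1
        return reclaimed

    def install(self, header):
        i = self.state.index(FREE)  # raises ValueError if the table is full
        self.state[i] = USED
        self.data[i] = header
        return i

    def erase(self, slot):
        # Lazy: only flip the flag, leave the header bytes alone.
        self.state[slot] = PENDING_ERASE


table = SlotTable(4)

# First transaction: plain install.
table.begin_transaction()
old = table.install(b"foo-1.0")

# Second transaction: upgrade, so the install comes before the erase.
table.begin_transaction()
new = table.install(b"foo-1.1")
table.erase(old)
still_there = table.data[old]  # old header is flagged, not yet wiped

# Third transaction: the flagged slot finally gets reclaimed.
reclaimed = table.begin_transaction()
```

Because the erase comes after the install, the upgrade never reuses the
slot it is about to free, so deferring the cleanup to the next transaction
costs nothing in space within a single transaction.
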
>> Kinda related to the above: I don't see the header timestamp being actually
>> used for anything (but then I might've missed something).
>
> I added the timestamp so that when there was a crash and we need to scan the
> database and there are multiple good headers for the same pkgid, we know which
> one to take.

Right, makes sense.
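
For illustration, the tie-break could look roughly like the sketch below.
The ScannedHeader record, its field names and pick_survivors() are
assumptions made up for this example; the attached code may well represent
scan results quite differently.

```python
from collections import namedtuple

# Hypothetical record produced by a recovery scan of the database.
ScannedHeader = namedtuple("ScannedHeader", "pkgid timestamp offset good")


def pick_survivors(scanned):
    """For each pkgid keep the good header with the newest timestamp."""
    best = {}
    for hdr in scanned:
        if not hdr.good:
            continue  # failed its integrity check, never a candidate
        cur = best.get(hdr.pkgid)
        if cur is None or hdr.timestamp > cur.timestamp:
            best[hdr.pkgid] = hdr
    return best


scan = [
    ScannedHeader(pkgid=7, timestamp=100, offset=4096, good=True),
    ScannedHeader(pkgid=7, timestamp=250, offset=16384, good=True),
    ScannedHeader(pkgid=7, timestamp=300, offset=20480, good=False),
    ScannedHeader(pkgid=9, timestamp=120, offset=8192, good=True),
]
survivors = pick_survivors(scan)
```

Here pkgid 7 has two good copies, and the one at offset 16384 wins because
it carries the later timestamp; the damaged copy is ignored even though it
is newer still.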

	- Panu -

