[Rpm-maint] Problems in rpmstrPoolRehash function

Tue Oct 8 09:15:01 UTC 2013

On 09/11/2013 08:09 PM, Michael Schroeder wrote:
>
> Hi rpm Maintainers,

Hi,

Sorry for the late reply. Trying to finally get back in business, moved 
to a new house and life's been "slightly" chaotic for the past weeks. I 
was expecting Florian to comment on this as the pool chunks and dummy 
entries were added by him :)

>
> I think I've found two bugs in the rpmstrPoolRehash() function:
>
>   1) IMHO there's an off-by-one in the for loop: pool->offs_size is
>      the last used id, thus it should be "<=" instead of "<".

Yeah, I think you're right. rpmstrPoolRehash() is not much used in 
practise but a bug is a bug...

>
>   2) the function should to skip the "dummy" entries that are put
>      at the end of each chunk.
>
> A chunk looks like this:
>
> foo\0bar\0...\0baz\0
> ^    ^    ^    ^    ^
>
> The dummy entry is there to make rpmstrPoolStrlen() work.
> Putting it in the hash is wrong. I've changed the code so that:

Ugh, yes the dummies dont belong in the hash...

>
> - \0 is written to where the dummy entries point (not strictly
>    needed as chunks are allocated with calloc, but nevertheless
>    good style).
> - the rpmstrPoolRehash() loop checks if the string is of size
>    zero (true for dummy entries). If that's the case it checks
>    if the next string does not start after the \0, if that's
>    also true it is a dummy entry.
>
>
> There are different (and easier) ways to fix this:
>
> - you can always put an empty string into the pool, it would always
>    have id 1. This simplifies the dummy entry check to:
>        if (i != 1 && str[0] == 0)

I thought of that (I know that's what libsolv's strpool does) but the 
idea was to allow using the pool for unique string count and the like, 
including empty strings.

> - you could get rid of the dummy entries and remove the
>    rpmstrPoolStrlen() function. It's only used 5 times in the code
>    and calling strlen() on the returned string does not cost much.

strlen() can get quite expensive when done by the masses, calculating 
string lenghts is one of the biggest time-consumers in headerImport(), 
except in the newish "fast" mode where offset calculations are used. 
Avoiding strlen() style string-walk makes quite a difference there and 
IIRC rpmstrPoolStrlen() does make a difference in eg fingerprinting, but 
dont have numbers at hand.

Anyway, I tend to agree on getting rid of the dummies - just special 
case rpmstrPoolLen() for the last entry, it'll still be constant time 
for most of the uses (there aren't that many in the codebase but I'd 
expect the number to grow as time goes on)

> Hmm, I just see that the code in rpmfc.c also loops over the ids
> including the dummy entries. Oh my. rpmfcApply() should at least
> ignore "" entries.

The code just assumes strpool is not buggy :)

> And it seems to modify the string returned
> from the strpool. Oh my again. (But using strtol() to convert
> a "const char *" to a "char *" is clever.)

Oh ugh, hadn't noticed that at all. Not the first case of accidental (I 
hope!) conversion of const char * to char * in rpm either...
AFAICS it's not harmful in practise in this particular case as the 
modifications only occur after all additions have been done and the pool 
thrown away but yeah it needs fixing.

	- Panu -

>
> Cheers,
>    Michael.
>
>
>
> _______________________________________________
> Rpm-maint mailing list
> Rpm-maint at lists.rpm.org
> http://lists.rpm.org/mailman/listinfo/rpm-maint
>