[Rpm-maint] [PATCH 1/5] Optimize argvCount function

Panu Matilainen pmatilai at laiskiainen.org
Sat Feb 9 12:18:21 UTC 2013


On 02/09/2013 07:00 AM, Alexey Tourbin wrote:
> On Fri, Feb 08, 2013 at 04:24:28PM +0200, Panu Matilainen wrote:
>> On 02/05/2013 06:31 AM, Alexey Tourbin wrote:
>>> Callgrind annotations for 'rpmspec -P anaconda.spec', previous commit:
>>> 380,151,860  PROGRAM TOTALS
>>> 232,943,797  glibc-2.16-75f0d304/string/../sysdeps/x86_64/strlen.S:__GI_strlen
>>>   63,770,642  rpmio/argv.c:argvCount
>>>   13,702,038  build/parseSpec.c:readLine
>>> ...
>>>
>>> Callgrind annotations for 'rpmspec -P anaconda.spec', this commit:
>>> 364,226,458  PROGRAM TOTALS
>>> 232,943,797  glibc-2.16-75f0d304/string/../sysdeps/x86_64/strlen.S:__GI_strlen
>>>   47,845,240  rpmio/argv.c:argvCount
>>>   13,702,038  build/parseSpec.c:readLine
>>> ...
>>> ---
>>>   rpmio/argv.c | 8 ++++----
>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/rpmio/argv.c b/rpmio/argv.c
>>> index f061f03..d68422b 100644
>>> --- a/rpmio/argv.c
>>> +++ b/rpmio/argv.c
>>> @@ -69,11 +69,11 @@ ARGint_t argiData(ARGI_const_t argi)
>>>
>>>   int argvCount(ARGV_const_t argv)
>>>   {
>>> -    int argc = 0;
>>> +    ARGV_const_t argv_start = argv;
>>>       if (argv)
>>> -    while (argv[argc] != NULL)
>>> -	argc++;
>>> -    return argc;
>>> +    while (*argv != NULL)
>>> +	argv++;
>>> +    return argv - argv_start;
>>>   }
>>>
>>>   ARGV_t argvData(ARGV_t argv)
>>>
>>
>> No strong objections as such as this doesn't make it any more
>> complicated than it was, but this falls into category of
>> micro-optimizations which I'm not too fond of either:
>
> It is not exactly a micro-optimization, since the argvCount(), as shown
> for the case of 'rpmspec -P anaconda.spec', was the second most
> expensive routine (and after argvJoin optimizations, is currently the
> first most expensive routine).  Optimizing the most expensive routine by
> 30%, in my opinion, does not fall into the category of micro-optimization.
>
> Of course, the real reason for this lies behind the fact that argvCount
> gets called in a manner that amounts to O(n^2) complexity.  That is,
> adding another argument with argvAdd() requires first to count
> arguments with argvCount().  To add n-th argument, (n-1) previous
> arguments are counted again, which forms an arithmetic progression.
> (It can be helpful to visualize arithmetic progression as half of the
> square below its diagonal.)

Yes, and that's why it is a micro-optimization. There isn't much point 
in trying to shave out a couple of cycles from an inner loop of a 
fundamentally stupid algorithm.

I'd much rather see the time and effort go into fixing the fundamental 
issue instead of tweaks like this that are barely, if at all, noticeable 
in wall-clock time. The reason I'm a bit wary of purely callgrind based 
optimizations is that on more than one occasion, I've seen a 
callgrind-suggested improvement of similar degree to this perform 
measurably and consistently *slower* in wall-clock time. That's not the 
case here - like said no harm done with this patch, but the real-world 
gain from the impressive looking 30% improvement in callgrind-numbers is 
only *just* visible in wall-clock.

>> The reason ARGV_t is used so widely is not because its a good data
>> structure (for obvious reasons it is anything but) but because it
>> has a convenient API that allowed replacing hand-coded variants of
>> the same things all over the place. The idea always was to replace
>> it with something saner where and when the need arises.
>> Unfortunately its not an opaque data structure so it can't be just
>> fixed behind the scenes, but still converting users of an API to
>> another similar one is fairly close to an automated
>> search-and-replace operation.
>
> ARGV_t actually CAN be fixed.

Not without adding further data structure behind it, which was my point.

> The idea is that internal accounting
> can be placed in memory BEFORE the user visible array.  Something like
> this:
>
> struct ARGV_header {
> 	size_t n;	/* vector fill */
> 	size_t m;	/* vector allocated slots */
> 	void *arena;	/* memory manager, to avoid strdup per arg */
> 	char *argv[];	/* user-visible vector */
> }
>
> size_t argvCount(ARGV_t argv)
> {
> 	struct ARGV_header *h = (struct ARGV_header) argv - 1;
> 	return h->n;
> }

Eek. There's absolutely no need to introduce such obscure stunts to get 
around it. Just add a new opaque data structure (ARGS_t for ARGI_t 
analogue or whatever) which has the same basic API as ARGV_t + necessary 
accessor function(s) for the actual data. It can also be easily made to 
share parts of the code with ARGV_t with use of internal helpers. And 
then convert relevant ARGV_t users to the new data structure.

>
> The only requirement this implementation imposes is that you should not
> try to create ARGV_t on your own - in other words, it does not accept
> "foreign argv"s.  This requirement is typically satisfied with the
>
> 	ARGV_t argv = NULL;
>
> declaration.
>
> Perhaps I should indeed try to come up with a working implementation.
> But that's another story.  At this point, I only try to make the
> argvCount() loop as tight as possible, so that quadratic behaviour
> does not strike so badly (i.e. so what I have to wait fewer seconds
> when testing other changes over a large set of specfiles).

Understood, but again: time and effort are far better spent addressing 
the scalability issue itself by using a better data structure for the 
job. If argvCount() and friends are dominating running-time then no 
amount of loop-tweaking will make the thing (its used for) *fast*, it 
will only be marginally less slower.

	- Panu -


More information about the Rpm-maint mailing list