[Rpm-maint] [PATCH 2/5] Reimplemented argvJoin function

Alexey Tourbin alexey.tourbin at gmail.com
Sat Feb 9 06:02:12 UTC 2013

On Fri, Feb 08, 2013 at 04:45:38PM +0200, Panu Matilainen wrote:
> On 02/05/2013 06:31 AM, Alexey Tourbin wrote:
> >Joining lines with consecutive rstrscat() calls was bad idea, because,
> >to append n-th argument, rstrscat() first has to calculate the length
> >of its previous result, that is, in a sense, the length of all its (n-1)
> >arguments.  Thus the complexity was O(n^2), due to arithmetic progression.
> Yes, very well known at the time of implementing it :) Here too, the
> idea always was to just stuff in something that works and optimize
> later if it actually turns out to be an issue.
> >This change introduces a highly optimized version of argvJoin.  In
> >particular, strlen() call for each argv element is issued only once.
> >Also, the inner loop is tuned for the special case of empty or
> >single-character separator, to avoid extra memcpy per-element call.
> I'd actually much rather see a simpler variant which just does away
> with the O(n^2) behavior in the most straightforward manner
> possible. This isn't IMO anywhere near critical enough function to
> warrant extra complexity just to shave out a few cycles on special
> cases.

The simplest thing to do would be, in pseudocode:

	for (i = 0; i < argvCount; i++)
	   if (i > 0)
		append separator
	   append argv[i]

But of course, this check for (i > 0) can be factored without
complicating the code:

	initialize with arv[0]
	for (i = 1; i < argvCount; i++)
	    append separator
	    append argv[i]

This is actually the basic structure that I use.  The real code, as
opposed to pseudocode, gets slightly complicated by the fact the size of
the "append" buffer must be known in advance.  Hence, the first loop is
required just to sum strlen(argv[i]).  The first optimization which I
implement is to save string lengths, so that later "append argv[i]" can
be implemented with memcpy(3).  The second optimization is that I
actually try to tighten the loop like this:

	switch (length separator) {
	case 0:
	    for (i = 1 .. argvCount)
	        append argv[i]
	case 1:
	    for (i = 1 .. argvCount)
	        append 1st char of spearator
	        append argv[i]
	    for (i = 1 .. argvCount)
	        append separator
	        append argv[i]

I argue that this is not just "a few cycles on special cases" -
rather, this is the efficiency of ARGV_t in general.  For example,
in parseChangelog.c:addChangelog() routine, argvJoin(sb, "") is called
to join all changelog lines.  Many packages have big changelogs: for
example, anaconda.spec triggers argvJoin to join 5645 changelog lines.
Thus, in this case, the optimization saves 5645 double strlen() calls
and 5645 dull memcpy() calls (with separator length equal to zero).
This adds up to quite some milliseconds.

(I also considered other optimization which would indeed make the code
hairy, and turned them down exactly for this reason.  For example, it is
possible to avoid the initial argvCount call by simply starting the loop
and saving lengths on the stack; if the stack array is not enough, jump
into another loop which reallocates the lengths.)

More information about the Rpm-maint mailing list