[Rpm-maint] [PATCH 2/5] Reimplemented argvJoin function
Panu Matilainen
pmatilai at laiskiainen.org
Sat Feb 9 10:36:11 UTC 2013
On 02/09/2013 08:02 AM, Alexey Tourbin wrote:
> On Fri, Feb 08, 2013 at 04:45:38PM +0200, Panu Matilainen wrote:
>> On 02/05/2013 06:31 AM, Alexey Tourbin wrote:
>>> Joining lines with consecutive rstrscat() calls was bad idea, because,
>>> to append n-th argument, rstrscat() first has to calculate the length
>>> of its previous result, that is, in a sense, the length of all its (n-1)
>>> arguments. Thus the complexity was O(n^2), due to arithmetic progression.
>>
>> Yes, very well known at the time of implementing it :) Here too, the
>> idea always was to just stuff in something that works and optimize
>> later if it actually turns out to be an issue.
>>
>>> This change introduces a highly optimized version of argvJoin. In
>>> particular, strlen() call for each argv element is issued only once.
>>> Also, the inner loop is tuned for the special case of empty or
>>> single-character separator, to avoid extra memcpy per-element call.
>>
>> I'd actually much rather see a simpler variant which just does away
>> with the O(n^2) behavior in the most straightforward manner
>> possible. This isn't IMO anywhere near critical enough function to
>> warrant extra complexity just to shave out a few cycles on special
>> cases.
>
> The simplest thing to do would be, in pseudocode:
>
> for (i = 0; i < argvCount; i++)
> if (i > 0)
> append separator
> append argv[i]
>
> But of course, this check for (i > 0) can be factored without
> complicating the code:
>
> initialize with arv[0]
> for (i = 1; i < argvCount; i++)
> append separator
> append argv[i]
>
> This is actually the basic structure that I use. The real code, as
> opposed to pseudocode, gets slightly complicated by the fact the size of
> the "append" buffer must be known in advance. Hence, the first loop is
> required just to sum strlen(argv[i]). The first optimization which I
> implement is to save string lengths, so that later "append argv[i]" can
> be implemented with memcpy(3). The second optimization is that I
> actually try to tighten the loop like this:
>
> switch (length separator) {
> case 0:
> for (i = 1 .. argvCount)
> append argv[i]
> break;
> case 1:
> for (i = 1 .. argvCount)
> append 1st char of spearator
> append argv[i]
> break;
> default:
> for (i = 1 .. argvCount)
> append separator
> append argv[i]
> break;
> }
>
> I argue that this is not just "a few cycles on special cases" -
> rather, this is the efficiency of ARGV_t in general. For example,
> in parseChangelog.c:addChangelog() routine, argvJoin(sb, "") is called
> to join all changelog lines. Many packages have big changelogs: for
> example, anaconda.spec triggers argvJoin to join 5645 changelog lines.
> Thus, in this case, the optimization saves 5645 double strlen() calls
> and 5645 dull memcpy() calls (with separator length equal to zero).
> This adds up to quite some milliseconds.
Obviously the sizes need to be precalculated for any non-stupid
implementation. The kind of simpler I'm talking about would be (seems I
ended up actually implementing it too...):
char *argvJoin(ARGV_const_t argv, const char *sep)
{
int argc = 0;
size_t argvlen = 0;
size_t seplen = 0;
char *p, *dest = NULL;
/* calculate args and their length */
for (const char **arg = argv; arg && *arg; arg++) {
argvlen += strlen(*arg);
argc++;
}
if (argc < 1)
return NULL;
/* allocate destination buffer */
if (sep)
seplen = strlen(sep);
dest = xmalloc(argvlen + seplen * (argc - 1) + 1);
/* first element */
p = stpcpy(dest, argv[0]);
/* remaining elements, with separators */
for (int i = 1; i < argc; i++) {
if (seplen)
p = stpcpy(p, sep);
p = stpcpy(p, argv[i]);
}
*p = '\0';
return dest;
}
...which runs just as fast in practise - stpcpy() is slower than
memcpy() for copying but then it avoids having to do an extra
argvCount() and potential malloc() for storing the individual string sizes.
- Panu -
More information about the Rpm-maint
mailing list