[Rpm-maint] [PATCH] Improve macro table performance

Thu Feb 14 18:38:17 UTC 2013

On 02/13/2013 08:34 PM, Alexey Tourbin wrote:
> On Wed, Feb 13, 2013 at 12:59:02PM +0200, Panu Matilainen wrote:
>>> There's an assumption that the table must be sorted, because
>>> rpmDumpMacroTable (used by 'rpm --showrc') prints entries in sorted
>>> order. It is not immediately obvious whether this is actually
>>> important, and hence whether the assumption can be dropped.
>>
>> --showrc is just a diagnostic tool and its output order is mere
>> cosmetics. If somebody cares enough, one could always sort the hash
>> table keys for output, performance is irrelevant for that operation.
>>
>>> Meanwhile,
>>> I argue that the change is worth applying anyway, because it removes
>>> qsort(3) stuff and makes use the underlying data structure in a more
>>> concise and explicit manner - such that it will be easier to try
>>> another data structure.
>>
>> I haven't seriously looked but its entirely possible this would
>> indeed make it easier to switch to a hash-table (which to me is
>> clearly the most "natural" data structure for macros). In the
>> meanwhile, this *removes* a fair amount of code and is much faster,
>> what's not to like?
>
> I have another reason for going with a sorted table: introducing bsearch
> macro (see below).
>
>> Well, a couple of things:
>>
>>> Also, macro table entries are now allocated in a single chunk.
>>
>> Not so much the change itself but the fact unrelated changes are
>> combined into one commit. If you find yourself typing "also", "in
>> addition" or the like into commit messages, most of the time its a
>> sign that the change should go into a separate commit :)
>
> Generally, yes.  In another commit, I found myself grouping a few
> changes together only to come up with a simpler commit subject, like
> "Optimize readLine routine".  This includes a few loosely coupled
> optimizations to readLine and related static routines; these
> optimizations can indeed be separated, and probably should be.
> But sometimes there is a trade-off between separating independent
> changes and grouping related changes into a single logical commit.

In my personal experience, every time I let myself or somebody else be 
lazy with splitting changes, it comes back to bite me sooner or later :) 
Of course there are exceptions to everything, and sometimes its simply 
not possible to sanely split a large change.

>> Also the change as its done here obscures macro entry data structure
>> somewhat:
>>
>>      char *name;         /*!< Macro name. */
>>      char *opts;         /*!< Macro parameters (a la getopt) */
>>      int used;           /*!< No. of expansions. */
>>      int level;          /*!< Scoping level. */
>>      char body[];        /*!< Macro body. */
>>
>> "name" and "opts" look like they are individual pointers that should
>> be freed separately while they're actually pointers to within
>> body[]. If you want to go this way, use something like arena[] for
>> the storage area and make name, opts and body separate "const char
>> *" pointers into the arena so its clear one shouldn't go freeing
>> them. Readability is more important than optimizing away one pointer
>> per macro entry.
>
> I argue that this data structure makes sense, and is probably optimal.
>
> - Each macro has its own body, which is almost always non-empty (macros
>    with logically empty body are usually defined via %nil).  This rules
>    out the possibility for sharing a body with another macro, and this
>    makes useless any further optimization for null/empty body.  Thus a
>    pointer is not needed for body, and it makes sense to allocate the
>    body on behalf of flexible array member, at a known/fixed offset.
>
> - Macro name can be shared: when pushed for the first time, the name is
>    allocated on behalf of flexible array, after the body.  Otherwise, the
>    name is shared with the "underlying" macro.  Hence the pointer is
>    inevitable.
>
> - Options are even more tricky: simple macros have opts set to NULL;
>    macros with arguments like foo() have opts point to a statically
>    allocated string ""; finally, macros like foo(o:) have opts point into the
>    flexible array member, somewhere after the body.
>
> Note that sharing macro name existed before, which indicates an
> intention and a half-hearted attempt to cut down on malloc.  I bring
> this attempt to the logical conclusion: no further optimizations are
> possible (in this respect).
>
> As for arena[], only body[] is tightly coupled with the macro; name and
> opts have a fair chance to point somewhere else, and it's none of
> somebody's business where they actually point to.  This is how the
> arrangement can be understood, and it makes sense then to leave the
> body[] as a flexible array member.
>
> Hence I don't buy the argument that it is not clear whether the pointers
> should be freed.  There have already been some complications with
> freeing a shared name, and these complications are now gone: a macro
> entry is freed with a single free(me) call (and there's a comment which
> ensures that "me" actually comes in a single chunk).  Furthermore, the
> only place where a macro needs to be freed is in delMacro(), and the
> only place where the macro is allocated is addMacro().  All allocations
> and all data structure manipulations are now local to these two
> complementary routines.  In the rest of the code, the macros and the
> table are used essentially in read-only mode.

Yes I see what it does, after staring it for a bit. My initial reaction 
however was:
- Mm, isn't this leaking memory from opts/name?
- No... but then where are they stored?
- Pointer into body? Wtf has the body got to do with this?

But perhaps you misunderstood what I meant: I'm not complaining about 
the way things are stored in one malloc blob, this is just a naming / 
clarity thing. On top of your patch:

--- a/rpmio/macro.c
+++ b/rpmio/macro.c
@@ -37,11 +37,12 @@ extern int optind;
  /*! The structure used to store a macro. */
  struct rpmMacroEntry_s {
      struct rpmMacroEntry_s *prev;/*!< Macro entry stack. */
-    char *name;        /*!< Macro name. */
-    char *opts;        /*!< Macro parameters (a la getopt) */
+    const char *name;          /*!< Macro name. */
+    const char *opts;          /*!< Macro parameters (a la getopt). */
+    const char *body;  /*!< Macro body. */
      int used;           /*!< No. of expansions. */
      int level;          /*!< Scoping level. */
-    char body[];       /*!< Macro body. */
+    char arena[];      /*!< String arena. */
  };

  /*! The structure used to store the set of macros in a context. */
@@ -1339,7 +1340,7 @@ addMacro(rpmMacroContext mc,
         /* entry with shared name */
         me = xmalloc(mesize);
         /* copy body */
-       p = me->body;
+       me->body = p = me->arena;
         if (blen)
             memcpy(p, b, blen + 1);
         else
@@ -1363,7 +1364,7 @@ addMacro(rpmMacroContext mc,
         size_t nlen = strlen(n);
         me = xmalloc(mesize + nlen + 1);
         /* copy body */
-       p = me->body;
+       me->body = p = me->arena;
         if (blen)
             memcpy(p, b, blen + 1);
         else

That way name, body and opts become self-explanatory by just looking at 
the struct declaration: they're const so they're pointers to "somewhere 
else, arena probably" and must not be freed or otherwise messed with, 
and the compiler will complain if you try to do so.

>> So, please split it into two parts:
>> - one to eliminate qsort() + bsearch()
>> - one to change the macro entry allocation (and see above)
>
> I'll recheck if it is possible to separate the data structure and the
> allocation logic, but it seems that it is not, since the data structure
> and the allocation are now tightly coupled.  The skeleton of addMacro()
> now looks like this:
>
> 	findEntry
> 	if (found)
> 	    allocate entry with shared name
> 	else
> 	    prepare slot
> 	    allocate entry with flexible name
> 	allocate/initialize the rest
> 	insert into the slot
>
> If it is not possible to produce two patches that commute (i.e. can be
> applied in any order), it is probably best to go with a single patch.

On the surface it seems like it shouldn't be particularly hard to split, 
but I didn't look at it that carefully and might well be missing 
something. It's not the end of the world or worth putting a huge effort 
into, but if its *reasonably* splittable, please do.

>> Also one cosmetic nit: in addMacro(), this "else" belongs to the
>> line above with the closing brace:
>> +    else {
>> +       /* extend macro table */
>> +       const int delta = 256;
>
> I usually prefer "uncuddled elses" (this comes from Perl, and the term
> comes from perlstyle manpage).  However, I'm willing to adjust to the
> received style.
>
> Now to the bsearch macro.  I've been having some second thoughts about
> the patch: is it okay to plug custom bsearch code like this?  Bsearch
> logic is pretty straightforward, but that's still quite a few lines,
> and there are other bsearch calls throughout the code.
>
> Do we even save much by inlining bsearch?  On the one hand, since
> bsearch iterates only log(n) times, which is roughly 10 (for 1024
> elements), there might seem to be just too little stuff to cut down on.
> On the other hand, it's been estimated that fully inlined bsearch/qsort
> with very simple comparison (i.e. array of integers) is 4 times faster;
> it can be 2 times faster on the array of short strings.  So, on the
> other hand, if you search routine is 2 times slower than it can be,
> it's a great deal.
>
> So I end up with trying to tailor a (reusable) BSEARCH macro, and there
> are quite some details to get it right.  It will eliminate findEntry()
> function, since findEntry is completely reduced to bsearch, and gcc
> already creates a few copies of findEntry due to constant argument
> propagation.

I wouldn't worry about the added lines too much, as despite the custom 
bsearch() this patch actually reduces the lines of code, and is faster. 
And since it might be replaced with a hash table in not-too-distant 
future...

	- Panu -