[Rpm-maint] [PATCH] Improve macro table performance

Panu Matilainen pmatilai at laiskiainen.org
Wed Feb 13 10:59:02 UTC 2013


On 01/29/2013 11:05 PM, Alexey Tourbin wrote:
> On Mon, Jan 28, 2013 at 4:10 PM, Florian Festi <ffesti at redhat.com> wrote:
>> On 01/28/2013 04:05 AM, Alexey Tourbin wrote:
>>> In the existing implementation, when a new macro is added, the whole
>>> table has to be sorted again.  Hence the cost of adding n macros is
>>> worse than O(n^2), due to arithmetic progression.
>>>
>>> This change drops all qsort(3) stuff altogether, by carefully preserving
>>> table in sorted order.  In findEntry routine, bsearch(3) is replaced
>>> with customized binary search which tracks position for insertion.
>>> In the addMacro routine, if a matching entry is not found, this
>>> position is used for direct insertion, after the rest of the elements
>>> are "shifted to the right" with memmove(3).  Likewise, in delMacro
>>> routine, the elements are shifted back to the left when the last macro
>>> definition is popped.  Technically, shifting half of the array with
>>> memmove(3) is still O(n^2); however, modern CPUs process contiguous
>>> memory in a very efficient manner, and glibc provides a fine-tuned
>>> memmove(3) implementation.
>>>
>>> Also, macro table entries are now allocated in a single chunk.
>>>
>>> This change reduces rpm startup costs by factor of 6 (see callgrind
>>> annotations below).  Also, this change improves specfile parser
>>> performance by a factor of 2 (e.g. the parse time of texlive.spec
>>> is reduced from 67s to 35s).
>>
>> Impressive numbers. I wonder if thing could even improve further by
>> switching to a hash table. RPM has it's own generic hash data type found
>> in lib/rpmhash.H and lib/rpmhash.C
>
> There's an assumption that the table must be sorted, because
> rpmDumpMacroTable (used by 'rpm --showrc') prints entries in sorted
> order. It is not immediately obvious whether this is actually
> important, and hence whether the assumption can be dropped.

--showrc is just a diagnostic tool and its output order is mere 
cosmetics. If somebody cares enough, one could always sort the hash 
table keys for output, performance is irrelevant for that operation.

> Meanwhile,
> I argue that the change is worth applying anyway, because it removes
> qsort(3) stuff and makes use the underlying data structure in a more
> concise and explicit manner - such that it will be easier to try
> another data structure.

I haven't seriously looked but its entirely possible this would indeed 
make it easier to switch to a hash-table (which to me is clearly the 
most "natural" data structure for macros). In the meanwhile, this 
*removes* a fair amount of code and is much faster, what's not to like?

Well, a couple of things:

 > Also, macro table entries are now allocated in a single chunk.

Not so much the change itself but the fact unrelated changes are 
combined into one commit. If you find yourself typing "also", "in 
addition" or the like into commit messages, most of the time its a sign 
that the change should go into a separate commit :)

Also the change as its done here obscures macro entry data structure 
somewhat:

     char *name;         /*!< Macro name. */
     char *opts;         /*!< Macro parameters (a la getopt) */
     int used;           /*!< No. of expansions. */
     int level;          /*!< Scoping level. */
     char body[];        /*!< Macro body. */

"name" and "opts" look like they are individual pointers that should be 
freed separately while they're actually pointers to within body[]. If 
you want to go this way, use something like arena[] for the storage area 
and make name, opts and body separate "const char *" pointers into the 
arena so its clear one shouldn't go freeing them. Readability is more 
important than optimizing away one pointer per macro entry.

So, please split it into two parts:
- one to eliminate qsort() + bsearch()
- one to change the macro entry allocation (and see above)

Also one cosmetic nit: in addMacro(), this "else" belongs to the line 
above with the closing brace:
+    else {
+       /* extend macro table */
+       const int delta = 256;

	- Panu -


More information about the Rpm-maint mailing list