[Rpm-maint] [PATCH] Improve macro table performance
alexey.tourbin at gmail.com
Tue Jan 29 21:05:48 UTC 2013
On Mon, Jan 28, 2013 at 4:10 PM, Florian Festi <ffesti at redhat.com> wrote:
> On 01/28/2013 04:05 AM, Alexey Tourbin wrote:
>> In the existing implementation, when a new macro is added, the whole
>> table has to be sorted again. Hence the cost of adding n macros is
>> worse than O(n^2), due to arithmetic progression.
>> This change drops all qsort(3) stuff altogether, by carefully preserving
>> table in sorted order. In findEntry routine, bsearch(3) is replaced
>> with customized binary search which tracks position for insertion.
>> In the addMacro routine, if a matching entry is not found, this
>> position is used for direct insertion, after the rest of the elements
>> are "shifted to the right" with memmove(3). Likewise, in delMacro
>> routine, the elements are shifted back to the left when the last macro
>> definition is popped. Technically, shifting half of the array with
>> memmove(3) is still O(n^2); however, modern CPUs process contiguous
>> memory in a very efficient manner, and glibc provides a fine-tuned
>> memmove(3) implementation.
>> Also, macro table entries are now allocated in a single chunk.
>> This change reduces rpm startup costs by factor of 6 (see callgrind
>> annotations below). Also, this change improves specfile parser
>> performance by a factor of 2 (e.g. the parse time of texlive.spec
>> is reduced from 67s to 35s).
> Impressive numbers. I wonder if thing could even improve further by
> switching to a hash table. RPM has it's own generic hash data type found
> in lib/rpmhash.H and lib/rpmhash.C
There's an assumption that the table must be sorted, because
rpmDumpMacroTable (used by 'rpm --showrc') prints entries in sorted
order. It is not immediately obvious whether this is actually
important, and hence whether the assumption can be dropped. Meanwhile,
I argue that the change is worth applying anyway, because it removes
qsort(3) stuff and makes use the underlying data structure in a more
concise and explicit manner - such that it will be easier to try
another data structure.
- "Find slot" logic is limited to findEntry as it used to be.
- "Push/Pop" logic is now limited to addMacro/delMacro in a rather
complementary manner (e.g. in its use of memmove).
The two remaining cases of explicit access to the data structure have
been also simplified:
- freeArgs traverses the table, in order to free local arguments such
as %1. (This can be actually reimplemented without traversal.)
- rpmFreeMacros deletes entries until the table is empty.
More information about the Rpm-maint