[Rpm-maint] [PATCH] Improve macro table performance
Alexey Tourbin
alexey.tourbin at gmail.com
Wed Feb 13 18:34:40 UTC 2013
On Wed, Feb 13, 2013 at 12:59:02PM +0200, Panu Matilainen wrote:
> >There's an assumption that the table must be sorted, because
> >rpmDumpMacroTable (used by 'rpm --showrc') prints entries in sorted
> >order. It is not immediately obvious whether this is actually
> >important, and hence whether the assumption can be dropped.
>
> --showrc is just a diagnostic tool and its output order is mere
> cosmetics. If somebody cares enough, one could always sort the hash
> table keys for output, performance is irrelevant for that operation.
>
> >Meanwhile,
> >I argue that the change is worth applying anyway, because it removes
> >qsort(3) stuff and makes use the underlying data structure in a more
> >concise and explicit manner - such that it will be easier to try
> >another data structure.
>
> I haven't seriously looked but its entirely possible this would
> indeed make it easier to switch to a hash-table (which to me is
> clearly the most "natural" data structure for macros). In the
> meanwhile, this *removes* a fair amount of code and is much faster,
> what's not to like?
I have another reason for going with a sorted table: introducing bsearch
macro (see below).
> Well, a couple of things:
>
> > Also, macro table entries are now allocated in a single chunk.
>
> Not so much the change itself but the fact unrelated changes are
> combined into one commit. If you find yourself typing "also", "in
> addition" or the like into commit messages, most of the time its a
> sign that the change should go into a separate commit :)
Generally, yes. In another commit, I found myself grouping a few
changes together only to come up with a simpler commit subject, like
"Optimize readLine routine". This includes a few loosely coupled
optimizations to readLine and related static routines; these
optimizations can indeed be separated, and probably should be.
But sometimes there is a trade-off between separating independent
changes and grouping related changes into a single logical commit.
> Also the change as its done here obscures macro entry data structure
> somewhat:
>
> char *name; /*!< Macro name. */
> char *opts; /*!< Macro parameters (a la getopt) */
> int used; /*!< No. of expansions. */
> int level; /*!< Scoping level. */
> char body[]; /*!< Macro body. */
>
> "name" and "opts" look like they are individual pointers that should
> be freed separately while they're actually pointers to within
> body[]. If you want to go this way, use something like arena[] for
> the storage area and make name, opts and body separate "const char
> *" pointers into the arena so its clear one shouldn't go freeing
> them. Readability is more important than optimizing away one pointer
> per macro entry.
I argue that this data structure makes sense, and is probably optimal.
- Each macro has its own body, which is almost always non-empty (macros
with logically empty body are usually defined via %nil). This rules
out the possibility for sharing a body with another macro, and this
makes useless any further optimization for null/empty body. Thus a
pointer is not needed for body, and it makes sense to allocate the
body on behalf of flexible array member, at a known/fixed offset.
- Macro name can be shared: when pushed for the first time, the name is
allocated on behalf of flexible array, after the body. Otherwise, the
name is shared with the "underlying" macro. Hence the pointer is
inevitable.
- Options are even more tricky: simple macros have opts set to NULL;
macros with arguments like foo() have opts point to a statically
allocated string ""; finally, macros like foo(o:) have opts point into the
flexible array member, somewhere after the body.
Note that sharing macro name existed before, which indicates an
intention and a half-hearted attempt to cut down on malloc. I bring
this attempt to the logical conclusion: no further optimizations are
possible (in this respect).
As for arena[], only body[] is tightly coupled with the macro; name and
opts have a fair chance to point somewhere else, and it's none of
somebody's business where they actually point to. This is how the
arrangement can be understood, and it makes sense then to leave the
body[] as a flexible array member.
Hence I don't buy the argument that it is not clear whether the pointers
should be freed. There have already been some complications with
freeing a shared name, and these complications are now gone: a macro
entry is freed with a single free(me) call (and there's a comment which
ensures that "me" actually comes in a single chunk). Furthermore, the
only place where a macro needs to be freed is in delMacro(), and the
only place where the macro is allocated is addMacro(). All allocations
and all data structure manipulations are now local to these two
complementary routines. In the rest of the code, the macros and the
table are used essentially in read-only mode.
> So, please split it into two parts:
> - one to eliminate qsort() + bsearch()
> - one to change the macro entry allocation (and see above)
I'll recheck if it is possible to separate the data structure and the
allocation logic, but it seems that it is not, since the data structure
and the allocation are now tightly coupled. The skeleton of addMacro()
now looks like this:
findEntry
if (found)
allocate entry with shared name
else
prepare slot
allocate entry with flexible name
allocate/initialize the rest
insert into the slot
If it is not possible to produce two patches that commute (i.e. can be
applied in any order), it is probably best to go with a single patch.
> Also one cosmetic nit: in addMacro(), this "else" belongs to the
> line above with the closing brace:
> + else {
> + /* extend macro table */
> + const int delta = 256;
I usually prefer "uncuddled elses" (this comes from Perl, and the term
comes from perlstyle manpage). However, I'm willing to adjust to the
received style.
Now to the bsearch macro. I've been having some second thoughts about
the patch: is it okay to plug custom bsearch code like this? Bsearch
logic is pretty straightforward, but that's still quite a few lines,
and there are other bsearch calls throughout the code.
Do we even save much by inlining bsearch? On the one hand, since
bsearch iterates only log(n) times, which is roughly 10 (for 1024
elements), there might seem to be just too little stuff to cut down on.
On the other hand, it's been estimated that fully inlined bsearch/qsort
with very simple comparison (i.e. array of integers) is 4 times faster;
it can be 2 times faster on the array of short strings. So, on the
other hand, if you search routine is 2 times slower than it can be,
it's a great deal.
So I end up with trying to tailor a (reusable) BSEARCH macro, and there
are quite some details to get it right. It will eliminate findEntry()
function, since findEntry is completely reduced to bsearch, and gcc
already creates a few copies of findEntry due to constant argument
propagation.
More information about the Rpm-maint
mailing list