[Rpm-maint] [PATCH] Improve macro table performance
Panu Matilainen
pmatilai at laiskiainen.org
Thu Feb 14 18:38:17 UTC 2013
On 02/13/2013 08:34 PM, Alexey Tourbin wrote:
> On Wed, Feb 13, 2013 at 12:59:02PM +0200, Panu Matilainen wrote:
>>> There's an assumption that the table must be sorted, because
>>> rpmDumpMacroTable (used by 'rpm --showrc') prints entries in sorted
>>> order. It is not immediately obvious whether this is actually
>>> important, and hence whether the assumption can be dropped.
>>
>> --showrc is just a diagnostic tool and its output order is mere
>> cosmetics. If somebody cares enough, one could always sort the hash
>> table keys for output, performance is irrelevant for that operation.
>>
>>> Meanwhile,
>>> I argue that the change is worth applying anyway, because it removes
>>> qsort(3) stuff and makes use the underlying data structure in a more
>>> concise and explicit manner - such that it will be easier to try
>>> another data structure.
>>
>> I haven't seriously looked but its entirely possible this would
>> indeed make it easier to switch to a hash-table (which to me is
>> clearly the most "natural" data structure for macros). In the
>> meanwhile, this *removes* a fair amount of code and is much faster,
>> what's not to like?
>
> I have another reason for going with a sorted table: introducing bsearch
> macro (see below).
>
>> Well, a couple of things:
>>
>>> Also, macro table entries are now allocated in a single chunk.
>>
>> Not so much the change itself but the fact unrelated changes are
>> combined into one commit. If you find yourself typing "also", "in
>> addition" or the like into commit messages, most of the time its a
>> sign that the change should go into a separate commit :)
>
> Generally, yes. In another commit, I found myself grouping a few
> changes together only to come up with a simpler commit subject, like
> "Optimize readLine routine". This includes a few loosely coupled
> optimizations to readLine and related static routines; these
> optimizations can indeed be separated, and probably should be.
> But sometimes there is a trade-off between separating independent
> changes and grouping related changes into a single logical commit.
In my personal experience, every time I let myself or somebody else be
lazy with splitting changes, it comes back to bite me sooner or later :)
Of course there are exceptions to everything, and sometimes its simply
not possible to sanely split a large change.
>> Also the change as its done here obscures macro entry data structure
>> somewhat:
>>
>> char *name; /*!< Macro name. */
>> char *opts; /*!< Macro parameters (a la getopt) */
>> int used; /*!< No. of expansions. */
>> int level; /*!< Scoping level. */
>> char body[]; /*!< Macro body. */
>>
>> "name" and "opts" look like they are individual pointers that should
>> be freed separately while they're actually pointers to within
>> body[]. If you want to go this way, use something like arena[] for
>> the storage area and make name, opts and body separate "const char
>> *" pointers into the arena so its clear one shouldn't go freeing
>> them. Readability is more important than optimizing away one pointer
>> per macro entry.
>
> I argue that this data structure makes sense, and is probably optimal.
>
> - Each macro has its own body, which is almost always non-empty (macros
> with logically empty body are usually defined via %nil). This rules
> out the possibility for sharing a body with another macro, and this
> makes useless any further optimization for null/empty body. Thus a
> pointer is not needed for body, and it makes sense to allocate the
> body on behalf of flexible array member, at a known/fixed offset.
>
> - Macro name can be shared: when pushed for the first time, the name is
> allocated on behalf of flexible array, after the body. Otherwise, the
> name is shared with the "underlying" macro. Hence the pointer is
> inevitable.
>
> - Options are even more tricky: simple macros have opts set to NULL;
> macros with arguments like foo() have opts point to a statically
> allocated string ""; finally, macros like foo(o:) have opts point into the
> flexible array member, somewhere after the body.
>
> Note that sharing macro name existed before, which indicates an
> intention and a half-hearted attempt to cut down on malloc. I bring
> this attempt to the logical conclusion: no further optimizations are
> possible (in this respect).
>
> As for arena[], only body[] is tightly coupled with the macro; name and
> opts have a fair chance to point somewhere else, and it's none of
> somebody's business where they actually point to. This is how the
> arrangement can be understood, and it makes sense then to leave the
> body[] as a flexible array member.
>
> Hence I don't buy the argument that it is not clear whether the pointers
> should be freed. There have already been some complications with
> freeing a shared name, and these complications are now gone: a macro
> entry is freed with a single free(me) call (and there's a comment which
> ensures that "me" actually comes in a single chunk). Furthermore, the
> only place where a macro needs to be freed is in delMacro(), and the
> only place where the macro is allocated is addMacro(). All allocations
> and all data structure manipulations are now local to these two
> complementary routines. In the rest of the code, the macros and the
> table are used essentially in read-only mode.
Yes I see what it does, after staring it for a bit. My initial reaction
however was:
- Mm, isn't this leaking memory from opts/name?
- No... but then where are they stored?
- Pointer into body? Wtf has the body got to do with this?
But perhaps you misunderstood what I meant: I'm not complaining about
the way things are stored in one malloc blob, this is just a naming /
clarity thing. On top of your patch:
--- a/rpmio/macro.c
+++ b/rpmio/macro.c
@@ -37,11 +37,12 @@ extern int optind;
/*! The structure used to store a macro. */
struct rpmMacroEntry_s {
struct rpmMacroEntry_s *prev;/*!< Macro entry stack. */
- char *name; /*!< Macro name. */
- char *opts; /*!< Macro parameters (a la getopt) */
+ const char *name; /*!< Macro name. */
+ const char *opts; /*!< Macro parameters (a la getopt). */
+ const char *body; /*!< Macro body. */
int used; /*!< No. of expansions. */
int level; /*!< Scoping level. */
- char body[]; /*!< Macro body. */
+ char arena[]; /*!< String arena. */
};
/*! The structure used to store the set of macros in a context. */
@@ -1339,7 +1340,7 @@ addMacro(rpmMacroContext mc,
/* entry with shared name */
me = xmalloc(mesize);
/* copy body */
- p = me->body;
+ me->body = p = me->arena;
if (blen)
memcpy(p, b, blen + 1);
else
@@ -1363,7 +1364,7 @@ addMacro(rpmMacroContext mc,
size_t nlen = strlen(n);
me = xmalloc(mesize + nlen + 1);
/* copy body */
- p = me->body;
+ me->body = p = me->arena;
if (blen)
memcpy(p, b, blen + 1);
else
That way name, body and opts become self-explanatory by just looking at
the struct declaration: they're const so they're pointers to "somewhere
else, arena probably" and must not be freed or otherwise messed with,
and the compiler will complain if you try to do so.
>> So, please split it into two parts:
>> - one to eliminate qsort() + bsearch()
>> - one to change the macro entry allocation (and see above)
>
> I'll recheck if it is possible to separate the data structure and the
> allocation logic, but it seems that it is not, since the data structure
> and the allocation are now tightly coupled. The skeleton of addMacro()
> now looks like this:
>
> findEntry
> if (found)
> allocate entry with shared name
> else
> prepare slot
> allocate entry with flexible name
> allocate/initialize the rest
> insert into the slot
>
> If it is not possible to produce two patches that commute (i.e. can be
> applied in any order), it is probably best to go with a single patch.
On the surface it seems like it shouldn't be particularly hard to split,
but I didn't look at it that carefully and might well be missing
something. It's not the end of the world or worth putting a huge effort
into, but if its *reasonably* splittable, please do.
>> Also one cosmetic nit: in addMacro(), this "else" belongs to the
>> line above with the closing brace:
>> + else {
>> + /* extend macro table */
>> + const int delta = 256;
>
> I usually prefer "uncuddled elses" (this comes from Perl, and the term
> comes from perlstyle manpage). However, I'm willing to adjust to the
> received style.
>
> Now to the bsearch macro. I've been having some second thoughts about
> the patch: is it okay to plug custom bsearch code like this? Bsearch
> logic is pretty straightforward, but that's still quite a few lines,
> and there are other bsearch calls throughout the code.
>
> Do we even save much by inlining bsearch? On the one hand, since
> bsearch iterates only log(n) times, which is roughly 10 (for 1024
> elements), there might seem to be just too little stuff to cut down on.
> On the other hand, it's been estimated that fully inlined bsearch/qsort
> with very simple comparison (i.e. array of integers) is 4 times faster;
> it can be 2 times faster on the array of short strings. So, on the
> other hand, if you search routine is 2 times slower than it can be,
> it's a great deal.
>
> So I end up with trying to tailor a (reusable) BSEARCH macro, and there
> are quite some details to get it right. It will eliminate findEntry()
> function, since findEntry is completely reduced to bsearch, and gcc
> already creates a few copies of findEntry due to constant argument
> propagation.
I wouldn't worry about the added lines too much, as despite the custom
bsearch() this patch actually reduces the lines of code, and is faster.
And since it might be replaced with a hash table in not-too-distant
future...
- Panu -
More information about the Rpm-maint
mailing list