[Rpm-maint] [rpm-software-management/rpm] build: prioritize large packages (#1180)

Michal Domonkos notifications at github.com
Tue Apr 14 18:55:07 UTC 2020


Binary packages come in different sizes and so their build time can vary
greatly.  Dynamic scheduling, which we currently use for parallel
building, is a good strategy to combat such differences and load-balance
the available CPU cores.

That said, knowing that the build time of a package is proportional to
its size, we can reduce the overall time even further by cleverly
ordering the task queue.

As an example, consider a set of 5 packages, 4 of which take 1 unit of
time to build and one takes 4 units.  If we were to build these on a
dual-core system, one possible unit distribution would look like this:

            TIME --->
CPU 1       * * * * * *         # package 1, 3 and 5
CPU 2       * *                 # package 2 and 4

Now, compare that to a different distribution where the largest package
5 gets built early on:

            TIME --->
CPU 1       * * * *             # package 5
CPU 2       * * * *             # package 1, 2, 3 and 4

It's obvious that processing the largest packages first gives better
results when dealing with such a mix of small and large packages
(typically a regular package and its debuginfo counterpart,
respectively).

Now, with dynamic scheduling in OpenMP, we cannot directly control the
task queue; we can only generate the tasks and let the runtime system do
its work.  What we can do, however, is to provide a hint to the runtime
system for the desired ordering, using the "priority" clause.

So, in this commit, we use the clause to assign a priority value to each
build task based on the respective package size (the bigger the size,
the higher the priority), to help achieve an optimal execution order.

Indeed, in my testing, the priorities were followed to the letter (but
remember, that's not guaranteed by the specification).  Interestingly,
even without the use of priorities, simply generating the tasks in the
desired order resulted in the same execution order for me, but that's,
again, just an implementation detail.

Also note that OpenMP is allowed to stop the thread generating the tasks
at any time, and make it execute some of the tasks instead.  If the
chosen task happens to be a long-duration one, we might hit a starvation
scenario where the other threads have exhausted the task queue and
there's nobody to generate new tasks.  To counter that, this commit also
adds the "untied" clause which allows other threads to pick up where the
generating thread left off, and continue generating new tasks.

Resolves #211.
You can view, comment on, or merge this pull request online at:

  https://github.com/rpm-software-management/rpm/pull/1180

-- Commit Summary --

  * build: prioritize large packages

-- File Changes --

    M build/pack.c (30)

-- Patch Links --

https://github.com/rpm-software-management/rpm/pull/1180.patch
https://github.com/rpm-software-management/rpm/pull/1180.diff

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/pull/1180
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rpm.org/pipermail/rpm-maint/attachments/20200414/3c54b565/attachment-0001.html>


More information about the Rpm-maint mailing list