Add sift_up_rebalance(), an alternative to sift_down_root() that
halves the number of comparisons per extract-min.
The standard extract places the last array element at the root and
sifts it down. At each level this requires two comparisons (left
vs right child, then element vs winner) and a swap.
sift_up_rebalance() instead promotes the smaller child into the
root slot at each level — one comparison and one copy — until the
vacancy reaches a leaf. The last array element is placed at the
vacancy and sifted up to restore heap order. In practice the
sift-up rarely moves more than a level or two because the last
array element tends to be large.
Work per extract drops from 2d comparisons + d swaps to
d comparisons + d copies + a short sift-up.
prio_queue_get() now calls sift_up_rebalance() instead of placing
the last element at root and calling sift_down_root().
sift_down_root() and prio_queue_replace() are left unchanged.
Synthetic benchmark (10 rounds of 10M put+get cycles, CPU-pinned,
same compiler and Makefile flags):
Ascending keys (git's typical pattern — parents have lower
priority than children):
queue width baseline patched speedup
10 4.39s 3.91s 1.12x
100 9.10s 6.61s 1.38x
1,000 11.84s 9.25s 1.28x
10,000 17.50s 13.92s 1.26x
100,000 23.97s 20.19s 1.19x
Descending keys (worst case — last element always sinks to leaf):
queue width baseline patched speedup
10 4.94s 4.95s 1.00x
100 9.75s 9.42s 1.03x
1,000 15.01s 15.29s 0.98x
10,000 24.79s 23.88s 1.04x
100,000 29.69s 28.24s 1.05x
Random keys:
queue width baseline patched speedup
10 5.05s 4.99s 1.01x
100 9.90s 9.50s 1.04x
1,000 15.35s 14.77s 1.04x
10,000 25.35s 24.21s 1.05x
100,000 65.71s 63.38s 1.04x
No regressions in any scenario.
End-to-end benchmark on the linux kernel repo (1.4M commits,
range v5.0..v6.0, 311K commits, 20 interleaved runs, 1 warmup):
Command baseline patched speedup
rev-list --count v5.0..v6.0 484ms 474ms 1.02x
The improvement scales with DAG width: wider DAGs produce larger
priority queues, amplifying the per-level savings. In small or
narrow repositories the queues stay shallow and the sift-down
cost is already negligible.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function to replace the top element of the queue that basically
does the same as prio_queue_get() followed by prio_queue_put(), but
without the work by prio_queue_get() to rebalance the heap. It can be
used to optimize loops that get one element and then immediately add
another one. That's common e.g., with commit history traversal, where
we get out a commit and then put in its parents.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The alloc and nr fields of a prio-queue tell us how much memory is
allocated and used in the array. So the natural type for them is size_t,
which prevents overflow on 64-bit systems where "int" is still 32 bits.
This is unlikely to happen in practice, as we typically use it for
storing commits, and having 2^31 of those is rather a lot. But it's good
to keep our generic data structures as flexible as possible. And as we
start to enforce -Wsign-compare, it means that callers need to use
"int", too, and the problem proliferates. Let's fix it at the source.
The changes here can be put into a few groups:
1. Changing the alloc/nr fields in the struct to size_t. This requires
swapping out int for size_t in negotiator/skipping.c, as well as
in prio_queue_get(), because those all iterate over the array.
Building with -Wsign-compare complains about these.
2. Other code that assigns or passes around indexes into the array
(e.g., the swap() and compare() functions) won't trigger
-Wsign-compare because we are simply truncating the values. These
are caught by -Wconversion, but I've adjusted them here to
future-proof us.
3. In prio_queue_reverse() we compute "queue->nr - 1" without checking
if anything is in the queue, which underflows now that nr is
unsigned. We can fix that by returning early when the queue is
empty (there is nothing to reverse).
4. The insertion_ctr variable is currently unsigned, but can likewise
grow (it is actually worse, because adding and removing an element
many times will keep increasing the counter, even though "nr" does
not). I've bumped that to size_t here, as well.
But -Wconversion notes that computing the "cmp" result by
subtracting the counters and assigning to "int" is a potential
problem. And that's true even before this patch, since we use an
unsigned counter (imagine comparing "2^32-1" and "0", which should
be a high positive value, but instead is "-1" as a signed int).
Since we only care about the sign (and not the magnitude) of the
result, we could fix this by swapping out the subtraction for a
ternary comparison. Probably the performance impact would be
negligible, since we just called into a custom compare function and
branched on its result anyway. But it's easy enough to do a
branchless version by subtracting the comparison results.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places. It does mean that we also need to add
includes of alloc.h in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When consuming a priority queue, it can be convenient to inspect
the next object that will be dequeued without actually dequeueing
it. Our existing library did not have such a 'peek' operation, so
add it as prio_queue_peek().
Add a reference-level comparison in t/helper/test-prio-queue.c
so this method is exercised by t0009-prio-queue.sh. Further, add
a test that checks the behavior when the compare function is NULL
(i.e. the queue becomes a stack).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In d8193743e0 (usage.c: add BUG() function, 2017-05-12), a new macro
was introduced to use for reporting bugs instead of die(). It was then
subsequently used to convert one single caller in 588a538ae5
(setup_git_env: convert die("BUG") to BUG(), 2017-05-12).
The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear why only one call site was converted, or what the plan
is for other, similar calls to die() to report bugs.
Let's just convert all remaining ones in one fell swoop.
This trick was performed by this invocation:
sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace occurrences of `free(ptr); ptr = NULL` which weren't caught by
the coccinelle rule. These fall into two categories:
- free/NULL assignments one after the other which coccinelle all put
on one line, which is functionally equivalent code, but very ugly.
- manually spotted occurrences where the NULL assignment isn't right
after the free() call.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our array-reverse algorithm does the usual "walk from both
ends, swapping elements". We can quit when the two indices
are equal, since:
1. Swapping an element with itself is a noop.
2. If i and j are equal, then in the next iteration i is
guaranteed to be bigge than j, and we will exit the
loop.
So exiting the loop on equality is slightly more efficient.
And more importantly, the new SWAP() macro does not expect
to handle noop swaps; it will call memcpy() with the same src
and dst pointers in this case. It's unclear whether that
causes a problem on any platforms by violating the
"overlapping memory" constraint of memcpy, but it does cause
valgrind to complain.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply the semantic patch swap.cocci to convert hand-rolled swaps to use
the macro SWAP. The resulting code is shorter and easier to read, the
object code is effectively unchanged.
The patch for object.c had to be hand-edited in order to preserve the
comment before the change; Coccinelle tried to eat it for some reason.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If two items are added to a prio_queue and compare equal,
they currently come out in an apparently random order (this
order is deterministic for a particular sequence of
insertions and removals, but does not necessarily match the
insertion order). This makes it unlike using a date-ordered
commit_list, which is one of the main types we would like to
replace with it (because prio_queue does not suffer from
O(n) insertions).
We can make the priority queue stable by keeping an
insertion counter for each element, and using it to break
ties. This does increase the memory usage of the structure
(one int per element), but in practice it does not seem to
affect runtime. A best-of-five "git rev-list --topo-order"
on linux.git showed less than 1% difference (well within the
run-to-run noise).
In an ideal world, we would offer both stable and unstable
priority queues (the latter to try to maximize performance).
However, given the lack of a measurable performance
difference, it is not worth the extra code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When manipulating the priority queue's heap, we frequently
have to compare and swap heap entries. As we are storing
only void pointers right now, this is quite easy to do
inline in a few lines. However, when we start using a more
complicated heap entry in a future patch, that will get
longer. Factoring out these operations lets us make future
changes in one place. It also makes the code a little
shorter and more readable.
Note that we actually accept indices into the queue array
instead of pointers. This is slightly less flexible than
passing pointers-to-pointers (we could not swap items from
unrelated arrays, but we would not want to), but will make
further refactoring simpler (and lets us avoid repeating
"queue->array" at each callsite, which led to some long
lines).
And finally, note that we are cleaning up an accidental use
of a "struct commit" pointer to hold a temporary entry
during swap. Even though we currently only use this code for
commits, it is supposed to be type-agnostic. In practice
this didn't matter anyway because we never dereferenced the
commit pointer (and on most systems, the pointer values
themselves are interchangeable between types).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the prio-queue data structure to implement a priority queue of
commits sorted by committer date, when handling --date-order. The
structure can also be used as a simple LIFO stack, which is a good
match for --topo-order processing.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Traditionally we used a singly linked list of commits to hold a set
of in-flight commits while traversing history. The most typical use
of the list is to add commits that are newly discovered to it, keep
the list sorted by commit timestamp, pick up the newest one from the
list, and keep digging. The cost of keeping the singly linked list
sorted is nontrivial, and this typical use pattern better matches a
priority queue.
Introduce a prio-queue structure, that can be used either as a LIFO
stack, or a priority queue. This will be used in the next patch to
hold in-flight commits during sort-in-topological-order.
Tests and the idea to make it usable for any "void *" pointers to
"things" are by Jeff King. Bugs are mine.
Signed-off-by: Junio C Hamano <gitster@pobox.com>