'git branch --contains' and 'git for-each-ref --contains' have
been optimized to use the memoized commit traversal previously
used only by 'git tag --contains', significantly speeding up
connectivity checks across many candidate refs with shared
history.
* td/ref-filter-memoize-contains:
commit-reach: die on contains walk errors
ref-filter: memoize --contains with generations
commit-reach: reject cycles in contains walk
The lazy priority queue optimization pattern (deferring actual removal
in prio_queue_get() to allow get+put fusion) has been folded directly
into prio_queue itself, speeding up commit traversal workflows and
simplifying callers.
* kk/prio-queue-get-put-fusion:
prio-queue: fold lazy_queue into prio_queue for automatic get+put fusion
prio-queue: rename .nr to .nr_ and add accessor helpers
Without generation numbers, repo_is_descendant_of() can return -1 when
it cannot read commit ancestry. commit_contains() exposes that result
through a Boolean interface, so ref-filter treats it as true. This can
include a ref for --contains or exclude it for --no-contains without
failing the command.
Die when repo_is_descendant_of() reports an error. The memoized walk
already dies when it cannot parse a commit, so callers of the
non-memoized path no longer turn a failed walk into a match.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git branch and git for-each-ref run a separate reachability walk for
each ref considered by --contains and --no-contains. Refs with shared
history therefore traverse the same commits repeatedly.
git tag instead uses a depth-first walk that caches results across
refs. That walk can perform poorly without generation numbers: a
negative check may walk to the root instead of stopping at a nearby
divergence. Generation numbers let it stop below the oldest target.
Use the memoized walk for all ref-filter callers when generation
numbers are available. Keep git tag on its existing path without
generations. Caching still helps when many tags share deep history:
ffc4b8012d (tag: speed up --contains calculation, 2011-06-11) reduced
git tag --contains HEAD~200 in linux-2.6 from 15.417 to 5.329 seconds.
The new shared-history perf test improves from 0.72 to 0.03 seconds. In
a repository with 62,174 remote-tracking refs, running:
git branch -r --contains c78ae85f3ce7e
improves from 104.365 seconds to 468 milliseconds.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The memoized contains traversal used by git tag assumes that commit
ancestry is acyclic. Replacement refs can violate that assumption,
causing it to keep pushing an already active commit until memory is
exhausted.
Mark commits while they are active and die if the traversal encounters
an active commit. Other failures in this walk already die through
parse_commit_or_die(); using a second reachability walk would only add
a separate policy for malformed history.
Suggested-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
get_reachable_subset() and tips_reachable_from_bases() both answer
the same reachability question but use different traversal
strategies: priority queue vs depth-first search. Consolidate them
into tips_reachable_from_bases() with a mode parameter to select
between DFS and PQ traversal, preserving the preferred strategy for
each caller.
This works cleanly because prio_queue already supports LIFO mode
(when compare is NULL), so a single prio_queue acts as either a
stack or a heap depending on the mode.
The unified traversal pushes all unseen parents at once rather than
peeking and pushing one parent at a time. This eliminates merge
commit revisits entirely: a 2-parent merge now requires 1 visit
instead of 3. For DFS (LIFO) mode, the first parent is pushed
last so it ends up on top of the stack, preserving first-parent
traversal order.
Parsing is deferred to pop time for DFS since parent objects carry
valid flags without a full repo_parse_commit() call. PQ mode
parses before push so the heap can order by generation number.
Add exhaustive reachability tests that use every commit in the
grid as a tip, protecting against subtle traversal bugs such as
wrong parent ordering or premature pruning. The existing tests
are also extended to exercise both DFS and PQ modes.
The flag in remote.c changes from 1 (bit 0) to TMP_MARK (bit 4)
because tips_reachable_from_bases() uses SEEN (bit 0) internally.
TMP_MARK is already used for deduplication earlier in the same
function and is cleared before the reachability check.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the .nr member to .nr_ so that callers outside prio-queue.c
that directly reference .nr get a compilation error. This catches
both existing misuse and future in-flight topics.
Add prio_queue_size() for callers that need to know the element count
and prio_queue_for_each() for callers that need to walk all elements.
Convert all external .nr users:
- Loop conditions: use prio_queue_size(), prio_queue_get(), or
prio_queue_peek() as the loop condition
- Array iterations: use prio_queue_for_each()
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The check for non-stale commits in the priority queue used by
`paint_down_to_common` and `ahead_behind` has been optimized by
replacing an O(N) scan with an O(1) counter, yielding performance
improvements in repositories with wide histories.
* kk/commit-reach-optim:
commit-reach: replace queue_has_nonstale() scan with O(1) tracking
commit-reach: deduplicate queue entries in paint_down_to_common
object.h: fix stale entries in object flag allocation table
Revision traversal optimization.
* kk/tips-reachable-from-bases-optim:
t6600: add tests for duplicate tips in tips_reachable_from_bases()
commit-reach: use object flags for tips_reachable_from_bases()
paint_down_to_common() and ahead_behind() call queue_has_nonstale()
on every iteration to decide whether to continue the walk.
queue_has_nonstale() performs a linear scan of the priority queue,
making the overall walk O(n*m) where n is the number of commits
walked and m is the queue size.
Introduce 'struct nonstale_queue', a thin wrapper around prio_queue
that maintains a 'max_nonstale' pointer — the lowest-priority
(oldest) non-stale commit seen so far. When this commit is popped,
every remaining queue entry is known to be stale, so the walk can
stop. This reduces the per-iteration termination check from O(m)
to O(1).
Uses <= 0 (not < 0) when comparing priorities so that among distinct
commits with equal priority (same generation and timestamp) the
last-enqueued one is tracked. Since prio_queue breaks ties by
insertion order, this ensures max_nonstale is always the last in its
priority class to be popped, making pointer equality on pop
sufficient for correctness.
The previous commit's ENQUEUED deduplication guarantees each commit
appears at most once in the queue, which is required for the pointer
equality check to be unambiguous.
On a large monorepo (3.7M commits), this yields ~2x end-to-end
speedup for merge-base calculations on deep import branches.
Profiling shows paint_down_to_common() drops from 50% to 4% of
total runtime (~27x faster), with the remaining time in commit
graph lookups and heap operations:
Before: 8536ms / 5757ms / 4743ms (three test cases)
After: 3956ms / 4383ms / 1927ms
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
paint_down_to_common() can enqueue the same commit multiple times
when it is reached through different parents with different flag
combinations. Add an ENQUEUED flag to track whether a commit is
currently in the priority queue, and skip it if already present.
Introduce prio_queue_put_dedup() and prio_queue_get_dedup()
wrappers that manage the ENQUEUED flag on enqueue and dequeue.
This change is performance-neutral on its own: the O(n)
queue_has_nonstale() scan still dominates the per-iteration cost.
However, the deduplication guarantee (each commit appears in the
queue at most once) is a prerequisite for the next commit, which
replaces that scan with O(1) tracking.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
tips_reachable_from_bases() walks the commit graph from a set of base
commits to find which tip commits are reachable. The inner loop does
a linear scan over the tips array to check whether each visited commit
is a tip, making the overall cost O(C * T) where C is commits walked
and T is the number of tips.
Use the RESULT object flag to mark tip commits, replacing the linear
scan with a single flag test per visited commit. This reduces the
per-commit tip check from O(T) to O(1) and the overall cost from
O(C * T) to O(C + T).
When multiple refs point to the same commit, the shared object gets
the flag once, so all duplicates are handled automatically. The
early-termination advancement loop checks the flag on the sorted
commits array directly, which naturally handles duplicates since the
flag is on the shared commit object.
This also removes the index field from struct commit_and_index, since
the indirection through the original tips array is no longer needed.
This function is called by `git for-each-ref --merged` and
`git branch/tag --contains/--no-contains` via reach_filter() in
ref-filter.c.
Benchmark on a merge-heavy monorepo (2.3M commits, 10,000 refs):
Command Before After Speedup
for-each-ref --merged HEAD 6.57s 1.59s 4.1x
for-each-ref --no-merged HEAD 6.67s 1.66s 4.0x
branch --merged HEAD 0.68s 0.61s 10%
branch --no-merged HEAD 0.65s 0.61s 8%
tag --merged HEAD 0.12s 0.12s -
On linux.git with 10,000 synthetic branches at the root commit (worst
case for the DFS walk):
Command Before After Speedup
for-each-ref --merged HEAD 1.35s 0.35s 3.9x
for-each-ref --no-merged HEAD 1.82s 0.31s 5.9x
The large speedup for for-each-ref is because it checks all 10,000
refs as tips, making the O(T) inner loop expensive. The branch
subcommand only checks local branches (fewer tips), so the improvement
is smaller.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits not in the commit-graph get GENERATION_NUMBER_INFINITY and
sort to the top of the priority queue. After those, commits with
finite generation numbers are popped in non-increasing order.
When MERGE_BASE_FIND_ALL is not set the first doubly-painted commit
with a finite generation is therefore a best merge-base: no commit
still in the queue can be a descendant of it. Skip the expensive
STALE drain in this case.
Add MERGE_BASE_FIND_ALL to the merge_base_flags enum. Callers that
need every merge-base (repo_get_merge_bases_many, repo_get_merge_bases,
repo_in_merge_bases_many, remove_redundant_no_gen) pass the flag to
preserve existing behavior. git merge-base (without --all) passes 0,
triggering the early exit.
On a 2.2M-commit merge-heavy monorepo with commit-graph:
HEAD vs ~500: 5,229ms -> 24ms
HEAD vs ~1000: 4,214ms -> 39ms
HEAD vs ~5000: 3,799ms -> 46ms
HEAD vs ~10000: 3,827ms -> 61ms
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the boolean ignore_missing_commits parameter in
paint_down_to_common() with an enum merge_base_flags, and thread
the flags through merge_bases_many(), get_merge_bases_many_0(),
and the public repo_get_merge_bases_many_dirty() API.
This makes callsites with boolean parameters easier to read and
prepares the function for additional flags in a subsequent commit.
No functional change: the single caller that used
ignore_missing_commits (repo_in_merge_bases_many) now sets
MERGE_BASE_IGNORE_MISSING_COMMITS in the flags word, and all
other callers pass 0.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Don't bother extracting the last few remaining prio_queue items in
order when we only want to free their associated bitmaps; just iterate
over the item array.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our coding guidelines say that:
Functions that operate on `struct S` are named `S_<verb>()` and should
generally receive a pointer to `struct S` as first parameter.
While most of the functions related to `struct commit_list` already
follow that naming schema, `free_commit_list()` doesn't.
Rename the function to address this and adjust all of its callers. Add a
compatibility wrapper for the old function name to ease the transition
and avoid any semantic conflicts with in-flight patch series. This
wrapper will be removed once Git 2.53 has been released.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Building a list using commit_list_insert_by_date() has quadratic worst
case complexity. Avoid it by just appending in the loop and sorting at
the end.
The number of merge bases is usually small, so don't expect speedups in
normal repositories. It has no limit, though. The added perf test
shows a nice improvement when dealing with 16384 merge bases:
Test v2.51.1 HEAD
-----------------------------------------------------------------
6010.2: git merge-base 0.55(0.54+0.00) 0.03(0.02+0.00) -94.5%
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions `repo_get_merge_bases_many()` and friends accepts an array
of commits as well as a parameter that indicates how large that array
is. This parameter is using a signed integer, which leads to a couple of
warnings with -Wsign-compare.
Refactor the code to use `size_t` to track indices instead and adapt
callers accordingly. While most callers are trivial, there are two
callers that require a bit more scrutiny:
- builtin/merge-base.c:show_merge_base() subtracts `1` from the
`rev_nr` before calling `repo_get_merge_bases_many_dirty()`, so if
the variable was `0` it would wrap. This code is fine though because
its only caller will execute that code only when `argc >= 2`, and it
follows that `rev_nr >= 2`, as well.
- bisect.ccheck_merge_bases() similarly subtracts `1` from `rev_nr`.
Again, there is only a single caller that populates `rev_nr` with
`good_revs.nr`. And because a bisection always requires at least one
good revision it follws that `rev_nr >= 1`.
Mark the file as -Wsign-compare-clean.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar as with the preceding commit, adapt `get_reachable_subset()` so
that it tracks array indices via `size_t` instead of using signed
integers to fix a couple of -Wsign-compare warnings. Adapt callers
accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `remove_redundant()` gets as input an array of commits as
well as the size of that array and then drops redundant commits from
that array. It then returns either `-1` in case an error occurred, or
the new number of items in the array.
The function receives and returns these sizes with a signed integer,
which causes several warnings with -Wsign-compare. Fix this issue by
consistently using `size_t` to track array indices and splitting up
the returned value into a returned error code and a separate out pointer
for the new computed size.
Note that `get_merge_bases_many()` and related functions still track
array sizes as a signed integer. This will be fixed in a subsequent
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `can_all_from_reach_with_flag()` function accepts a parameter that
allows callers to cut off traversal at a specified commit date. This
parameter is of type `time_t`, which is a signed type, while we end up
comparing it to a commit's `date` field, which is of the unsigned type
`timestamp_t`.
Fix the parameter to be of type `timestamp_t`. There is only a single
caller in "upload-pack.c" that sets this parameter, and that caller
knows to pass in a `timestamp_t` already.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 62e745ced2 (prio-queue: use size_t rather than int for size,
2024-12-20), we refactored `struct prio_queue` to track the number of
contained entries via a `size_t`. While the refactoring adapted one of
the users of that variable, it forgot to also adapt "commit-reach.c"
accordingly. This was missed because that file has -Wsign-conversion
disabled.
Fix the issue by using a `size_t` to iterate through entries.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git for-each-ref' learned a new "--format" atom to find the branch
that the history leading to a given commit "%(is-base:<commit>)" is
likely based on.
* ds/for-each-ref-is-base:
p1500: add is-base performance tests
for-each-ref: add 'is-base' token
commit: add gentle reference lookup method
commit-reach: add get_branch_base_for_tip
Add a new reachability algorithm that intends to discover (from a heuristic)
which branch was used as the starting point for a given commit. Add focused
tests using the 'test-tool reach' command.
In repositories that use pull requests (or merge requests) to advance one or
more "protected" branches, the history of that reference can be recovered by
following the first-parent history in most cases. Most are completed using
no-fast-forward merges, though squash merges are quite common. Less common
is rebase-and-merge, which still validates this assumption. Finally, the
case that breaks this assumption is the fast-forward update (with potential
rebasing). Even in this case, the previous commit commonly appears in the
first-parent history of the branch.
Similar assumptions can be made for a topic branch created by a single user
with the intention to merge back into another branch. Using 'git commit',
'git merge', and 'git cherry-pick' from HEAD will default to having the
first-parent commit be the previous commit at HEAD. This history changes
only with commands such as 'git reset' or 'git rebase', where the command
names also imply that the branch is starting from a new location.
With this movement of branches in mind, the following heuristic is proposed
as a way to determine the base branch for a given source branch:
Among a list of candidate base branches, select the candidate that
minimizes the number of commits in the first-parent history of the source
that are not in the first-parent history of the candidate.
Prior third-party solutions to this problem have used this optimization
criteria, but have relied upon extracting the first-parent history and
comparing those lists as tables instead of using commit-graph walks.
Given current command-line interface options, this optimization criteria is
not easy to detect directly. Even using the command
git rev-list --count --first-parent <base>..<source>
does not measure this count, as it uses full reachability from <base> to
determine which commits to remove from the range '<base>..<source>'. This
may lead to one asking if we should instead be using the full reachability
of the candidate and only the first-parent history of the source. This,
unfortunately, does not work for repositories that use long-lived branches
and automation to merge across those branches.
In extremely large repositories, merging into a single trunk may not be
feasible. This is usually due to the desired frequency of updates
(thousands of engineers doing daily work) combined with the time required to
perform a validation build. These factors combine to create significant
risk of semantic merge conflicts, leading to build breaks on the trunk. In
response, repository maintainers can create a single Level Zero (L0) trunk
and multiple Level One (L1) branches. By partitioning the engineers by
organization, these engineers may see lower risk of semantic merge conflicts
as well as be protected against build breaks in other L1 branches. The key
to making this system work is a semi-automated process of merging L1
branches into the L0 trunk and vice-versa. In a large enough organization,
these L1 branches may further split into L2 or L3 branches, but the same
principles apply for merging across deeper levels.
If these automated merges use a typical merge with the second parent
bringing in the "new" content, then each L0 and L1 branch can track its
previous positions by following first-parent history, which appear as
parallel paths (until reaching the first place where the branches diverged).
If we also walk to second parents, then the histories overlap significantly
and cannot be distinguished except for very-recent changes.
For this reason, the first-parent condition should be symmetrical across the
base and source branches.
Another common case for desiring the result of this optimization method is
the use of release branches. When releasing a version of a repository, a
branch can be used to track that release. Any updates that are worth fixing
in that release can be merged to the release branch and shipped with only
the necessary fixes without any new features introduced in the trunk branch.
The 'maint-2.<X>' branches represent this pattern in the Git project. The
microsoft/git fork uses 'vfs-2.<X>.<Y>' branches to track the changes that
are custom to that fork on top of each upstream Git release 2.<X>.<Y>. This
application doesn't need the symmetrical first-parent condition, but the use
of first-parent histories does not change the results for these branches.
To determine the base branch from a list of candidates, create a new method
in commit-reach.c that performs a single* commit-graph walk. The core
concept is to walk first-parents starting at the candidate bases and the
source, tracking the "best" base to reach a given commit. Use generation
numbers to ensure that a commit is walked at most once and all children have
been explored before visiting it. When reaching a commit that is reachable
from both a base and the source, we will then have a guarantee that this is
the closest intersection of first-parent histories. Track the best base to
reach that commit and return it as a result. In rare cases involving
multiple root commits, the first-parent history of the source may never
intersect any of the candidates and thus a null result is returned.
* There are up to two walks, since we require all commits to have a computed
generation number in order to avoid incorrect results. This is similar to
the need for computed generation numbers in ahead_behind() as implemented
in fd67d149bd (commit-reach: implement ahead_behind() logic, 2023-03-20).
In order to track the "best" base, use a new commit slab that stores an
integer. This value defaults to zero upon initialization, so use -1 to
track that the source commit can reach this commit and use 'i + 1' to track
that the ith base can reach this commit. When multiple bases can reach a
commit, minimize the index to break ties. This allows the caller to specify
an order to the bases that determines some amount of preference when the
heuristic does not result in a unique result.
The trickiest part of the integer slab is what happens when reaching a
collision among the histories of the bases and the history of the source.
This is noticed when viewing the first parent and seeing that it has a slab
value that differs in sign (negative or positive). In this case, the
collision commit is stored in the method variable 'branch_point' and its
slab value is set to -1. The index of the best base (so far) is stored in
the method variable 'best_index'. It is possible that there are multiple
commits that have the branch_point as its first parent, leading to multiple
updates of best_index. The result is determined when 'branch_point' is
visited in the commit walk, giving the guarantee that all commits that could
reach 'branch_point' were visited.
Several interesting cases of collisions and different results are tested in
the t6600-test-reach.sh script. Recall that this script also tests the
algorithm in three possible states involving the commit-graph file and how
many commits are written in the file. This provides some coverage of the
need (and lack of need) for the ensure_generations_valid() method.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't free the local `stack` commit list that we use to compute
reachability of multiple commits at once. Do so.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.
It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.
Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit
For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).
Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use a priority queue in `ahead_behind()` to compute the ahead/behind
count for commits. We may not iterate through all commits part of that
queue though in case all of its entries are stale. Consequently, as we
never make the effort to release the remaining commits, we end up
leaking bit arrays that we have allocated for each of the contained
commits.
Plug this leak and mark the corresponding test as leak free.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
(Actually, this commit is only about passing on "missing commits"
errors, but adding that to the commit's title would have made it too
long.)
The `merge_bases_many()` function was just taught to indicate parsing
errors, and now the `repo_get_merge_bases_many_dirty()` function is
aware of that, too.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `merge_bases_many()` function was just taught to indicate parsing
errors, and now the `repo_get_merge_bases_many()` function is aware of
that, too.
Naturally, there are a lot of callers that need to be adjusted now, too.
Next stop: `repo_get_merge_bases_dirty()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `merge_bases_many()` function was just taught to indicate parsing
errors, and now the `repo_get_merge_bases()` function (which is also
surfaced via the `get_merge_bases()` macro) is aware of that, too.
Naturally, the callers need to be adjusted now, too.
Next step: adjust `repo_get_merge_bases_many()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `merge_bases_many()` function was just taught to indicate parsing
errors, and now the `repo_get_merge_bases()` function (which is also
surfaced via the `repo_get_merge_bases()` macro) is aware of that, too.
Naturally, there are a lot of callers that need to be adjusted now, too.
Next step: adjust the callers of `get_octopus_merge_bases()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `merge_bases_many()` function was just taught to indicate
parsing errors, and now the `get_merge_bases_many_0()` function is aware
of that, too.
Next step: adjust the callers of `get_merge_bases_many_0()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `paint_down_to_common()` function was just taught to indicate
parsing errors, and now the `merge_bases_many()` function is aware of
that, too.
One tricky aspect is that `merge_bases_many()` parses commits of its
own, but wants to gracefully handle the scenario where NULL is passed as
a merge head, returning the empty list of merge bases. The way this was
handled involved calling `repo_parse_commit(NULL)` and relying on it to
return an error. This has to be done differently now so that we can
handle missing commits correctly by producing a fatal error.
Next step: adjust the caller of `merge_bases_many()`:
`get_merge_bases_many_0()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a commit cannot be parsed, it is currently ignored when looking for
merge bases. That's undesirable as the operation can pretend success in
a corrupt repository, even though the command should fail with an error
message.
Let's start at the bottom of the stack by teaching the
`paint_down_to_common()` function to return an `int`: if negative, it
indicates fatal error, if 0 success.
This requires a couple of callers to be adjusted accordingly.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `git fetch --update-shallow` needs to test for commit ancestry, it
can naturally run into a missing object (e.g. if it is a parent of a
shallow commit). For the purpose of `--update-shallow`, this needs to be
treated as if the child commit did not even have that parent, i.e. the
commit history needs to be clamped.
For all other scenarios, clamping the commit history is actually a bug,
as it would hide repository corruption (for an analysis regarding
shallow and partial clones, see the analysis further down).
Add a flag to optionally ask the function to ignore missing commits, as
`--update-shallow` needs it to, while detecting missing objects as a
repository corruption error by default.
This flag is needed, and cannot be replaced by `is_repository_shallow()`
to indicate that situation, because that function would return 0 in the
`--update-shallow` scenario: There is not actually a `shallow` file in
that scenario, as demonstrated e.g. by t5537.10 ("add new shallow root
with receive.updateshallow on") and t5538.4 ("add new shallow root with
receive.updateshallow on").
Note: shallow commits' parents are set to `NULL` internally already,
therefore there is no need to special-case shallow repositories here, as
the merge-base logic will not try to access parent commits of shallow
commits.
Likewise, partial clones aren't an issue either: If a commit is missing
during the revision walk in the merge-base logic, it is fetched via
`promisor_remote_get_direct()`. And not only the single missing commit
object: Due to the way the "promised" objects are fetched (in
`fetch_objects()` in `promisor-remote.c`, using `fetch
--filter=blob:none`), there is no actual way to fetch a single commit
object, as the remote side will pass that commit OID to `pack-objects
--revs [...]` which in turn passes it to `rev-list` which interprets
this as a commit _range_ instead of a single object. Therefore, in
partial clones (unless they are shallow in addition), all commits
reachable from a commit that is in the local object database are also
present in that local database.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some functions in Git's source code follow the convention that returning
a negative value indicates a fatal error, e.g. repository corruption.
Let's use this convention in `repo_in_merge_bases()` to report when one
of the specified commits is missing (i.e. when `repo_parse_commit()`
reports an error).
Also adjust the callers of `repo_in_merge_bases()` to handle such
negative return values.
Note: As of this patch, errors are returned only if any of the specified
merge heads is missing. Over the course of the next patches, missing
commits will also be reported by the `paint_down_to_common()` function,
which is called by `repo_in_merge_bases_many()`, and those errors will
be properly propagated back to the caller at that stage.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently this function treats unrelated commit histories the same way
as commit histories with missing commit objects.
Typically, missing commit objects constitute a corrupt repository,
though, and should be reported as such. The next commits will make it
so, but there is one exception: In `git fetch --update-shallow` we
_expect_ commit objects to be missing, and we do want to treat the
now-incomplete commit histories as unrelated.
To allow for that, let's introduce an additional parameter that is
passed to `repo_in_merge_bases_many()` to trigger this behavior, and use
it in the two callers in `shallow.c`.
This commit changes behavior slightly: unless called from the
`shallow.c` functions that set the `ignore_missing_commits` bit, any
non-existing tip commit that is passed to `repo_in_merge_bases_many()`
will now result in an error.
Note: When encountering missing commits while traversing the commit
history in search for merge bases, with this commit there won't be a
change in behavior just yet, their children will still be interpreted as
root commits. This bug will get fixed by follow-up commits.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a commit is missing, we return early (currently pretending that no
merge basis could be found in that case). At that stage, it is possible
that a merge base could have been found already, and added to the
`result`, which is now leaked.
The priority queue has a similar issue: There might still be a commit in
that queue.
Let's release both, to address the potential memory leaks.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each of these were checked with
gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).
...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file. These cases were:
* builtin/credential-cache.c
* builtin/pull.c
* builtin/send-pack.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We loop over the set of commits to merge, and for each one compute the
merge base against the existing set of merge base candidates we've
found. Then we replace the candidate set with a simple assignment of the
list head, leaking the old list. We should free it first before
assignment.
This makes t5521 leak-free, so mark it as such.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a leak that has existed since the method was first created
in fcb2c0769d (commit-reach: implement get_reachable_subset,
2018-11-02).
Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up around the use of the_repository.
* ab/remove-implicit-use-of-the-repository:
libs: use "struct repository *" argument, not "the_repository"
post-cocci: adjust comments for recent repo_* migration
cocci: apply the "revision.h" part of "the_repository.pending"
cocci: apply the "rerere.h" part of "the_repository.pending"
cocci: apply the "refs.h" part of "the_repository.pending"
cocci: apply the "promisor-remote.h" part of "the_repository.pending"
cocci: apply the "packfile.h" part of "the_repository.pending"
cocci: apply the "pretty.h" part of "the_repository.pending"
cocci: apply the "object-store.h" part of "the_repository.pending"
cocci: apply the "diff.h" part of "the_repository.pending"
cocci: apply the "commit.h" part of "the_repository.pending"
cocci: apply the "commit-reach.h" part of "the_repository.pending"
cocci: apply the "cache.h" part of "the_repository.pending"
cocci: add missing "the_repository" macros to "pending"
cocci: sort "the_repository" rules by header
cocci: fix incorrect & verbose "the_repository" rules
cocci: remove dead rule from "the_repository.pending.cocci"
"git for-each-ref" learns '%(ahead-behind:<base>)' that computes the
distances from a single reference point in the history with bunch
of commits in bulk.
* ds/ahead-behind:
commit-reach: add tips_reachable_from_bases()
for-each-ref: add ahead-behind format atom
commit-reach: implement ahead_behind() logic
commit-graph: introduce `ensure_generations_valid()`
commit-graph: return generation from memory
commit-graph: simplify compute_generation_numbers()
commit-graph: refactor compute_topological_levels()
for-each-ref: explicitly test no matches
for-each-ref: add --stdin option
As can easily be seen from grepping in our sources, we had these uses
of "the_repository" in various library code in cases where the
function in question was already getting a "struct repository *"
argument. Let's use that argument instead.
Out of these changes only the changes to "cache-tree.c",
"commit-reach.c", "shallow.c" and "upload-pack.c" would have cleanly
applied before the migration away from the "repo_*()" wrapper macros
in the preceding commits.
The rest aren't new, as we'd previously implicitly refer to
"the_repository", but it's now more obvious that we were doing the
wrong thing all along, and should have used the parameter instead.
The change to change "get_index_format_default(the_repository)" in
"read-cache.c" to use the "r" variable instead should arguably have
been part of [1], or in the subsequent cleanup in [2]. Let's do it
here, as can be seen from the initial code in [3] it's not important
that we use "the_repository" there, but would prefer to always use the
current repository.
This change excludes the "the_repository" use in "upload-pack.c"'s
upload_pack_advertise(), as the in-flight [4] makes that change.
1. ee1f0c242e (read-cache: add index.skipHash config option,
2023-01-06)
2. 6269f8eaad (treewide: always have a valid "index_state.repo"
member, 2023-01-17)
3. 7211b9e753 (repo-settings: consolidate some config settings,
2019-08-13)
4. <Y/hbUsGPVNAxTdmS@coredump.intra.peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply the part of "the_repository.pending.cocci" pertaining to
"commit.h".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>