git-for-windows/git - git - Gitea: Self-hosted GitHub

mirror of https://github.com/git-for-windows/git.git synced 2026-06-14 05:33:00 -05:00

Author	SHA1	Message	Date
Junio C Hamano	17a3894641	Merge branch 'td/ref-filter-memoize-contains' into seen 'git branch --contains' and 'git for-each-ref --contains' have been optimized to use the memoized commit traversal previously used only by 'git tag --contains', significantly speeding up connectivity checks across many candidate refs with shared history. * td/ref-filter-memoize-contains: commit-reach: die on contains walk errors ref-filter: memoize --contains with generations commit-reach: reject cycles in contains walk	2026-06-12 15:58:18 -07:00
Junio C Hamano	8219a17edb	Merge branch 'kk/remove-get-reachable-subset' into seen API clean-up. * kk/remove-get-reachable-subset: commit-reach: remove get_reachable_subset()	2026-06-12 15:58:18 -07:00
Junio C Hamano	08b6c12590	Merge branch 'kk/prio-queue-get-put-fusion' into seen The lazy priority queue optimization pattern (deferring actual removal in prio_queue_get() to allow get+put fusion) has been folded directly into prio_queue itself, speeding up commit traversal workflows and simplifying callers. * kk/prio-queue-get-put-fusion: prio-queue: fold lazy_queue into prio_queue for automatic get+put fusion prio-queue: rename .nr to .nr_ and add accessor helpers	2026-06-12 15:58:18 -07:00
Tamir Duberstein	ff3ba7d158	commit-reach: die on contains walk errors Without generation numbers, repo_is_descendant_of() can return -1 when it cannot read commit ancestry. commit_contains() exposes that result through a Boolean interface, so ref-filter treats it as true. This can include a ref for --contains or exclude it for --no-contains without failing the command. Die when repo_is_descendant_of() reports an error. The memoized walk already dies when it cannot parse a commit, so callers of the non-memoized path no longer turn a failed walk into a match. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-12 15:56:44 -07:00
Tamir Duberstein	ca96eeee79	ref-filter: memoize --contains with generations git branch and git for-each-ref run a separate reachability walk for each ref considered by --contains and --no-contains. Refs with shared history therefore traverse the same commits repeatedly. git tag instead uses a depth-first walk that caches results across refs. That walk can perform poorly without generation numbers: a negative check may walk to the root instead of stopping at a nearby divergence. Generation numbers let it stop below the oldest target. Use the memoized walk for all ref-filter callers when generation numbers are available. Keep git tag on its existing path without generations. Caching still helps when many tags share deep history: `ffc4b8012d` (tag: speed up --contains calculation, 2011-06-11) reduced git tag --contains HEAD~200 in linux-2.6 from 15.417 to 5.329 seconds. The new shared-history perf test improves from 0.72 to 0.03 seconds. In a repository with 62,174 remote-tracking refs, running: git branch -r --contains c78ae85f3ce7e improves from 104.365 seconds to 468 milliseconds. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-12 15:56:44 -07:00
Tamir Duberstein	5bd39784cd	commit-reach: reject cycles in contains walk The memoized contains traversal used by git tag assumes that commit ancestry is acyclic. Replacement refs can violate that assumption, causing it to keep pushing an already active commit until memory is exhausted. Mark commits while they are active and die if the traversal encounters an active commit. Other failures in this walk already die through parse_commit_or_die(); using a second reachability walk would only add a separate policy for malformed history. Suggested-by: Kristofer Karlsson <krka@spotify.com> Signed-off-by: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-12 15:56:44 -07:00
Kristofer Karlsson	9305e559d2	commit-reach: remove get_reachable_subset() get_reachable_subset() and tips_reachable_from_bases() both answer the same reachability question but use different traversal strategies: priority queue vs depth-first search. Consolidate them into tips_reachable_from_bases() with a mode parameter to select between DFS and PQ traversal, preserving the preferred strategy for each caller. This works cleanly because prio_queue already supports LIFO mode (when compare is NULL), so a single prio_queue acts as either a stack or a heap depending on the mode. The unified traversal pushes all unseen parents at once rather than peeking and pushing one parent at a time. This eliminates merge commit revisits entirely: a 2-parent merge now requires 1 visit instead of 3. For DFS (LIFO) mode, the first parent is pushed last so it ends up on top of the stack, preserving first-parent traversal order. Parsing is deferred to pop time for DFS since parent objects carry valid flags without a full repo_parse_commit() call. PQ mode parses before push so the heap can order by generation number. Add exhaustive reachability tests that use every commit in the grid as a tip, protecting against subtle traversal bugs such as wrong parent ordering or premature pruning. The existing tests are also extended to exercise both DFS and PQ modes. The flag in remote.c changes from 1 (bit 0) to TMP_MARK (bit 4) because tips_reachable_from_bases() uses SEEN (bit 0) internally. TMP_MARK is already used for deduplication earlier in the same function and is cleared before the reachability check. Signed-off-by: Kristofer Karlsson <krka@spotify.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-11 06:09:44 -07:00
Kristofer Karlsson	3c57836988	prio-queue: rename .nr to .nr_ and add accessor helpers Rename the .nr member to .nr_ so that callers outside prio-queue.c that directly reference .nr get a compilation error. This catches both existing misuse and future in-flight topics. Add prio_queue_size() for callers that need to know the element count and prio_queue_for_each() for callers that need to walk all elements. Convert all external .nr users: - Loop conditions: use prio_queue_size(), prio_queue_get(), or prio_queue_peek() as the loop condition - Array iterations: use prio_queue_for_each() Signed-off-by: Kristofer Karlsson <krka@spotify.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-09 11:11:46 -07:00
Junio C Hamano	6390da42c7	Merge branch 'kk/commit-reach-optim' The check for non-stale commits in the priority queue used by `paint_down_to_common` and `ahead_behind` has been optimized by replacing an O(N) scan with an O(1) counter, yielding performance improvements in repositories with wide histories. * kk/commit-reach-optim: commit-reach: replace queue_has_nonstale() scan with O(1) tracking commit-reach: deduplicate queue entries in paint_down_to_common object.h: fix stale entries in object flag allocation table	2026-06-07 23:58:25 +09:00
Junio C Hamano	a0ce168def	Merge branch 'kk/tips-reachable-from-bases-optim' Revision traversal optimization. * kk/tips-reachable-from-bases-optim: t6600: add tests for duplicate tips in tips_reachable_from_bases() commit-reach: use object flags for tips_reachable_from_bases()	2026-05-31 10:00:37 +09:00
Kristofer Karlsson	a186b7797a	commit-reach: replace queue_has_nonstale() scan with O(1) tracking paint_down_to_common() and ahead_behind() call queue_has_nonstale() on every iteration to decide whether to continue the walk. queue_has_nonstale() performs a linear scan of the priority queue, making the overall walk O(n*m) where n is the number of commits walked and m is the queue size. Introduce 'struct nonstale_queue', a thin wrapper around prio_queue that maintains a 'max_nonstale' pointer — the lowest-priority (oldest) non-stale commit seen so far. When this commit is popped, every remaining queue entry is known to be stale, so the walk can stop. This reduces the per-iteration termination check from O(m) to O(1). Uses <= 0 (not < 0) when comparing priorities so that among distinct commits with equal priority (same generation and timestamp) the last-enqueued one is tracked. Since prio_queue breaks ties by insertion order, this ensures max_nonstale is always the last in its priority class to be popped, making pointer equality on pop sufficient for correctness. The previous commit's ENQUEUED deduplication guarantees each commit appears at most once in the queue, which is required for the pointer equality check to be unambiguous. On a large monorepo (3.7M commits), this yields ~2x end-to-end speedup for merge-base calculations on deep import branches. Profiling shows paint_down_to_common() drops from 50% to 4% of total runtime (~27x faster), with the remaining time in commit graph lookups and heap operations: Before: 8536ms / 5757ms / 4743ms (three test cases) After: 3956ms / 4383ms / 1927ms Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Kristofer Karlsson <krka@spotify.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-26 07:18:04 +09:00
Kristofer Karlsson	f767dae3e6	commit-reach: deduplicate queue entries in paint_down_to_common paint_down_to_common() can enqueue the same commit multiple times when it is reached through different parents with different flag combinations. Add an ENQUEUED flag to track whether a commit is currently in the priority queue, and skip it if already present. Introduce prio_queue_put_dedup() and prio_queue_get_dedup() wrappers that manage the ENQUEUED flag on enqueue and dequeue. This change is performance-neutral on its own: the O(n) queue_has_nonstale() scan still dominates the per-iteration cost. However, the deduplication guarantee (each commit appears in the queue at most once) is a prerequisite for the next commit, which replaces that scan with O(1) tracking. Signed-off-by: Kristofer Karlsson <krka@spotify.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-26 07:17:26 +09:00
Kristofer Karlsson	b80b462d7c	commit-reach: use object flags for tips_reachable_from_bases() tips_reachable_from_bases() walks the commit graph from a set of base commits to find which tip commits are reachable. The inner loop does a linear scan over the tips array to check whether each visited commit is a tip, making the overall cost O(C * T) where C is commits walked and T is the number of tips. Use the RESULT object flag to mark tip commits, replacing the linear scan with a single flag test per visited commit. This reduces the per-commit tip check from O(T) to O(1) and the overall cost from O(C * T) to O(C + T). When multiple refs point to the same commit, the shared object gets the flag once, so all duplicates are handled automatically. The early-termination advancement loop checks the flag on the sorted commits array directly, which naturally handles duplicates since the flag is on the shared commit object. This also removes the index field from struct commit_and_index, since the indirection through the original tips array is no longer needed. This function is called by `git for-each-ref --merged` and `git branch/tag --contains/--no-contains` via reach_filter() in ref-filter.c. Benchmark on a merge-heavy monorepo (2.3M commits, 10,000 refs): Command Before After Speedup for-each-ref --merged HEAD 6.57s 1.59s 4.1x for-each-ref --no-merged HEAD 6.67s 1.66s 4.0x branch --merged HEAD 0.68s 0.61s 10% branch --no-merged HEAD 0.65s 0.61s 8% tag --merged HEAD 0.12s 0.12s - On linux.git with 10,000 synthetic branches at the root commit (worst case for the DFS walk): Command Before After Speedup for-each-ref --merged HEAD 1.35s 0.35s 3.9x for-each-ref --no-merged HEAD 1.82s 0.31s 5.9x The large speedup for for-each-ref is because it checks all 10,000 refs as tips, making the O(T) inner loop expensive. The branch subcommand only checks local branches (fewer tips), so the improvement is smaller. Signed-off-by: Kristofer Karlsson <krka@spotify.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-17 14:59:16 +09:00
Kristofer Karlsson	93e5b1680e	commit-reach: early exit paint_down_to_common for single merge-base Commits not in the commit-graph get GENERATION_NUMBER_INFINITY and sort to the top of the priority queue. After those, commits with finite generation numbers are popped in non-increasing order. When MERGE_BASE_FIND_ALL is not set the first doubly-painted commit with a finite generation is therefore a best merge-base: no commit still in the queue can be a descendant of it. Skip the expensive STALE drain in this case. Add MERGE_BASE_FIND_ALL to the merge_base_flags enum. Callers that need every merge-base (repo_get_merge_bases_many, repo_get_merge_bases, repo_in_merge_bases_many, remove_redundant_no_gen) pass the flag to preserve existing behavior. git merge-base (without --all) passes 0, triggering the early exit. On a 2.2M-commit merge-heavy monorepo with commit-graph: HEAD vs ~500: 5,229ms -> 24ms HEAD vs ~1000: 4,214ms -> 39ms HEAD vs ~5000: 3,799ms -> 46ms HEAD vs ~10000: 3,827ms -> 61ms Signed-off-by: Kristofer Karlsson <krka@spotify.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-12 09:33:43 +09:00
Kristofer Karlsson	53f9561055	commit-reach: introduce merge_base_flags enum Replace the boolean ignore_missing_commits parameter in paint_down_to_common() with an enum merge_base_flags, and thread the flags through merge_bases_many(), get_merge_bases_many_0(), and the public repo_get_merge_bases_many_dirty() API. This makes callsites with boolean parameters easier to read and prepares the function for additional flags in a subsequent commit. No functional change: the single caller that used ignore_missing_commits (repo_in_merge_bases_many) now sets MERGE_BASE_IGNORE_MISSING_COMMITS in the flags word, and all other callers pass 0. Signed-off-by: Kristofer Karlsson <krka@spotify.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-12 09:33:43 +09:00
René Scharfe	60d8b5af9b	commit-reach: simplify cleanup of remaining bitmaps in ahead_behind () Don't bother extracting the last few remaining prio_queue items in order when we only want to free their associated bitmaps; just iterate over the item array. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-19 10:45:41 -07:00
Patrick Steinhardt	9f18d089c5	commit: rename `free_commit_list()` to conform to coding guidelines Our coding guidelines say that: Functions that operate on `struct S` are named `S_<verb>()` and should generally receive a pointer to `struct S` as first parameter. While most of the functions related to `struct commit_list` already follow that naming schema, `free_commit_list()` doesn't. Rename the function to address this and adjust all of its callers. Add a compatibility wrapper for the old function name to ease the transition and avoid any semantic conflicts with in-flight patch series. This wrapper will be removed once Git 2.53 has been released. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-15 05:32:31 -08:00
René Scharfe	0e445956f4	commit-reach: use commit_stack Use commit_stack instead of open-coding it. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-25 08:29:29 +09:00
René Scharfe	134ec330d2	commit-reach: avoid commit_list_insert_by_date() Building a list using commit_list_insert_by_date() has quadratic worst case complexity. Avoid it by just appending in the loop and sorting at the end. The number of merge bases is usually small, so don't expect speedups in normal repositories. It has no limit, though. The added perf test shows a nice improvement when dealing with 16384 merge bases: Test v2.51.1 HEAD ----------------------------------------------------------------- 6010.2: git merge-base 0.55(0.54+0.00) 0.03(0.02+0.00) -94.5% Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-10-24 10:13:17 -07:00
Patrick Steinhardt	5e7fe8a7b8	commit-reach: use `size_t` to track indices when computing merge bases The functions `repo_get_merge_bases_many()` and friends accepts an array of commits as well as a parameter that indicates how large that array is. This parameter is using a signed integer, which leads to a couple of warnings with -Wsign-compare. Refactor the code to use `size_t` to track indices instead and adapt callers accordingly. While most callers are trivial, there are two callers that require a bit more scrutiny: - builtin/merge-base.c:show_merge_base() subtracts `1` from the `rev_nr` before calling `repo_get_merge_bases_many_dirty()`, so if the variable was `0` it would wrap. This code is fine though because its only caller will execute that code only when `argc >= 2`, and it follows that `rev_nr >= 2`, as well. - bisect.ccheck_merge_bases() similarly subtracts `1` from `rev_nr`. Again, there is only a single caller that populates `rev_nr` with `good_revs.nr`. And because a bisection always requires at least one good revision it follws that `rev_nr >= 1`. Mark the file as -Wsign-compare-clean. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:12:40 -08:00
Patrick Steinhardt	85ee0680e2	commit-reach: use `size_t` to track indices in `get_reachable_subset()` Similar as with the preceding commit, adapt `get_reachable_subset()` so that it tracks array indices via `size_t` instead of using signed integers to fix a couple of -Wsign-compare warnings. Adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:11:45 -08:00
Patrick Steinhardt	45843d8f4e	commit-reach: use `size_t` to track indices in `remove_redundant()` The function `remove_redundant()` gets as input an array of commits as well as the size of that array and then drops redundant commits from that array. It then returns either `-1` in case an error occurred, or the new number of items in the array. The function receives and returns these sizes with a signed integer, which causes several warnings with -Wsign-compare. Fix this issue by consistently using `size_t` to track array indices and splitting up the returned value into a returned error code and a separate out pointer for the new computed size. Note that `get_merge_bases_many()` and related functions still track array sizes as a signed integer. This will be fixed in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:11:45 -08:00
Patrick Steinhardt	04aeeeaab1	commit-reach: fix type of `min_commit_date` The `can_all_from_reach_with_flag()` function accepts a parameter that allows callers to cut off traversal at a specified commit date. This parameter is of type `time_t`, which is a signed type, while we end up comparing it to a commit's `date` field, which is of the unsigned type `timestamp_t`. Fix the parameter to be of type `timestamp_t`. There is only a single caller in "upload-pack.c" that sets this parameter, and that caller knows to pass in a `timestamp_t` already. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:11:45 -08:00
Patrick Steinhardt	95c09e4d07	commit-reach: fix index used to loop through unsigned integer In `62e745ced2` (prio-queue: use size_t rather than int for size, 2024-12-20), we refactored `struct prio_queue` to track the number of contained entries via a `size_t`. While the refactoring adapted one of the users of that variable, it forgot to also adapt "commit-reach.c" accordingly. This was missed because that file has -Wsign-conversion disabled. Fix the issue by using a `size_t` to iterate through entries. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-27 08:11:15 -08:00
Patrick Steinhardt	41f43b8243	global: mark code units that generate warnings with `-Wsign-compare` Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-06 20:20:02 +09:00
Junio C Hamano	3222718ad7	Merge branch 'ds/for-each-ref-is-base' 'git for-each-ref' learned a new "--format" atom to find the branch that the history leading to a given commit "%(is-base:<commit>)" is likely based on. * ds/for-each-ref-is-base: p1500: add is-base performance tests for-each-ref: add 'is-base' token commit: add gentle reference lookup method commit-reach: add get_branch_base_for_tip	2024-08-26 11:32:24 -07:00
Derrick Stolee	e32eaf73b0	commit-reach: add get_branch_base_for_tip Add a new reachability algorithm that intends to discover (from a heuristic) which branch was used as the starting point for a given commit. Add focused tests using the 'test-tool reach' command. In repositories that use pull requests (or merge requests) to advance one or more "protected" branches, the history of that reference can be recovered by following the first-parent history in most cases. Most are completed using no-fast-forward merges, though squash merges are quite common. Less common is rebase-and-merge, which still validates this assumption. Finally, the case that breaks this assumption is the fast-forward update (with potential rebasing). Even in this case, the previous commit commonly appears in the first-parent history of the branch. Similar assumptions can be made for a topic branch created by a single user with the intention to merge back into another branch. Using 'git commit', 'git merge', and 'git cherry-pick' from HEAD will default to having the first-parent commit be the previous commit at HEAD. This history changes only with commands such as 'git reset' or 'git rebase', where the command names also imply that the branch is starting from a new location. With this movement of branches in mind, the following heuristic is proposed as a way to determine the base branch for a given source branch: Among a list of candidate base branches, select the candidate that minimizes the number of commits in the first-parent history of the source that are not in the first-parent history of the candidate. Prior third-party solutions to this problem have used this optimization criteria, but have relied upon extracting the first-parent history and comparing those lists as tables instead of using commit-graph walks. Given current command-line interface options, this optimization criteria is not easy to detect directly. Even using the command git rev-list --count --first-parent <base>..<source> does not measure this count, as it uses full reachability from <base> to determine which commits to remove from the range '<base>..<source>'. This may lead to one asking if we should instead be using the full reachability of the candidate and only the first-parent history of the source. This, unfortunately, does not work for repositories that use long-lived branches and automation to merge across those branches. In extremely large repositories, merging into a single trunk may not be feasible. This is usually due to the desired frequency of updates (thousands of engineers doing daily work) combined with the time required to perform a validation build. These factors combine to create significant risk of semantic merge conflicts, leading to build breaks on the trunk. In response, repository maintainers can create a single Level Zero (L0) trunk and multiple Level One (L1) branches. By partitioning the engineers by organization, these engineers may see lower risk of semantic merge conflicts as well as be protected against build breaks in other L1 branches. The key to making this system work is a semi-automated process of merging L1 branches into the L0 trunk and vice-versa. In a large enough organization, these L1 branches may further split into L2 or L3 branches, but the same principles apply for merging across deeper levels. If these automated merges use a typical merge with the second parent bringing in the "new" content, then each L0 and L1 branch can track its previous positions by following first-parent history, which appear as parallel paths (until reaching the first place where the branches diverged). If we also walk to second parents, then the histories overlap significantly and cannot be distinguished except for very-recent changes. For this reason, the first-parent condition should be symmetrical across the base and source branches. Another common case for desiring the result of this optimization method is the use of release branches. When releasing a version of a repository, a branch can be used to track that release. Any updates that are worth fixing in that release can be merged to the release branch and shipped with only the necessary fixes without any new features introduced in the trunk branch. The 'maint-2.<X>' branches represent this pattern in the Git project. The microsoft/git fork uses 'vfs-2.<X>.<Y>' branches to track the changes that are custom to that fork on top of each upstream Git release 2.<X>.<Y>. This application doesn't need the symmetrical first-parent condition, but the use of first-parent histories does not change the results for these branches. To determine the base branch from a list of candidates, create a new method in commit-reach.c that performs a single* commit-graph walk. The core concept is to walk first-parents starting at the candidate bases and the source, tracking the "best" base to reach a given commit. Use generation numbers to ensure that a commit is walked at most once and all children have been explored before visiting it. When reaching a commit that is reachable from both a base and the source, we will then have a guarantee that this is the closest intersection of first-parent histories. Track the best base to reach that commit and return it as a result. In rare cases involving multiple root commits, the first-parent history of the source may never intersect any of the candidates and thus a null result is returned. * There are up to two walks, since we require all commits to have a computed generation number in order to avoid incorrect results. This is similar to the need for computed generation numbers in ahead_behind() as implemented in `fd67d149bd` (commit-reach: implement ahead_behind() logic, 2023-03-20). In order to track the "best" base, use a new commit slab that stores an integer. This value defaults to zero upon initialization, so use -1 to track that the source commit can reach this commit and use 'i + 1' to track that the ith base can reach this commit. When multiple bases can reach a commit, minimize the index to break ties. This allows the caller to specify an order to the bases that determines some amount of preference when the heuristic does not result in a unique result. The trickiest part of the integer slab is what happens when reaching a collision among the histories of the bases and the history of the source. This is noticed when viewing the first parent and seeing that it has a slab value that differs in sign (negative or positive). In this case, the collision commit is stored in the method variable 'branch_point' and its slab value is set to -1. The index of the best base (so far) is stored in the method variable 'best_index'. It is possible that there are multiple commits that have the branch_point as its first parent, leading to multiple updates of best_index. The result is determined when 'branch_point' is visited in the commit walk, giving the guarantee that all commits that could reach 'branch_point' were visited. Several interesting cases of collisions and different results are tested in the t6600-test-reach.sh script. Recall that this script also tests the algorithm in three possible states involving the commit-graph file and how many commits are written in the file. This provides some coverage of the need (and lack of need) for the ensure_generations_valid() method. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-14 10:10:05 -07:00
Patrick Steinhardt	f30bfafcd4	commit-reach: fix trivial memory leak when computing reachability We don't free the local `stack` commit list that we use to compute reachability of multiple commits at once. Do so. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-08-01 08:47:38 -07:00
Patrick Steinhardt	e7da938570	global: introduce `USE_THE_REPOSITORY_VARIABLE` macro Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-14 10:26:33 -07:00
Patrick Steinhardt	ba9d029445	commit-reach: fix memory leak in `ahead_behind()` We use a priority queue in `ahead_behind()` to compute the ahead/behind count for commits. We may not iterate through all commits part of that queue though in case all of its entries are stale. Consequently, as we never make the effort to release the remaining commits, we end up leaking bit arrays that we have allocated for each of the contained commits. Plug this leak and mark the corresponding test as leak free. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-27 11:20:01 -07:00
Johannes Schindelin	caaf1a2942	commit-reach(repo_get_merge_bases_many_dirty): pass on errors (Actually, this commit is only about passing on "missing commits" errors, but adding that to the commit's title would have made it too long.) The `merge_bases_many()` function was just taught to indicate parsing errors, and now the `repo_get_merge_bases_many_dirty()` function is aware of that, too. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-29 08:06:01 -08:00
Johannes Schindelin	5317380521	commit-reach(repo_get_merge_bases_many): pass on "missing commits" errors The `merge_bases_many()` function was just taught to indicate parsing errors, and now the `repo_get_merge_bases_many()` function is aware of that, too. Naturally, there are a lot of callers that need to be adjusted now, too. Next stop: `repo_get_merge_bases_dirty()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-29 08:06:01 -08:00
Johannes Schindelin	f87056ce40	commit-reach(get_octopus_merge_bases): pass on "missing commits" errors The `merge_bases_many()` function was just taught to indicate parsing errors, and now the `repo_get_merge_bases()` function (which is also surfaced via the `get_merge_bases()` macro) is aware of that, too. Naturally, the callers need to be adjusted now, too. Next step: adjust `repo_get_merge_bases_many()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-29 08:06:01 -08:00
Johannes Schindelin	76e2a09999	commit-reach(repo_get_merge_bases): pass on "missing commits" errors The `merge_bases_many()` function was just taught to indicate parsing errors, and now the `repo_get_merge_bases()` function (which is also surfaced via the `repo_get_merge_bases()` macro) is aware of that, too. Naturally, there are a lot of callers that need to be adjusted now, too. Next step: adjust the callers of `get_octopus_merge_bases()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-29 08:06:01 -08:00
Johannes Schindelin	8226e157a9	commit-reach(get_merge_bases_many_0): pass on "missing commits" errors The `merge_bases_many()` function was just taught to indicate parsing errors, and now the `get_merge_bases_many_0()` function is aware of that, too. Next step: adjust the callers of `get_merge_bases_many_0()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-29 08:06:01 -08:00
Johannes Schindelin	fb02c523a3	commit-reach(merge_bases_many): pass on "missing commits" errors The `paint_down_to_common()` function was just taught to indicate parsing errors, and now the `merge_bases_many()` function is aware of that, too. One tricky aspect is that `merge_bases_many()` parses commits of its own, but wants to gracefully handle the scenario where NULL is passed as a merge head, returning the empty list of merge bases. The way this was handled involved calling `repo_parse_commit(NULL)` and relying on it to return an error. This has to be done differently now so that we can handle missing commits correctly by producing a fatal error. Next step: adjust the caller of `merge_bases_many()`: `get_merge_bases_many_0()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-29 08:06:01 -08:00
Johannes Schindelin	896a0e11f3	commit-reach(paint_down_to_common): start reporting errors If a commit cannot be parsed, it is currently ignored when looking for merge bases. That's undesirable as the operation can pretend success in a corrupt repository, even though the command should fail with an error message. Let's start at the bottom of the stack by teaching the `paint_down_to_common()` function to return an `int`: if negative, it indicates fatal error, if 0 success. This requires a couple of callers to be adjusted accordingly. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-29 08:06:01 -08:00
Johannes Schindelin	2d2da172f3	commit-reach(paint_down_to_common): prepare for handling shallow commits When `git fetch --update-shallow` needs to test for commit ancestry, it can naturally run into a missing object (e.g. if it is a parent of a shallow commit). For the purpose of `--update-shallow`, this needs to be treated as if the child commit did not even have that parent, i.e. the commit history needs to be clamped. For all other scenarios, clamping the commit history is actually a bug, as it would hide repository corruption (for an analysis regarding shallow and partial clones, see the analysis further down). Add a flag to optionally ask the function to ignore missing commits, as `--update-shallow` needs it to, while detecting missing objects as a repository corruption error by default. This flag is needed, and cannot be replaced by `is_repository_shallow()` to indicate that situation, because that function would return 0 in the `--update-shallow` scenario: There is not actually a `shallow` file in that scenario, as demonstrated e.g. by t5537.10 ("add new shallow root with receive.updateshallow on") and t5538.4 ("add new shallow root with receive.updateshallow on"). Note: shallow commits' parents are set to `NULL` internally already, therefore there is no need to special-case shallow repositories here, as the merge-base logic will not try to access parent commits of shallow commits. Likewise, partial clones aren't an issue either: If a commit is missing during the revision walk in the merge-base logic, it is fetched via `promisor_remote_get_direct()`. And not only the single missing commit object: Due to the way the "promised" objects are fetched (in `fetch_objects()` in `promisor-remote.c`, using `fetch --filter=blob:none`), there is no actual way to fetch a single commit object, as the remote side will pass that commit OID to `pack-objects --revs [...]` which in turn passes it to `rev-list` which interprets this as a commit _range_ instead of a single object. Therefore, in partial clones (unless they are shallow in addition), all commits reachable from a commit that is in the local object database are also present in that local database. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-29 08:05:45 -08:00
Johannes Schindelin	24876ebf68	commit-reach(repo_in_merge_bases_many): report missing commits Some functions in Git's source code follow the convention that returning a negative value indicates a fatal error, e.g. repository corruption. Let's use this convention in `repo_in_merge_bases()` to report when one of the specified commits is missing (i.e. when `repo_parse_commit()` reports an error). Also adjust the callers of `repo_in_merge_bases()` to handle such negative return values. Note: As of this patch, errors are returned only if any of the specified merge heads is missing. Over the course of the next patches, missing commits will also be reported by the `paint_down_to_common()` function, which is called by `repo_in_merge_bases_many()`, and those errors will be properly propagated back to the caller at that stage. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-28 09:47:03 -08:00
Johannes Schindelin	207c40e1e4	commit-reach(repo_in_merge_bases_many): optionally expect missing commits Currently this function treats unrelated commit histories the same way as commit histories with missing commit objects. Typically, missing commit objects constitute a corrupt repository, though, and should be reported as such. The next commits will make it so, but there is one exception: In `git fetch --update-shallow` we _expect_ commit objects to be missing, and we do want to treat the now-incomplete commit histories as unrelated. To allow for that, let's introduce an additional parameter that is passed to `repo_in_merge_bases_many()` to trigger this behavior, and use it in the two callers in `shallow.c`. This commit changes behavior slightly: unless called from the `shallow.c` functions that set the `ignore_missing_commits` bit, any non-existing tip commit that is passed to `repo_in_merge_bases_many()` will now result in an error. Note: When encountering missing commits while traversing the commit history in search for merge bases, with this commit there won't be a change in behavior just yet, their children will still be interpreted as root commits. This bug will get fixed by follow-up commits. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-28 09:47:03 -08:00
Johannes Schindelin	e67431d496	commit-reach(paint_down_to_common): plug two memory leaks When a commit is missing, we return early (currently pretending that no merge basis could be found in that case). At that stage, it is possible that a merge base could have been found already, and added to the `result`, which is now leaked. The priority queue has a similar issue: There might still be a commit in that queue. Let's release both, to address the potential memory leaks. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-28 09:47:03 -08:00
Elijah Newren	eea0e59ffb	treewide: remove unnecessary includes in source files Each of these were checked with gcc -E -I. ${SOURCE_FILE} \| grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-26 12:04:31 -08:00
Jeff King	ec97ad120c	commit-reach: free temporary list in get_octopus_merge_bases() We loop over the set of commits to merge, and for each one compute the merge base against the existing set of merge base candidates we've found. Then we replace the candidate set with a simple assignment of the list head, leaking the old list. We should free it first before assignment. This makes t5521 leak-free, so mark it as such. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 14:28:23 -07:00
Calvin Wan	91c080dff5	git-compat-util: move alloc macros to git-compat-util.h alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] to allocation for Git objects and additionally allows us to remove inclusions to alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-05 11:42:31 -07:00
Junio C Hamano	693bde461c	Merge branch 'mh/commit-reach-get-reachable-plug-leak' Plug memory leak. * mh/commit-reach-get-reachable-plug-leak: commit-reach: fix memory leak in get_reachable_subset()	2023-06-20 15:53:11 -07:00
Mike Hommey	68b51172e3	commit-reach: fix memory leak in get_reachable_subset() This is a leak that has existed since the method was first created in `fcb2c0769d` (commit-reach: implement get_reachable_subset, 2018-11-02). Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-04 13:43:48 +09:00
Junio C Hamano	72871b198f	Merge branch 'ab/remove-implicit-use-of-the-repository' Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository " argument, not "the_repository" post-cocci: adjust comments for recent repo_ migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"	2023-04-06 13:38:30 -07:00
Junio C Hamano	7727da99df	Merge branch 'ds/ahead-behind' "git for-each-ref" learns '%(ahead-behind:<base>)' that computes the distances from a single reference point in the history with bunch of commits in bulk. * ds/ahead-behind: commit-reach: add tips_reachable_from_bases() for-each-ref: add ahead-behind format atom commit-reach: implement ahead_behind() logic commit-graph: introduce `ensure_generations_valid()` commit-graph: return generation from memory commit-graph: simplify compute_generation_numbers() commit-graph: refactor compute_topological_levels() for-each-ref: explicitly test no matches for-each-ref: add --stdin option	2023-04-06 13:38:21 -07:00
Ævar Arnfjörð Bjarmason	4a93b899c1	libs: use "struct repository " argument, not "the_repository" As can easily be seen from grepping in our sources, we had these uses of "the_repository" in various library code in cases where the function in question was already getting a "struct repository " argument. Let's use that argument instead. Out of these changes only the changes to "cache-tree.c", "commit-reach.c", "shallow.c" and "upload-pack.c" would have cleanly applied before the migration away from the "repo_*()" wrapper macros in the preceding commits. The rest aren't new, as we'd previously implicitly refer to "the_repository", but it's now more obvious that we were doing the wrong thing all along, and should have used the parameter instead. The change to change "get_index_format_default(the_repository)" in "read-cache.c" to use the "r" variable instead should arguably have been part of [1], or in the subsequent cleanup in [2]. Let's do it here, as can be seen from the initial code in [3] it's not important that we use "the_repository" there, but would prefer to always use the current repository. This change excludes the "the_repository" use in "upload-pack.c"'s upload_pack_advertise(), as the in-flight [4] makes that change. 1. `ee1f0c242e` (read-cache: add index.skipHash config option, 2023-01-06) 2. `6269f8eaad` (treewide: always have a valid "index_state.repo" member, 2023-01-17) 3. `7211b9e753` (repo-settings: consolidate some config settings, 2019-08-13) 4. <Y/hbUsGPVNAxTdmS@coredump.intra.peff.net> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:36:46 -07:00
Ævar Arnfjörð Bjarmason	ecb5091fd4	cocci: apply the "commit.h" part of "the_repository.pending" Apply the part of "the_repository.pending.cocci" pertaining to "commit.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:36:45 -07:00

1 2 3

105 Commits