"git log --follow" has been updated to handle non-linear history, in
which the path being tracked gets renamed differently in multiple
history lines, better.
* mv/log-follow-mergy:
log: improve --follow following renames for non-linear history
The lazy priority queue optimization pattern (deferring actual removal
in prio_queue_get() to allow get+put fusion) has been folded directly
into prio_queue itself, speeding up commit traversal workflows and
simplifying callers.
* kk/prio-queue-get-put-fusion:
prio-queue: fold lazy_queue into prio_queue for automatic get+put fusion
prio-queue: rename .nr to .nr_ and add accessor helpers
Streaming revision walks have been optimized by using a priority queue
for date-sorting commits, speeding up walks repositories with many
merges.
* kk/streaming-walk-pqueue:
revision: use priority queue for non-limited streaming walks
revision: introduce rev_walk_mode to clarify get_revision_1()
pack-objects: call release_revisions() after cruft traversal
"git rev-list" (and "git log" family of commands) learned a new "--max-count-oldest"
that picks oldest N commits in the range instead of the usual newest.
* mf/revision-max-count-oldest:
bash-completions: add --max-count-oldest
revision.c: implement --max-count-oldest
Many core configuration variables have been migrated from global
variables into 'repo_config_values' to tie them to a specific
repository instance, avoiding cross-repository state leakage.
* ob/more-repo-config-values:
environment: move "warn_on_object_refname_ambiguity" into `struct repo_config_values`
environment: move "sparse_expect_files_outside_of_patterns" into `struct repo_config_values`
environment: move "core_sparse_checkout_cone" into `struct repo_config_values`
environment: move "precomposed_unicode" into `struct repo_config_values`
environment: move "pack_compression_level" into `struct repo_config_values`
environment: move `zlib_compression_level` into `struct repo_config_values`
environment: move "check_stat" into `struct repo_config_values`
environment: move "trust_ctime" into `struct repo_config_values`
Have a repo with a subtree merge, do a 'git log --follow prefix/test.c',
the output only contains history in the outer repo, not commits that
were merged via a subtree merge.
What happens is that 'git log --follow' stores the followed path only in
opt->diffopt.pathspec, so in case the commit history is non-linear, and
multiple parents have renames to the followed path, then the end result
isn't really defined: the first commit that happens to be visited in one
of the parents update opt->diffopt.pathspec, and from that point, only
that updated path is visited.
Fix the problem by introducing a commit -> path map
(follow_pathspec_slab) that stores what will be a path to follow when
visiting that parent. At the top of log_tree_commit(), if the slab has
an entry for this commit, we replace opt->diffopt.pathspec with a path
from this entry, so the correct path is followed, even if an unrelated
sub-tree changed the path to be followed to something else. After
log_tree_diff() runs, we record each parent's path in the slab. As a
result, the walk order doesn't matter, which was exactly the source of
problems previously.
This helps with subtree merges (rename happens inside the merge commit),
but also fixes the general case when the rename happens in the history
of parents, not in the merge commit itself.
Signed-off-by: Miklos Vajna <vmiklos@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git log -L` implementation has been refactored to use the
standard diff output pipeline, enabling pickaxe and diff-filter to
work as expected. Additionally, metadata-only diff formats like
--raw and --name-only are now supported with -L.
* mm/line-log-cleanup:
line-log: allow non-patch diff formats with -L
line-log: integrate -L output with the standard log-tree pipeline
revision: move -L setup before output_format-to-diff derivation
Rename the .nr member to .nr_ so that callers outside prio-queue.c
that directly reference .nr get a compilation error. This catches
both existing misuse and future in-flight topics.
Add prio_queue_size() for callers that need to know the element count
and prio_queue_for_each() for callers that need to walk all elements.
Convert all external .nr users:
- Loop conditions: use prio_queue_size(), prio_queue_get(), or
prio_queue_peek() as the loop condition
- Array iterations: use prio_queue_for_each()
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"--max-count" is a commit limiting option and sets a maximum amount
of commits to be shown. If a user wants to see only the first N
commits of the history (the oldest commits) they'd have to do
something like
git log $(git rev-list HEAD | tail -n N | head -n 1)
This is not very user-friendly.
Teach get_revision() the --max-count-oldest option.
Signed-off-by: Mirko Faina <mroik@delayed.space>
[jc: fixed up t4202 <xmqq7boy4o05.fsf@gitster.g>]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `core.warnAmbiguousRefs` configuration was previously stored in a
global `int` variable, making it shared across repository instances
and risking cross‑repository state leakage.
Store it instead in `repo_config_values`, where eagerly‑parsed
repository configuration lives. This option is parsed eagerly because
ambiguity warnings influence how users interpret object references in
many commands; a lazy parse could cause these warnings to behave
inconsistently or to appear for the wrong repository, confusing users
and hindering libification. This preserves the existing behavior while
tying the value to the repository from which it was read, avoiding
cross‑repository state leakage and continuing the effort to reduce
reliance on global configuration state.
Update all references to use `repo_config_values()`.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The graph output from commands like "git log --graph" can now be
limited to a specified number of lanes, preventing overly wide output
in repositories with many branches.
* ps/graph-lane-limit:
graph: add truncation mark to capped lanes
graph: add --graph-lane-limit option
graph: limit the graph width to a hard-coded max
Now that -L flows through log_tree_diff_flush() and diff_flush(),
metadata-only diff formats work because they only read filepair
fields (status, mode, path, oid) already set on the pre-computed
pairs.
Expand the allowlist in setup_revisions() to also accept --raw,
--name-only, --name-status, and --summary. Diff stat formats
(--stat, --numstat, --shortstat, --dirstat) remain blocked because
they call compute_diffstat() on full blob content and would show
whole-file statistics rather than range-scoped ones.
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git log -L` has bypassed log_tree_diff() and log_tree_diff_flush()
since the feature was introduced, short-circuiting from
log_tree_commit() directly into line_log_print(). This skips the
no_free save/restore (noted in a NEEDSWORK comment added by
f8781bfda3), the always_show_header fallback, show_diff_of_diff(),
and diff_free() cleanup.
Restructure so that -L flows through log_tree_diff() ->
log_tree_diff_flush(), the same path used by the normal
single-parent and merge diff codepaths:
- Rename line_log_print() to line_log_queue_pairs() and strip it
down to just queuing pre-computed filepairs. The show_log(),
separator, diffcore_std(), and diff_flush() calls are removed
since log_tree_diff_flush() handles all of those.
- In log_tree_diff(), call line_log_queue_pairs() then
log_tree_diff_flush(), mirroring the diff_tree_oid() + flush
pattern used by the single-parent and merge codepaths.
- Remove the early return in log_tree_commit() that is no longer
needed now that -L output flows through log_tree_diff() and
log_tree_diff_flush(); this restores no_free save/restore,
always_show_header, and diff_free() cleanup.
Because show_log() is now deferred until after diffcore_std() inside
log_tree_diff_flush(), pickaxe (-S, -G, --find-object) and
--diff-filter now properly suppress commits when all pairs are
filtered out.
The blank-line separator between commit header and diff changes
slightly: the old code printed one unconditionally, while
log_tree_diff_flush() only emits one for verbose headers. This
matches the rest of log output.
Also reject --full-diff, which is not yet supported with -L: the
filepairs are pre-computed during the history walk and scoped to
tracked line ranges, so there is currently no full-tree diff to
fall back to for display.
Update tests accordingly.
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The line_level_traverse block sets a default DIFF_FORMAT_PATCH when
no output format has been explicitly requested. This default must
be visible to the "Did the user ask for any diff output?" check
that derives revs->diff from revs->diffopt.output_format.
Currently the -L block runs after that derivation, so revs->diff
stays 0 when no explicit format is given. This does not matter yet
because log_tree_commit() short-circuits into line_log_print()
before consulting revs->diff, but the next commit will route -L
through the normal log_tree_diff() path, which checks revs->diff.
Move the block above the derivation so the default DIFF_FORMAT_PATCH
is in place when revs->diff is computed. No behavior change on its
own.
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The streaming (non-limited) walk in get_revision_1() inserts newly
discovered parent commits into a date-sorted queue via
commit_list_insert_by_date(), which scans the linked list to find the
insertion point -- O(w) per insert, where w is the width of the active
walk frontier. Replace this with an O(log w) priority queue.
Add a commit_queue field to rev_info alongside the existing commits
linked list. The two representations are mutually exclusive: setup
and external callers that need list access use the linked list, then
get_revision_1() lazily drains it into the priority queue on first
call. Add a REV_WALK_NO_WALK enum value to distinguish the no_walk
case (which still uses the commit list) from the streaming case.
The conversion function rev_info_commit_list_to_queue() is public so
callers that know they will iterate can convert early.
Combined with the limit_list() priority queue change already in
master, this eliminates all O(w) sorted linked-list insertion from
the revision walk machinery.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
get_revision_1() dispatches to different walk strategies based on a
combination of rev_info flags: reflog_info, topo_walk_info, and
limited. These conditions are checked in multiple places within
the function -- once to select the next commit, and again to decide
how to expand parents -- and the two chains must stay in sync.
Extract the mode selection into a rev_walk_mode enum and a small
get_walk_mode() helper, resolved once at the top of get_revision_1().
Both dispatch sites now switch on the same mode variable, making it
obvious that they agree and easier to verify that all modes are
handled.
No functional change.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many uses of the_repository has been updated to use a more
appropriate struct repository instance in setup.c codepath.
* ps/setup-wo-the-repository:
setup: stop using `the_repository` in `init_db()`
setup: stop using `the_repository` in `create_reference_database()`
setup: stop using `the_repository` in `initialize_repository_version()`
setup: stop using `the_repository` in `check_repository_format()`
setup: stop using `the_repository` in `upgrade_repository_format()`
setup: stop using `the_repository` in `setup_git_directory()`
setup: stop using `the_repository` in `setup_git_directory_gently()`
setup: stop using `the_repository` in `setup_git_env()`
setup: stop using `the_repository` in `set_git_work_tree()`
setup: stop using `the_repository` in `setup_work_tree()`
setup: stop using `the_repository` in `enter_repo()`
setup: stop using `the_repository` in `verify_non_filename()`
setup: stop using `the_repository` in `verify_filename()`
setup: stop using `the_repository` in `path_inside_repo()`
setup: stop using `the_repository` in `prefix_path()`
setup: stop using `the_repository` in `is_inside_work_tree()`
setup: stop using `the_repository` in `is_inside_git_dir()`
setup: replace use of `the_repository` in static functions
Stop using `the_repository` in `verify_non_filename()` and instead
accept the repository as a parameter. The injection of `the_repository`
is thus bumped one level higher, where callers now pass it in
explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in `verify_filename()` and instead accept
the repository as a parameter. The injection of `the_repository` is thus
bumped one level higher, where callers now pass it in explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
limit_list() maintains a date-sorted work queue of commits using a
linked list with commit_list_insert_by_date() for insertion. Each
insertion walks the list to find the right position — O(n) per insert.
In repositories with merge-heavy histories, the symmetric difference
can contain thousands of commits, making this O(n) insertion the
dominant cost.
Replace the sorted linked list with a prio_queue (binary heap). This
gives O(log n) insertion and O(log n) extraction instead of O(n)
insertion and O(1) extraction, which is a net win when the queue is
large.
The still_interesting() and everybody_uninteresting() helpers are
updated to scan the prio_queue's contiguous array instead of walking a
linked list. process_parents() already accepts both a commit_list and
a prio_queue parameter, so the change in limit_list() simply switches
which one is passed.
Benchmark: git rev-list --left-right --count HEAD~N...HEAD
Repository: 2.3M commits, merge-heavy DAG (monorepo)
Best of 5 runs, times in seconds:
commits in
symmetric diff baseline patched speedup
-------------- -------- ------- -------
10 0.01 0.01 1.0x
50 0.01 0.01 1.0x
3751 21.23 8.49 2.5x
4524 21.70 8.29 2.6x
10130 20.10 6.65 3.0x
No change for small traversals; 2.5-3.0x faster when the queue grows
to thousands of commits.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The way the "git log -L<range>:<file>" feature is bolted onto the
log/diff machinery is being reworked a bit to make the feature
compatible with more diff options, like -S/G.
* mm/line-log-use-standard-diff-output:
doc: note that -L supports patch formatting and pickaxe options
t4211: add tests for -L with standard diff options
line-log: route -L output through the standard diff pipeline
line-log: fix crash when combined with pickaxe options
Replace the hard-coded lane limit with a user-facing
option '--graph-lane-limit=<n>'. It caps the number of
visible lanes to n. This option requires '--graph', without
it, limiting the graph has no meaning, in this case error out.
Zero and negative values are valid inputs but silently
ignored treating them as "no limit", the same as not using
the option. This follows what '--max-parents' does with
negative values.
The default is 0, same as not being used.
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We take in a "const char *", but may write a NUL into it when parsing
parent marks like "foo^-", since we want to isolate "foo" as a string
for further parsing. This is usually OK, as our "const" strings are
often actually argv strings which are technically writeable, but we'd
segfault with a string literal like:
handle_revision_arg("HEAD^-", &revs, 0, 0);
Similar to how we handled dotdot in a previous commit, we can avoid this
by making a temporary copy of the left-hand side of the string. The cost
should negligible compared to the rest of the parsing (like actually
parsing commits to create their parent linked-lists).
There is one slightly tricky thing, though. We parse some of the marks
progressively, so that if we see "foo^!" for example, we'll strip that
down to "foo" not just for calling add_parents_only(), but also for the
rest of the function. That makes sense since we eventually want to pass
"foo" to get_oid_with_context(). But it also means that we'll keep
looking for other marks. In particular, "foo^-^!" is valid, though oddly
"foo^!^-" would ignore the "^-". I'm not sure if this is a weird
historical artifact of the implementation, or if there are important
corner cases.
So I've left the behavior unchanged. Each mark we find allocates a
string with the mark stripped, which means we could allocate multiple
times (and carry a free-able pointer for each to the end). But in
practice we won't, because of the three marks, "^@" jumps immediately to
the end without further parsing, and "^-^!" is nonsense that nobody
would pass. So you'd get one allocation in general, and never more than
two.
Another obvious option would be to just copy "arg" up front and be OK
with munging it. But that means we pay the cost even when we find no
marks. We could make a single copy upon finding a mark and then munge,
but that adds extra code to each site (checking whether somebody else
allocated, and if not, adjusting our "mark" pointer to be relative to
the copied string).
I aimed for something that was clear and obvious, if a bit verbose.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two very subtle bits to the way we parse ".." (and "...")
range operators:
1. In handle_dotdot_1(), we assume that the incoming arguments "dotdot"
and "arg" are part of the same string, with the first digit of the
range-operator blanked to a NUL. Then when we want the full name
(e.g., to report an error), we replace the NUL with a dot to restore
the original string.
2. In handle_dotdot(), we take in a const string, but then we modify it
by overwriting the range operator with a NUL. This has worked OK in
practice since we tend to pass in buffers that are actually
writeable (including argv), but segfaults with something like:
handle_revision_arg("..HEAD", &revs, 0, 0);
On top of that, building with recent versions of glibc causes the
compiler to complain, because it notices when we use strchr() or
strstr() to launder away constness (basically detecting the
possibility of the segfault above via the type system).
Instead of munging the buffer, let's instead make a temporary copy of
the left-hand side of the range operator. That avoids any const
violations, and lets us pass around the parsed elements independently:
the left-hand side, the right-hand side, the number of dots (via the
"symmetric" flag), and the original full string for error messages.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git log -L` has always bypassed the standard diff pipeline.
`dump_diff_hacky()` in line-log.c hand-rolls its own diff headers and
hunk output, which means most diff formatting options are silently
ignored. A NEEDSWORK comment has acknowledged this since the feature
was introduced:
/*
* NEEDSWORK: manually building a diff here is not the Right
* Thing(tm). log -L should be built into the diff pipeline.
*/
Remove `dump_diff_hacky()` and its helpers and route -L output through
`builtin_diff()` / `fn_out_consume()`, the same path used by `git diff`
and `git log -p`. The mechanism is a pair of callback wrappers that sit
between `xdi_diff_outf()` and `fn_out_consume()`, filtering xdiff's
output to only the tracked line ranges. To ensure xdiff emits all lines
within each range as context, the context length is inflated to span the
largest range.
Wire up the `-L` implies `--patch` default in revision setup rather
than forcing it at output time, so `line_log_print()` is just
`diffcore_std()` + `diff_flush()` with no format save/restore.
Rename detection is a no-op since pairs are already resolved during
the history walk in `queue_diffs()`, but running `diffcore_std()`
means `-S`/`-G` (pickaxe), `--orderfile`, and `--diff-filter` now
work with `-L`, and `diff_resolve_rename_copy()` sets pair statuses
correctly without manual assignment.
Switch `diff_filepair_dup()` from `xmalloc` to `xcalloc` so that new
fields (including `line_ranges`) are zero-initialized by default.
As a result, diff formatting options that were previously silently
ignored (e.g. --word-diff, --no-prefix, -w, --color-moved) now work
with -L, and output gains `index` lines, `new file mode` headers, and
funcname context in `@@` headers. This is a user-visible output change:
tools that parse -L output may need to handle the additional header
lines.
The context-length inflation means xdiff may process more output than
needed for very wide line ranges, but benchmarks on files up to 7800
lines show no measurable regression.
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
API clean-up for the worktree subsystem.
* pw/no-more-NULL-means-current-worktree:
path: remove repository argument from worktree_git_path()
wt-status: avoid passing NULL worktree
Revamp object enumeration API around odb.
* ps/odb-for-each-object:
odb: drop unused `for_each_{loose,packed}_object()` functions
reachable: convert to use `odb_for_each_object()`
builtin/pack-objects: use `packfile_store_for_each_object()`
odb: introduce mtime fields for object info requests
treewide: drop uses of `for_each_{loose,packed}_object()`
treewide: enumerate promisor objects via `odb_for_each_object()`
builtin/fsck: refactor to use `odb_for_each_object()`
odb: introduce `odb_for_each_object()`
packfile: introduce function to iterate through objects
packfile: extract function to iterate through objects of a store
object-file: introduce function to iterate through objects
object-file: extract function to read object info from path
odb: fix flags parameter to be unsigned
odb: rename `FOR_EACH_OBJECT_*` flags
"git rev-list" and friends learn "--maximal-only" to show only the
commits that are not reachable by other commits.
* ds/revision-maximal-only:
revision: add --maximal-only option
Replace calls to `refs_for_each_fullref_in()` with the newly introduced
`refs_for_each_ref_ext()` function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace calls to `refs_for_each_glob_ref()` with the newly introduced
`refs_for_each_ref_ext()` function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace calls to `refs_for_each_glob_ref_in()` with the newly introduced
`refs_for_each_ref_ext()` function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the preceding commit, rename `each_ref_fn` to better match
our current best practices around how we name things.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
worktree_git_path() takes a struct repository and a struct worktree
which also contains a struct repository. The repository argument
was added by a973f60dc7 (path: stop relying on `the_repository` in
`worktree_git_path()`, 2024-08-13) and exists because the worktree
argument is optional. Having two ways of passing a repository is
a potential foot-gun as if the the worktree argument is present the
repository argument must match the worktree's repository member. Since
the last commit there are no callers that pass a NULL worktree so lets
remove the repository argument. This removes the potential confusion
and lets us delete a number of uses of "the_repository".
worktree_git_path() has the following callers:
- builtin/worktree.c:validate_no_submodules() which is called from
check_clean_worktree() and move_worktree(), both of which supply
a non-NULL worktree.
- builtin/fsck.c:cmd_fsck() which loops over all worktrees.
- revision.c:add_index_objects_to_pending() which loops over all
worktrees.
- worktree.c:worktree_lock_reason() which dereferences wt before
calling worktree_git_path().
- wt-status.c:wt_status_check_bisect() and wt_status_check_rebase()
which are always called with a non-NULL worktree after the last
commit.
- wt-status.c:git_branch() which is only called by
wt_status_check_bisect() and wt_status_check_rebase().
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename three functions around the commit_list data structure.
* ps/commit-list-functions-renamed:
commit: rename `free_commit_list()` to conform to coding guidelines
commit: rename `reverse_commit_list()` to conform to coding guidelines
commit: rename `copy_commit_list()` to conform to coding guidelines
We have multiple callsites where we enumerate all promisor objects in
the object database via `for_each_packed_object()`. This is done by
passing the `ODB_FOR_EACH_OBJECT_PROMISOR_ONLY` flag, which causes us to
skip over all non-promisor objects.
These callsites can be trivially converted to `odb_for_each_object()` as
we know to skip enumeration of loose objects in case the `PROMISOR_ONLY`
flag was passed by the caller.
Refactor the sites accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the `FOR_EACH_OBJECT_*` flags to have an `ODB_` prefix. This
prepares us for a new upcoming `odb_for_each_object()` function and
ensures that both the function and its flags have the same prefix.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When inspecting a range of commits from some set of starting references, it
is sometimes useful to learn which commits are not reachable from any other
commits in the selected range.
One such application is in the creation of a sequence of bundles for the
bundle URI feature. Creating a stack of bundles representing different
slices of time includes defining which references to include. If all
references are used, then this may be overwhelming or redundant. Instead,
selecting commits that are maximal to the range could help defining a
smaller reference set to use in the bundle header.
Add a new '--maximal-only' option to restrict the output of a revision range
to be only the commits that are not reachable from any other commit in the
range, based on the reachability definition of the walk.
This is accomplished by adding a new 28th bit flag, CHILD_VISITED, that is
set as we walk. This does extend the bit range in object.h, but using an
earlier bit may collide with another feature.
The tests demonstrate the behavior of the feature with a positive-only
range, ranges with negative references, and walk-modifying flags like
--first-parent and --exclude-first-parent-only.
Since the --boundary option would not increase any results when used with
the --maximal-only option, mark them as incompatible.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove implicit reliance on the_repository global in the APIs
around tree objects and make it explicit which repository to work
in.
* rs/tree-wo-the-repository:
cocci: remove obsolete the_repository rules
cocci: convert parse_tree functions to repo_ variants
tree: stop using the_repository
tree: use repo_parse_tree()
path-walk: use repo_parse_tree_gently()
pack-bitmap-write: use repo_parse_tree()
delta-islands: use repo_parse_tree()
bloom: use repo_parse_tree()
add-interactive: use repo_parse_tree_indirect()
tree: add repo_parse_tree*()
environment: move access to core.maxTreeDepth into repo settings
The packfile_store data structure is moved from object store to odb
source.
* ps/packfile-store-in-odb-source:
packfile: move MIDX into packfile store
packfile: refactor `find_pack_entry()` to work on the packfile store
packfile: inline `find_kept_pack_entry()`
packfile: only prepare owning store in `packfile_store_prepare()`
packfile: only prepare owning store in `packfile_store_get_packs()`
packfile: move packfile store into object source
packfile: refactor misleading code when unusing pack windows
packfile: refactor kept-pack cache to work with packfile stores
packfile: pass source to `prepare_pack()`
packfile: create store via its owning source
* ps/packfile-store-in-odb-source:
packfile: move MIDX into packfile store
packfile: refactor `find_pack_entry()` to work on the packfile store
packfile: inline `find_kept_pack_entry()`
packfile: only prepare owning store in `packfile_store_prepare()`
packfile: only prepare owning store in `packfile_store_get_packs()`
packfile: move packfile store into object source
packfile: refactor misleading code when unusing pack windows
packfile: refactor kept-pack cache to work with packfile stores
packfile: pass source to `prepare_pack()`
packfile: create store via its owning source
odb: properly close sources before freeing them
builtin/gc: fix condition for whether to write commit graphs
Our coding guidelines say that:
Functions that operate on `struct S` are named `S_<verb>()` and should
generally receive a pointer to `struct S` as first parameter.
While most of the functions related to `struct commit_list` already
follow that naming schema, `free_commit_list()` doesn't.
Rename the function to address this and adjust all of its callers. Add a
compatibility wrapper for the old function name to ease the transition
and avoid any semantic conflicts with in-flight patch series. This
wrapper will be removed once Git 2.53 has been released.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our coding guidelines say that:
Functions that operate on `struct S` are named `S_<verb>()` and should
generally receive a pointer to `struct S` as first parameter.
While most of the functions related to `struct commit_list` already
follow that naming schema, `copy_commit_list()` doesn't.
Rename the function to address this and adjust all of its callers. Add a
compatibility wrapper for the old function name to ease the transition
and avoid any semantic conflicts with in-flight patch series. This
wrapper will be removed once Git 2.53 has been released.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add and apply a semantic patch to convert calls to parse_tree() and
friends to the corresponding variant that takes a repository argument,
to allow the functions that implicitly use the_repository to be retired
once all potential in-flight topics are settled and converted as well.
The changes in .c files were generated by Coccinelle, but I fixed a
whitespace bug it would have introduced to builtin/commit.c.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The kept pack cache is a cache of packfiles that are marked as kept
either via an accompanying ".kept" file or via an in-memory flag. The
cache can be retrieved via `kept_pack_cache()`, where one needs to pass
in a repository.
Ultimately though the kept-pack cache is a property of the packfile
store, and this causes problems in a subsequent commit where we want to
move down the packfile store to be a per-object-source entity.
Prepare for this and refactor the kept-pack cache to work on top of a
packfile store instead. While at it, rename both the function and flags
specific to the kept-pack cache so that they can be properly attributed
to the respective subsystems.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Dynamic arrays of commit pointers are used in several places. Some of
them use a custom struct to hold array, item count and capacity, others
have them as separate variables linked by a common name part.
Pick one succinct, clean implementation -- commit_stack -- and convert
the different variants to it to reduce code duplication.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `each_ref_fn` callback function type is used across our code base
for several different functions that iterate through reference. There's
a bunch of callbacks implementing this type, which makes any changes to
the callback signature extremely noisy. An example of the required churn
is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a
single argument required us to change 48 files.
It was already proposed back then [1] that we might want to introduce a
wrapper structure to alleviate the pain going forward. While this of
course requires the same kind of global refactoring as just introducing
a new parameter, it at least allows us to more change the callback type
afterwards by just extending the wrapper structure.
One counterargument to this refactoring is that it makes the structure
more opaque. While it is obvious which callsites need to be fixed up
when we change the function type, it's not obvious anymore once we use
a structure. That being said, we only have a handful of sites that
actually need to populate this wrapper structure: our ref backends,
"refs/iterator.c" as well as very few sites that invoke the iterator
callback functions directly.
Introduce this wrapper structure so that we can adapt the iterator
interfaces more readily.
[1]: <ZmarVcF5JjsZx0dl@tanuki>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up around commit-graph.
* ps/commit-graph-per-object-source:
commit-graph: pass graphs that are to be merged as parameter
commit-graph: return commit graph from `repo_find_commit_pos_in_graph()`
commit-graph: return the prepared commit graph from `prepare_commit_graph()`
revision: drop explicit check for commit graph
blame: drop explicit check for commit graph
There are double frees and leaks around setup_revisions() API used
in "git stash show", which has been fixed, and setup_revisions()
API gained a wrapper to make it more ergonomic when using it with
strvec-manged argc/argv pairs.
* jk/setup-revisions-freefix:
revision: retain argv NULL invariant in setup_revisions()
treewide: pass strvecs around for setup_revisions_from_strvec()
treewide: use setup_revisions_from_strvec() when we have a strvec
revision: add wrapper to setup_revisions() from a strvec
revision: manage memory ownership of argv in setup_revisions()
stash: tell setup_revisions() to free our allocated strings
In an argc/argv pair, the entry for argv[argc] is generally NULL. You
can iterate by counting up to argc, or by looking for the NULL entry in
argv.
When we pass such a pair to setup_revisions(), it shrinks argc to
account for the options we consumed and returns the result to the
caller. But it doesn't touch the entries after the reduced argc. So
argv[argc] will be left pointing at some arbitrary entry rather than
NULL.
This isn't the source of any known bugs, since all callers are aware of
the limitation and act accordingly. But it's a possible gotcha that may
be easy to miss.
Let's set the new argv[argc] to NULL, taking care to free it if the
caller asked us to do so.
It is tempting to do likewise for all of the entries afterwards, too, as
some of them may also need to be freed (e.g., if coming from a strvec).
But doing so isn't entirely trivial, as we munge argc in the function
(e.g., when we find "--" and move all of the entries after it into the
prune_data list). It would be possible with some light refactoring, but
it's probably not worth it. Nobody should ever look at them (they are
beyond the revised argc and past the NULL argv entry) outside of strvec
cleanup, and setup_revisions_from_strvec() already handles this case.
There's one other interesting gotcha: many callers which do not want to
provide arguments just pass 0/NULL for argc/argv. We need to check for
this case before assigning the final NULL.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>