Commit Graph

1127 Commits

Author SHA1 Message Date
Junio C Hamano
6e148f82dc Merge branch 'kk/streaming-walk-pqueue'
Streaming revision walks have been optimized by using a priority queue
for date-sorting commits, speeding up walks repositories with many
merges.

* kk/streaming-walk-pqueue:
  revision: use priority queue for non-limited streaming walks
  revision: introduce rev_walk_mode to clarify get_revision_1()
  pack-objects: call release_revisions() after cruft traversal
2026-06-16 09:01:02 -07:00
Junio C Hamano
c534ec3a5d Merge branch 'mf/revision-max-count-oldest'
"git rev-list" (and "git log" family of commands) learned a new "--max-count-oldest"
that picks oldest N commits in the range instead of the usual newest.

* mf/revision-max-count-oldest:
  bash-completions: add --max-count-oldest
  revision.c: implement --max-count-oldest
2026-06-16 09:01:02 -07:00
Junio C Hamano
883a47ef64 Merge branch 'ob/more-repo-config-values'
Many core configuration variables have been migrated from global
variables into 'repo_config_values' to tie them to a specific
repository instance, avoiding cross-repository state leakage.

* ob/more-repo-config-values:
  environment: move "warn_on_object_refname_ambiguity" into `struct repo_config_values`
  environment: move "sparse_expect_files_outside_of_patterns" into `struct repo_config_values`
  environment: move "core_sparse_checkout_cone" into `struct repo_config_values`
  environment: move "precomposed_unicode" into `struct repo_config_values`
  environment: move "pack_compression_level" into `struct repo_config_values`
  environment: move `zlib_compression_level` into `struct repo_config_values`
  environment: move "check_stat" into `struct repo_config_values`
  environment: move "trust_ctime" into `struct repo_config_values`
2026-06-15 07:42:00 -07:00
Junio C Hamano
53ff393204 Merge branch 'mm/line-log-cleanup'
The `git log -L` implementation has been refactored to use the
standard diff output pipeline, enabling pickaxe and diff-filter to
work as expected. Additionally, metadata-only diff formats like
--raw and --name-only are now supported with -L.

* mm/line-log-cleanup:
  line-log: allow non-patch diff formats with -L
  line-log: integrate -L output with the standard log-tree pipeline
  revision: move -L setup before output_format-to-diff derivation
2026-06-11 04:31:17 -07:00
Mirko Faina
bb4ce23284 revision.c: implement --max-count-oldest
"--max-count" is a commit limiting option and sets a maximum amount
of commits to be shown. If a user wants to see only the first N
commits of the history (the oldest commits) they'd have to do
something like

    git log $(git rev-list HEAD | tail -n N | head -n 1)

This is not very user-friendly.

Teach get_revision() the --max-count-oldest option.

Signed-off-by: Mirko Faina <mroik@delayed.space>
[jc: fixed up t4202 <xmqq7boy4o05.fsf@gitster.g>]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-03 08:37:52 +09:00
Olamide Caleb Bello
8407abf02a environment: move "warn_on_object_refname_ambiguity" into struct repo_config_values
The `core.warnAmbiguousRefs` configuration was previously stored in a
global `int` variable, making it shared across repository instances
and risking cross‑repository state leakage.

Store it instead in `repo_config_values`, where eagerly‑parsed
repository configuration lives. This option is parsed eagerly because
ambiguity warnings influence how users interpret object references in
many commands; a lazy parse could cause these warnings to behave
inconsistently or to appear for the wrong repository, confusing users
and hindering libification. This preserves the existing behavior while
tying the value to the repository from which it was read, avoiding
cross‑repository state leakage and continuing the effort to reduce
reliance on global configuration state.

Update all references to use `repo_config_values()`.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-03 08:36:48 +09:00
Junio C Hamano
7af2503365 Merge branch 'ps/graph-lane-limit'
The graph output from commands like "git log --graph" can now be
limited to a specified number of lanes, preventing overly wide output
in repositories with many branches.

* ps/graph-lane-limit:
  graph: add truncation mark to capped lanes
  graph: add --graph-lane-limit option
  graph: limit the graph width to a hard-coded max
2026-05-31 10:00:38 +09:00
Michael Montalbo
4b5d8a0163 line-log: allow non-patch diff formats with -L
Now that -L flows through log_tree_diff_flush() and diff_flush(),
metadata-only diff formats work because they only read filepair
fields (status, mode, path, oid) already set on the pre-computed
pairs.

Expand the allowlist in setup_revisions() to also accept --raw,
--name-only, --name-status, and --summary.  Diff stat formats
(--stat, --numstat, --shortstat, --dirstat) remain blocked because
they call compute_diffstat() on full blob content and would show
whole-file statistics rather than range-scoped ones.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-29 14:06:21 +09:00
Michael Montalbo
42d960748e line-log: integrate -L output with the standard log-tree pipeline
`git log -L` has bypassed log_tree_diff() and log_tree_diff_flush()
since the feature was introduced, short-circuiting from
log_tree_commit() directly into line_log_print().  This skips the
no_free save/restore (noted in a NEEDSWORK comment added by
f8781bfda3), the always_show_header fallback, show_diff_of_diff(),
and diff_free() cleanup.

Restructure so that -L flows through log_tree_diff() ->
log_tree_diff_flush(), the same path used by the normal
single-parent and merge diff codepaths:

 - Rename line_log_print() to line_log_queue_pairs() and strip it
   down to just queuing pre-computed filepairs.  The show_log(),
   separator, diffcore_std(), and diff_flush() calls are removed
   since log_tree_diff_flush() handles all of those.

 - In log_tree_diff(), call line_log_queue_pairs() then
   log_tree_diff_flush(), mirroring the diff_tree_oid() + flush
   pattern used by the single-parent and merge codepaths.

 - Remove the early return in log_tree_commit() that is no longer
   needed now that -L output flows through log_tree_diff() and
   log_tree_diff_flush(); this restores no_free save/restore,
   always_show_header, and diff_free() cleanup.

Because show_log() is now deferred until after diffcore_std() inside
log_tree_diff_flush(), pickaxe (-S, -G, --find-object) and
--diff-filter now properly suppress commits when all pairs are
filtered out.

The blank-line separator between commit header and diff changes
slightly: the old code printed one unconditionally, while
log_tree_diff_flush() only emits one for verbose headers.  This
matches the rest of log output.

Also reject --full-diff, which is not yet supported with -L: the
filepairs are pre-computed during the history walk and scoped to
tracked line ranges, so there is currently no full-tree diff to
fall back to for display.

Update tests accordingly.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-29 14:06:21 +09:00
Michael Montalbo
558057cf4f revision: move -L setup before output_format-to-diff derivation
The line_level_traverse block sets a default DIFF_FORMAT_PATCH when
no output format has been explicitly requested.  This default must
be visible to the "Did the user ask for any diff output?" check
that derives revs->diff from revs->diffopt.output_format.

Currently the -L block runs after that derivation, so revs->diff
stays 0 when no explicit format is given.  This does not matter yet
because log_tree_commit() short-circuits into line_log_print()
before consulting revs->diff, but the next commit will route -L
through the normal log_tree_diff() path, which checks revs->diff.

Move the block above the derivation so the default DIFF_FORMAT_PATCH
is in place when revs->diff is computed.  No behavior change on its
own.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-29 14:06:21 +09:00
Kristofer Karlsson
dd4bc01c0a revision: use priority queue for non-limited streaming walks
The streaming (non-limited) walk in get_revision_1() inserts newly
discovered parent commits into a date-sorted queue via
commit_list_insert_by_date(), which scans the linked list to find the
insertion point -- O(w) per insert, where w is the width of the active
walk frontier.  Replace this with an O(log w) priority queue.

Add a commit_queue field to rev_info alongside the existing commits
linked list.  The two representations are mutually exclusive: setup
and external callers that need list access use the linked list, then
get_revision_1() lazily drains it into the priority queue on first
call.  Add a REV_WALK_NO_WALK enum value to distinguish the no_walk
case (which still uses the commit list) from the streaming case.

The conversion function rev_info_commit_list_to_queue() is public so
callers that know they will iterate can convert early.

Combined with the limit_list() priority queue change already in
master, this eliminates all O(w) sorted linked-list insertion from
the revision walk machinery.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-28 06:08:20 +09:00
Kristofer Karlsson
d877b1af50 revision: introduce rev_walk_mode to clarify get_revision_1()
get_revision_1() dispatches to different walk strategies based on a
combination of rev_info flags: reflog_info, topo_walk_info, and
limited.  These conditions are checked in multiple places within
the function -- once to select the next commit, and again to decide
how to expand parents -- and the two chains must stay in sync.

Extract the mode selection into a rev_walk_mode enum and a small
get_walk_mode() helper, resolved once at the top of get_revision_1().
Both dispatch sites now switch on the same mode variable, making it
obvious that they agree and easier to verify that all modes are
handled.

No functional change.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-28 06:08:20 +09:00
Junio C Hamano
455ff75d35 Merge branch 'ps/setup-wo-the-repository'
Many uses of the_repository has been updated to use a more
appropriate struct repository instance in setup.c codepath.

* ps/setup-wo-the-repository:
  setup: stop using `the_repository` in `init_db()`
  setup: stop using `the_repository` in `create_reference_database()`
  setup: stop using `the_repository` in `initialize_repository_version()`
  setup: stop using `the_repository` in `check_repository_format()`
  setup: stop using `the_repository` in `upgrade_repository_format()`
  setup: stop using `the_repository` in `setup_git_directory()`
  setup: stop using `the_repository` in `setup_git_directory_gently()`
  setup: stop using `the_repository` in `setup_git_env()`
  setup: stop using `the_repository` in `set_git_work_tree()`
  setup: stop using `the_repository` in `setup_work_tree()`
  setup: stop using `the_repository` in `enter_repo()`
  setup: stop using `the_repository` in `verify_non_filename()`
  setup: stop using `the_repository` in `verify_filename()`
  setup: stop using `the_repository` in `path_inside_repo()`
  setup: stop using `the_repository` in `prefix_path()`
  setup: stop using `the_repository` in `is_inside_work_tree()`
  setup: stop using `the_repository` in `is_inside_git_dir()`
  setup: replace use of `the_repository` in static functions
2026-05-27 14:15:46 +09:00
Patrick Steinhardt
920dba4581 setup: stop using the_repository in verify_non_filename()
Stop using `the_repository` in `verify_non_filename()` and instead
accept the repository as a parameter. The injection of `the_repository`
is thus bumped one level higher, where callers now pass it in
explicitly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-19 19:36:24 +09:00
Patrick Steinhardt
6e7e50cc7b setup: stop using the_repository in verify_filename()
Stop using `the_repository` in `verify_filename()` and instead accept
the repository as a parameter. The injection of `the_repository` is thus
bumped one level higher, where callers now pass it in explicitly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-19 19:36:24 +09:00
Kristofer Karlsson
ef8d51a8a3 revision: use priority queue in limit_list()
limit_list() maintains a date-sorted work queue of commits using a
linked list with commit_list_insert_by_date() for insertion.  Each
insertion walks the list to find the right position — O(n) per insert.
In repositories with merge-heavy histories, the symmetric difference
can contain thousands of commits, making this O(n) insertion the
dominant cost.

Replace the sorted linked list with a prio_queue (binary heap).  This
gives O(log n) insertion and O(log n) extraction instead of O(n)
insertion and O(1) extraction, which is a net win when the queue is
large.

The still_interesting() and everybody_uninteresting() helpers are
updated to scan the prio_queue's contiguous array instead of walking a
linked list.  process_parents() already accepts both a commit_list and
a prio_queue parameter, so the change in limit_list() simply switches
which one is passed.

Benchmark: git rev-list --left-right --count HEAD~N...HEAD
Repository: 2.3M commits, merge-heavy DAG (monorepo)
Best of 5 runs, times in seconds:

  commits in
  symmetric diff   baseline   patched    speedup
  --------------   --------   -------    -------
            10       0.01      0.01       1.0x
            50       0.01      0.01       1.0x
          3751      21.23      8.49       2.5x
          4524      21.70      8.29       2.6x
         10130      20.10      6.65       3.0x

No change for small traversals; 2.5-3.0x faster when the queue grows
to thousands of commits.

Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-15 04:26:25 +09:00
Junio C Hamano
1678b7de97 Merge branch 'mm/line-log-use-standard-diff-output'
The way the "git log -L<range>:<file>" feature is bolted onto the
log/diff machinery is being reworked a bit to make the feature
compatible with more diff options, like -S/G.

* mm/line-log-use-standard-diff-output:
  doc: note that -L supports patch formatting and pickaxe options
  t4211: add tests for -L with standard diff options
  line-log: route -L output through the standard diff pipeline
  line-log: fix crash when combined with pickaxe options
2026-04-07 14:59:27 -07:00
Pablo Sabater
f756a3c78d graph: add --graph-lane-limit option
Replace the hard-coded lane limit with a user-facing
option '--graph-lane-limit=<n>'. It caps the number of
visible lanes to n. This option requires '--graph', without
it, limiting the graph has no meaning, in this case error out.

Zero and negative values are valid inputs but silently
ignored treating them as "no limit", the same as not using
the option. This follows what '--max-parents' does with
negative values.

The default is 0, same as not being used.

Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-29 21:22:09 -07:00
Jeff King
22b985ef19 revision: avoid writing to const string for parent marks
We take in a "const char *", but may write a NUL into it when parsing
parent marks like "foo^-", since we want to isolate "foo" as a string
for further parsing. This is usually OK, as our "const" strings are
often actually argv strings which are technically writeable, but we'd
segfault with a string literal like:

  handle_revision_arg("HEAD^-", &revs, 0, 0);

Similar to how we handled dotdot in a previous commit, we can avoid this
by making a temporary copy of the left-hand side of the string. The cost
should negligible compared to the rest of the parsing (like actually
parsing commits to create their parent linked-lists).

There is one slightly tricky thing, though. We parse some of the marks
progressively, so that if we see "foo^!" for example, we'll strip that
down to "foo" not just for calling add_parents_only(), but also for the
rest of the function. That makes sense since we eventually want to pass
"foo" to get_oid_with_context(). But it also means that we'll keep
looking for other marks. In particular, "foo^-^!" is valid, though oddly
"foo^!^-" would ignore the "^-". I'm not sure if this is a weird
historical artifact of the implementation, or if there are important
corner cases.

So I've left the behavior unchanged. Each mark we find allocates a
string with the mark stripped, which means we could allocate multiple
times (and carry a free-able pointer for each to the end). But in
practice we won't, because of the three marks, "^@" jumps immediately to
the end without further parsing, and "^-^!" is nonsense that nobody
would pass. So you'd get one allocation in general, and never more than
two.

Another obvious option would be to just copy "arg" up front and be OK
with munging it. But that means we pay the cost even when we find no
marks. We could make a single copy upon finding a mark and then munge,
but that adds extra code to each site (checking whether somebody else
allocated, and if not, adjusting our "mark" pointer to be relative to
the copied string).

I aimed for something that was clear and obvious, if a bit verbose.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-26 12:47:17 -07:00
Jeff King
4d5fb9377b revision: make handle_dotdot() interface less confusing
There are two very subtle bits to the way we parse ".." (and "...")
range operators:

 1. In handle_dotdot_1(), we assume that the incoming arguments "dotdot"
    and "arg" are part of the same string, with the first digit of the
    range-operator blanked to a NUL. Then when we want the full name
    (e.g., to report an error), we replace the NUL with a dot to restore
    the original string.

 2. In handle_dotdot(), we take in a const string, but then we modify it
    by overwriting the range operator with a NUL. This has worked OK in
    practice since we tend to pass in buffers that are actually
    writeable (including argv), but segfaults with something like:

      handle_revision_arg("..HEAD", &revs, 0, 0);

    On top of that, building with recent versions of glibc causes the
    compiler to complain, because it notices when we use strchr() or
    strstr() to launder away constness (basically detecting the
    possibility of the segfault above via the type system).

Instead of munging the buffer, let's instead make a temporary copy of
the left-hand side of the range operator. That avoids any const
violations, and lets us pass around the parsed elements independently:
the left-hand side, the right-hand side, the number of dots (via the
"symmetric" flag), and the original full string for error messages.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-26 12:47:17 -07:00
Michael Montalbo
86e986f166 line-log: route -L output through the standard diff pipeline
`git log -L` has always bypassed the standard diff pipeline.
`dump_diff_hacky()` in line-log.c hand-rolls its own diff headers and
hunk output, which means most diff formatting options are silently
ignored.  A NEEDSWORK comment has acknowledged this since the feature
was introduced:

    /*
     * NEEDSWORK: manually building a diff here is not the Right
     * Thing(tm).  log -L should be built into the diff pipeline.
     */

Remove `dump_diff_hacky()` and its helpers and route -L output through
`builtin_diff()` / `fn_out_consume()`, the same path used by `git diff`
and `git log -p`.  The mechanism is a pair of callback wrappers that sit
between `xdi_diff_outf()` and `fn_out_consume()`, filtering xdiff's
output to only the tracked line ranges.  To ensure xdiff emits all lines
within each range as context, the context length is inflated to span the
largest range.

Wire up the `-L` implies `--patch` default in revision setup rather
than forcing it at output time, so `line_log_print()` is just
`diffcore_std()` + `diff_flush()` with no format save/restore.
Rename detection is a no-op since pairs are already resolved during
the history walk in `queue_diffs()`, but running `diffcore_std()`
means `-S`/`-G` (pickaxe), `--orderfile`, and `--diff-filter` now
work with `-L`, and `diff_resolve_rename_copy()` sets pair statuses
correctly without manual assignment.

Switch `diff_filepair_dup()` from `xmalloc` to `xcalloc` so that new
fields (including `line_ranges`) are zero-initialized by default.

As a result, diff formatting options that were previously silently
ignored (e.g. --word-diff, --no-prefix, -w, --color-moved) now work
with -L, and output gains `index` lines, `new file mode` headers, and
funcname context in `@@` headers.  This is a user-visible output change:
tools that parse -L output may need to handle the additional header
lines.

The context-length inflation means xdiff may process more output than
needed for very wide line ranges, but benchmarks on files up to 7800
lines show no measurable regression.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-16 21:05:42 -07:00
Junio C Hamano
d445aecfb0 Merge branch 'ps/refs-for-each'
Code refactoring around refs-for-each-* API functions.

* ps/refs-for-each:
  refs: replace `refs_for_each_fullref_in()`
  refs: replace `refs_for_each_namespaced_ref()`
  refs: replace `refs_for_each_glob_ref()`
  refs: replace `refs_for_each_glob_ref_in()`
  refs: replace `refs_for_each_rawref_in()`
  refs: replace `refs_for_each_rawref()`
  refs: replace `refs_for_each_ref_in()`
  refs: improve verification for-each-ref options
  refs: generalize `refs_for_each_fullref_in_prefixes()`
  refs: generalize `refs_for_each_namespaced_ref()`
  refs: speed up `refs_for_each_glob_ref_in()`
  refs: introduce `refs_for_each_ref_ext`
  refs: rename `each_ref_fn`
  refs: rename `do_for_each_ref_flags`
  refs: move `do_for_each_ref_flags` further up
  refs: move `refs_head_ref_namespaced()`
  refs: remove unused `refs_for_each_include_root_ref()`
2026-03-09 14:36:55 -07:00
Junio C Hamano
7b7d67104e Merge branch 'pw/no-more-NULL-means-current-worktree'
API clean-up for the worktree subsystem.

* pw/no-more-NULL-means-current-worktree:
  path: remove repository argument from worktree_git_path()
  wt-status: avoid passing NULL worktree
2026-03-04 10:53:00 -08:00
Junio C Hamano
9eb5b3b999 Merge branch 'ps/odb-for-each-object'
Revamp object enumeration API around odb.

* ps/odb-for-each-object:
  odb: drop unused `for_each_{loose,packed}_object()` functions
  reachable: convert to use `odb_for_each_object()`
  builtin/pack-objects: use `packfile_store_for_each_object()`
  odb: introduce mtime fields for object info requests
  treewide: drop uses of `for_each_{loose,packed}_object()`
  treewide: enumerate promisor objects via `odb_for_each_object()`
  builtin/fsck: refactor to use `odb_for_each_object()`
  odb: introduce `odb_for_each_object()`
  packfile: introduce function to iterate through objects
  packfile: extract function to iterate through objects of a store
  object-file: introduce function to iterate through objects
  object-file: extract function to read object info from path
  odb: fix flags parameter to be unsigned
  odb: rename `FOR_EACH_OBJECT_*` flags
2026-03-02 17:06:50 -08:00
Junio C Hamano
8d15dd1ce1 Merge branch 'ds/revision-maximal-only'
"git rev-list" and friends learn "--maximal-only" to show only the
commits that are not reachable by other commits.

* ds/revision-maximal-only:
  revision: add --maximal-only option
2026-02-25 11:54:17 -08:00
Patrick Steinhardt
1dd4f1e43f refs: replace refs_for_each_fullref_in()
Replace calls to `refs_for_each_fullref_in()` with the newly introduced
`refs_for_each_ref_ext()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-23 13:21:19 -08:00
Patrick Steinhardt
3fc1ad03c6 refs: replace refs_for_each_glob_ref()
Replace calls to `refs_for_each_glob_ref()` with the newly introduced
`refs_for_each_ref_ext()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-23 13:21:19 -08:00
Patrick Steinhardt
4091d29893 refs: replace refs_for_each_glob_ref_in()
Replace calls to `refs_for_each_glob_ref_in()` with the newly introduced
`refs_for_each_ref_ext()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-23 13:21:19 -08:00
Patrick Steinhardt
635f08b739 refs: rename each_ref_fn
Similar to the preceding commit, rename `each_ref_fn` to better match
our current best practices around how we name things.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-23 13:21:18 -08:00
Phillip Wood
a49cb0f093 path: remove repository argument from worktree_git_path()
worktree_git_path() takes a struct repository and a struct worktree
which also contains a struct repository. The repository argument
was added by a973f60dc7 (path: stop relying on `the_repository` in
`worktree_git_path()`, 2024-08-13) and exists because the worktree
argument is optional. Having two ways of passing a repository is
a potential foot-gun as if the the worktree argument is present the
repository argument must match the worktree's repository member. Since
the last commit there are no callers that pass a NULL worktree so lets
remove the repository argument. This removes the potential confusion
and lets us delete a number of uses of "the_repository".

worktree_git_path() has the following callers:

 - builtin/worktree.c:validate_no_submodules() which is called from
   check_clean_worktree() and move_worktree(), both of which supply
   a non-NULL worktree.

 - builtin/fsck.c:cmd_fsck() which loops over all worktrees.

 - revision.c:add_index_objects_to_pending() which loops over all
   worktrees.

 - worktree.c:worktree_lock_reason() which dereferences wt before
   calling worktree_git_path().

 - wt-status.c:wt_status_check_bisect() and wt_status_check_rebase()
   which are always called with a non-NULL worktree after the last
   commit.

 - wt-status.c:git_branch() which is only called by
   wt_status_check_bisect() and wt_status_check_rebase().

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-19 11:03:24 -08:00
Junio C Hamano
5288202433 Merge branch 'ps/commit-list-functions-renamed'
Rename three functions around the commit_list data structure.

* ps/commit-list-functions-renamed:
  commit: rename `free_commit_list()` to conform to coding guidelines
  commit: rename `reverse_commit_list()` to conform to coding guidelines
  commit: rename `copy_commit_list()` to conform to coding guidelines
2026-02-13 13:39:25 -08:00
Patrick Steinhardt
2813c97310 treewide: enumerate promisor objects via odb_for_each_object()
We have multiple callsites where we enumerate all promisor objects in
the object database via `for_each_packed_object()`. This is done by
passing the `ODB_FOR_EACH_OBJECT_PROMISOR_ONLY` flag, which causes us to
skip over all non-promisor objects.

These callsites can be trivially converted to `odb_for_each_object()` as
we know to skip enumeration of loose objects in case the `PROMISOR_ONLY`
flag was passed by the caller.

Refactor the sites accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 08:26:07 -08:00
Patrick Steinhardt
bd1855b897 odb: rename FOR_EACH_OBJECT_* flags
Rename the `FOR_EACH_OBJECT_*` flags to have an `ODB_` prefix. This
prepares us for a new upcoming `odb_for_each_object()` function and
ensures that both the function and its flags have the same prefix.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 08:26:06 -08:00
Derrick Stolee
b4e8f60a3c revision: add --maximal-only option
When inspecting a range of commits from some set of starting references, it
is sometimes useful to learn which commits are not reachable from any other
commits in the selected range.

One such application is in the creation of a sequence of bundles for the
bundle URI feature. Creating a stack of bundles representing different
slices of time includes defining which references to include. If all
references are used, then this may be overwhelming or redundant. Instead,
selecting commits that are maximal to the range could help defining a
smaller reference set to use in the bundle header.

Add a new '--maximal-only' option to restrict the output of a revision range
to be only the commits that are not reachable from any other commit in the
range, based on the reachability definition of the walk.

This is accomplished by adding a new 28th bit flag, CHILD_VISITED, that is
set as we walk. This does extend the bit range in object.h, but using an
earlier bit may collide with another feature.

The tests demonstrate the behavior of the feature with a positive-only
range, ranges with negative references, and walk-modifying flags like
--first-parent and --exclude-first-parent-only.

Since the --boundary option would not increase any results when used with
the --maximal-only option, mark them as incompatible.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-22 10:58:14 -08:00
Junio C Hamano
214cbb7b1d Merge branch 'rs/tree-wo-the-repository'
Remove implicit reliance on the_repository global in the APIs
around tree objects and make it explicit which repository to work
in.

* rs/tree-wo-the-repository:
  cocci: remove obsolete the_repository rules
  cocci: convert parse_tree functions to repo_ variants
  tree: stop using the_repository
  tree: use repo_parse_tree()
  path-walk: use repo_parse_tree_gently()
  pack-bitmap-write: use repo_parse_tree()
  delta-islands: use repo_parse_tree()
  bloom: use repo_parse_tree()
  add-interactive: use repo_parse_tree_indirect()
  tree: add repo_parse_tree*()
  environment: move access to core.maxTreeDepth into repo settings
2026-01-21 16:16:28 -08:00
Junio C Hamano
d627023d80 Merge branch 'ps/packfile-store-in-odb-source'
The packfile_store data structure is moved from object store to odb
source.

* ps/packfile-store-in-odb-source:
  packfile: move MIDX into packfile store
  packfile: refactor `find_pack_entry()` to work on the packfile store
  packfile: inline `find_kept_pack_entry()`
  packfile: only prepare owning store in `packfile_store_prepare()`
  packfile: only prepare owning store in `packfile_store_get_packs()`
  packfile: move packfile store into object source
  packfile: refactor misleading code when unusing pack windows
  packfile: refactor kept-pack cache to work with packfile stores
  packfile: pass source to `prepare_pack()`
  packfile: create store via its owning source
2026-01-21 08:28:59 -08:00
Junio C Hamano
ec16dde5c8 Merge branch 'ps/packfile-store-in-odb-source' into ps/odb-for-each-object
* ps/packfile-store-in-odb-source:
  packfile: move MIDX into packfile store
  packfile: refactor `find_pack_entry()` to work on the packfile store
  packfile: inline `find_kept_pack_entry()`
  packfile: only prepare owning store in `packfile_store_prepare()`
  packfile: only prepare owning store in `packfile_store_get_packs()`
  packfile: move packfile store into object source
  packfile: refactor misleading code when unusing pack windows
  packfile: refactor kept-pack cache to work with packfile stores
  packfile: pass source to `prepare_pack()`
  packfile: create store via its owning source
  odb: properly close sources before freeing them
  builtin/gc: fix condition for whether to write commit graphs
2026-01-15 05:50:16 -08:00
Patrick Steinhardt
9f18d089c5 commit: rename free_commit_list() to conform to coding guidelines
Our coding guidelines say that:

  Functions that operate on `struct S` are named `S_<verb>()` and should
  generally receive a pointer to `struct S` as first parameter.

While most of the functions related to `struct commit_list` already
follow that naming schema, `free_commit_list()` doesn't.

Rename the function to address this and adjust all of its callers. Add a
compatibility wrapper for the old function name to ease the transition
and avoid any semantic conflicts with in-flight patch series. This
wrapper will be removed once Git 2.53 has been released.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-15 05:32:31 -08:00
Patrick Steinhardt
ff9fb2cfe6 commit: rename copy_commit_list() to conform to coding guidelines
Our coding guidelines say that:

  Functions that operate on `struct S` are named `S_<verb>()` and should
  generally receive a pointer to `struct S` as first parameter.

While most of the functions related to `struct commit_list` already
follow that naming schema, `copy_commit_list()` doesn't.

Rename the function to address this and adjust all of its callers. Add a
compatibility wrapper for the old function name to ease the transition
and avoid any semantic conflicts with in-flight patch series. This
wrapper will be removed once Git 2.53 has been released.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-15 05:32:31 -08:00
René Scharfe
ec7a16b145 cocci: convert parse_tree functions to repo_ variants
Add and apply a semantic patch to convert calls to parse_tree() and
friends to the corresponding variant that takes a repository argument,
to allow the functions that implicitly use the_repository to be retired
once all potential in-flight topics are settled and converted as well.

The changes in .c files were generated by Coccinelle, but I fixed a
whitespace bug it would have introduced to builtin/commit.c.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09 18:36:18 -08:00
Patrick Steinhardt
085de91b95 packfile: refactor kept-pack cache to work with packfile stores
The kept pack cache is a cache of packfiles that are marked as kept
either via an accompanying ".kept" file or via an in-memory flag. The
cache can be retrieved via `kept_pack_cache()`, where one needs to pass
in a repository.

Ultimately though the kept-pack cache is a property of the packfile
store, and this causes problems in a subsequent commit where we want to
move down the packfile store to be a per-object-source entity.

Prepare for this and refactor the kept-pack cache to work on top of a
packfile store instead. While at it, rename both the function and flags
specific to the kept-pack cache so that they can be properly attributed
to the respective subsystems.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09 06:40:06 -08:00
René Scharfe
d8a17ef09b revision: export commit_stack
Dynamic arrays of commit pointers are used in several places.  Some of
them use a custom struct to hold array, item count and capacity, others
have them as separate variables linked by a common name part.

Pick one succinct, clean implementation -- commit_stack -- and convert
the different variants to it to reduce code duplication.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-12-25 08:29:27 +09:00
Patrick Steinhardt
bdbebe5714 refs: introduce wrapper struct for each_ref_fn
The `each_ref_fn` callback function type is used across our code base
for several different functions that iterate through reference. There's
a bunch of callbacks implementing this type, which makes any changes to
the callback signature extremely noisy. An example of the required churn
is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a
single argument required us to change 48 files.

It was already proposed back then [1] that we might want to introduce a
wrapper structure to alleviate the pain going forward. While this of
course requires the same kind of global refactoring as just introducing
a new parameter, it at least allows us to more change the callback type
afterwards by just extending the wrapper structure.

One counterargument to this refactoring is that it makes the structure
more opaque. While it is obvious which callsites need to be fixed up
when we change the function type, it's not obvious anymore once we use
a structure. That being said, we only have a handful of sites that
actually need to populate this wrapper structure: our ref backends,
"refs/iterator.c" as well as very few sites that invoke the iterator
callback functions directly.

Introduce this wrapper structure so that we can adapt the iterator
interfaces more readily.

[1]: <ZmarVcF5JjsZx0dl@tanuki>

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:24 -08:00
Junio C Hamano
47c3e03034 Merge branch 'ps/commit-graph-per-object-source'
Code clean-up around commit-graph.

* ps/commit-graph-per-object-source:
  commit-graph: pass graphs that are to be merged as parameter
  commit-graph: return commit graph from `repo_find_commit_pos_in_graph()`
  commit-graph: return the prepared commit graph from `prepare_commit_graph()`
  revision: drop explicit check for commit graph
  blame: drop explicit check for commit graph
2025-10-13 22:00:35 -07:00
Junio C Hamano
4bac57bc67 Merge branch 'jk/setup-revisions-freefix'
There are double frees and leaks around setup_revisions() API used
in "git stash show", which has been fixed, and setup_revisions()
API gained a wrapper to make it more ergonomic when using it with
strvec-manged argc/argv pairs.

* jk/setup-revisions-freefix:
  revision: retain argv NULL invariant in setup_revisions()
  treewide: pass strvecs around for setup_revisions_from_strvec()
  treewide: use setup_revisions_from_strvec() when we have a strvec
  revision: add wrapper to setup_revisions() from a strvec
  revision: manage memory ownership of argv in setup_revisions()
  stash: tell setup_revisions() to free our allocated strings
2025-09-29 11:40:34 -07:00
Jeff King
a04bc71725 revision: retain argv NULL invariant in setup_revisions()
In an argc/argv pair, the entry for argv[argc] is generally NULL. You
can iterate by counting up to argc, or by looking for the NULL entry in
argv.

When we pass such a pair to setup_revisions(), it shrinks argc to
account for the options we consumed and returns the result to the
caller. But it doesn't touch the entries after the reduced argc. So
argv[argc] will be left pointing at some arbitrary entry rather than
NULL.

This isn't the source of any known bugs, since all callers are aware of
the limitation and act accordingly. But it's a possible gotcha that may
be easy to miss.

Let's set the new argv[argc] to NULL, taking care to free it if the
caller asked us to do so.

It is tempting to do likewise for all of the entries afterwards, too, as
some of them may also need to be freed (e.g., if coming from a strvec).
But doing so isn't entirely trivial, as we munge argc in the function
(e.g., when we find "--" and move all of the entries after it into the
prune_data list). It would be possible with some light refactoring, but
it's probably not worth it. Nobody should ever look at them (they are
beyond the revised argc and past the NULL argv entry) outside of strvec
cleanup, and setup_revisions_from_strvec() already handles this case.

There's one other interesting gotcha: many callers which do not want to
provide arguments just pass 0/NULL for argc/argv. We need to check for
this case before assigning the final NULL.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-22 14:27:03 -07:00
Jeff King
f93c1d86cc revision: add wrapper to setup_revisions() from a strvec
The setup_revisions() function was designed to take the argc/argv pair
from the operating system. But we sometimes construct our own argv using
a strvec and pass that in. There are a few gotchas that callers need to
deal with here:

  1. You should always pass the free_removed_argv_elements option via
     setup_revision_opt. Otherwise, entries may be leaked if
     setup_revisions() re-shuffles options.

  2. After setup_revisions() returns, the strvec state is odd. We get a
     reduced argc from setup_revisions() telling us how many unknown
     options were left in place. Entries after that in argv may be
     retained, or may be NULL (depending on how the reshuffling
     happened). But the strvec's "nr" field still represents the
     original value, and some of the entries it thinks it is still
     storing may be NULL. Callers must be careful with how they access
     it.

Some callers deal with (1), but not all. In practice they are OK because
they do not pass any options that would cause setup_revisions() to
re-shuffle (namely unknown options which may be relayed from the user,
and the use of the "--" separator). But it's probably a good idea to
consistently pass this option anyway to future-proof ourselves against
the details of setup_revisions() changing.

No callers address (2), though I don't think there any visible bugs.
Most of them simply call strvec_clear() and never otherwise look at the
result. And in fact, if they naively set foo.nr to the argc returned by
setup_revisions(), that would cause leaks!  Because setup_revisions()
does not free consumed options[1], we have to leave the "nr" field of
the strvec at its original value to find and free them during
strvec_clear().

So I don't think there are any bugs to fix here, but we can make things
safer and simpler for callers. Let's introduce a helper function that
sets the free_removed_argv_elements automatically and shrinks the strvec
to represent the retained options afterwards (taking care to free the
now-obsolete entries).

We'll start by converting all of the call-sites which use the
free_removed_argv_elements option. There should be no behavior change
for them, except that their "shrunken" entries are cleaned up
immediately, rather than waiting for a strvec_clear() call.

[1] Arguably setup_revisions() should be doing this step for us if we
    told it to free removed options, but there are many existing callers
    which will be broken if it did. Introducing this helper is a
    possible first step towards that.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-22 14:27:03 -07:00
Jeff King
cd43948798 revision: manage memory ownership of argv in setup_revisions()
The setup_revisions() function takes an argc/argv pair and consumes
arguments from it, returning a reduced argc count to the caller. But it
may also overwrite entries within the argv array, as it shifts unknown
options to the front of argv (so they can be found in the range of
0..argc-1 after we return).

For a normal argc/argv coming from the operating system, this is OK.
We don't need to worry about memory ownership of the strings in those
entries. But some callers pass in allocated strings from a strvec, and
we do need to care about those.

We faced a similar issue in f92dbdbc6a (revisions API: don't leak memory
on argv elements that need free()-ing, 2022-08-02), which added an
option for callers to tell us that elements need to be freed. But the
implementation within setup_revisions() was incomplete.  It only covered
the case of dropping "--", but not the movement of unknown options.

When we shift argv entries around, we should free the elements we are
about to overwrite, so they are not leaked. For example, in:

  git stash show -p --invalid

we will pass this to setup_revisions():

  argc = 3, argv[] = { "show", "-p", "--invalid", NULL }

which will then return:

   argc = 2, argv[] = { "show", "--invalid", "--invalid", NULL }

overwriting the "-p" entry, which is leaked unless we free it at that
moment.

You can see in the output above another potential problem. We now have
two copies of the "--invalid" string. If the caller does not respect the
new argc when free-ing the strings via strvec_clear(), we'll get a
double-free. And git-stash suffers from this, and will crash with the
above command.

So it seems at first glance that the solution is to just assign the
reduced argc to the strvec.nr field in the caller. Then it would stop
after freeing only any copied entries. But that's not always right
either!

Remember that we are reducing "argc" to account for elements we've
consumed. So if there isn't an invalid option, we'd turn:

  argc = 2, argv[] = { "show", "-p", NULL }

into:

  argc = 1, argv[] = { "show", "-p", NULL }

In that case strvec_clear() must keep looking past the shortened argc we
return to find the original "-p" to free. It needs to use the original
argc to do that.

We can solve this by turning our argv writes into strict moves, not
copies. When we shuffle an unknown option to the front, we'll overwrite
its old position with NULL. That leaves an argv array that may have NULL
"holes" in it.

So in the "--invalid" example above we get:

   argc = 2, argv[] = { "show", "--invalid", NULL, NULL }

but something like "git stash -p --invalid -p" would yield:

  argc = 3, argv[] = { "show", "--invalid", NULL, "-p", NULL }

because we move "--invalid" to overwrite the first "-p", but the second
one is quietly consumed. But strvec_clear() can handle that fine (it
iterates over the "nr" field, and passing NULL to free() is OK).

To ease the implementation, I've introduced a helper function. It's a
little hacky because it must take a double-pointer to set the old
position to NULL. Which in turn means we cannot pass "&arg", our local
alias for the current entry we're parsing, but instead "&argv[i]", the
pointer in the original array. And to make it even more confusing, we
delegate some of this work to handle_revision_opt(), which is passed a
subset of the argv array, so is always working on "&argv[0]".

Likewise, because handle_revision_opt() only receives the part of argv
left to parse, it receives the array to accumulate unknown options as a
separate unkc/unkv pair. But we're always working on the same argv
array, so our strategy works fine. I suspect this would be a bit more
obvious (and avoid some pointer cleverness) if all functions saw the
full argv array and worked with positions within it (and our new helper
would take two positions, a src and dst). But that would involve
refactoring handle_revision_opt().  I punted on that, as what's here is
not too ugly and is all contained within revision.c itself.

The new test demonstrates that "git stash show -p --invalid" no longer
crashes with a double-free (because we move instead of copy). And it
passes with SANITIZE=leak because we free "-p" before overwriting.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-22 14:27:03 -07:00
Patrick Steinhardt
307e30792b revision: drop explicit check for commit graph
When filtering down revisions by paths we know to use bloom filters from
the commit graph, if we have any. The entry point for this is in
`check_maybe_different_in_bloom_filter()`, where we first verify that:

  - We do have a commit graph.

  - That the commit is contained therein by checking that we have a
    proper generation number.

  - And that the graph contains a bloom filter.

The first check is somewhat redundant though: if we don't have a commit
graph, then the second check would already tell us that we don't have a
generation number for the specific commit.

In theory this could be seen as a performance optimization to
short-circuit for scenarios where there is no commit graph. But in
practice this shouldn't matter: if there is no commit graph, then the
commit graph data slab would also be unpopulated and thus a lookup of
the commit should happen in constant time.

Drop the unnecessary check.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-04 16:16:21 -07:00
Junio C Hamano
b4e38c1acd Merge branch 'ly/changed-path-traversal-with-magic-pathspec'
Revision traversal limited with pathspec, like "git log dir/*",
used to ignore changed-paths Bloom filter when the pathspec
contained wildcards; now they take advantage of the filter when
they can.

* ly/changed-path-traversal-with-magic-pathspec:
  bloom: enable bloom filter with wildcard pathspec in revision traversal
2025-08-21 13:47:02 -07:00