Commit Graph

13743 Commits

Author SHA1 Message Date
Junio C Hamano
06f63df846 Merge branch 'ps/odb-source-loose'
The loose object source has been refactored into a proper `struct
odb_source`.

* ps/odb-source-loose:
  odb/source-loose: drop pointer to the "files" source
  odb/source-loose: stub out remaining callbacks
  odb/source-loose: wire up `write_object_stream()` callback
  object-file: refactor writing objects to use loose source
  odb/source-loose: wire up `write_object()` callback
  loose: refactor object map to operate on `struct odb_source_loose`
  odb/source-loose: wire up `freshen_object()` callback
  odb/source-loose: drop `odb_source_loose_has_object()`
  odb/source-loose: wire up `count_objects()` callback
  odb/source-loose: wire up `find_abbrev_len()` callback
  odb/source-loose: wire up `for_each_object()` callback
  odb/source-loose: wire up `read_object_stream()` callback
  odb/source-loose: wire up `read_object_info()` callback
  odb/source-loose: wire up `close()` callback
  odb/source-loose: wire up `reprepare()` callback
  odb/source-loose: start converting to a proper `struct odb_source`
  odb/source-loose: store pointer to "files" instead of generic source
  odb/source-loose: move loose source into "odb/" subsystem
2026-06-11 04:31:18 -07:00
Junio C Hamano
2fd113ae07 Merge branch 'rs/strbuf-add-oid-hex'
Formatting object name in full hexadecimal form has been optimized
by using a new strbuf_add_oid_hex() helper function.

* rs/strbuf-add-oid-hex:
  hex: add and use strbuf_add_oid_hex()
2026-06-09 10:04:50 +09:00
Junio C Hamano
7eaa3c82a8 Merge branch 'rs/strbuf-add-uint'
Adding a decimal integer with strbuf_addf("%u") appears commonly;
they have been optimized by using a custom formatter.

* rs/strbuf-add-uint:
  ls-tree: use strbuf_add_uint()
  ls-files: use strbuf_add_uint()
  cat-file: use strbuf_add_uint()
  strbuf: add strbuf_add_uint()
2026-06-09 10:04:50 +09:00
Junio C Hamano
2c677d20b6 Merge branch 'ua/push-remote-group'
"git push" learned to take a "remote group" name to push to, which
causes pushes to multiple places, just like "git fetch" would do.

* ua/push-remote-group:
  push: support pushing to a remote group
  remote: move remote group resolution to remote.c
  remote: fix sign-compare warnings in push_cas_option
2026-06-09 10:04:50 +09:00
Junio C Hamano
de5383c2ce Merge branch 'aj/stash-patch-optimize-temporary-index'
"git stash -p" has been optimized by reusing cached index
entries in its temporary index, avoiding unnecessary lstat()
calls on unchanged files.

* aj/stash-patch-optimize-temporary-index:
  stash: reuse cached index entries in --patch temporary index
2026-06-07 23:58:25 +09:00
Junio C Hamano
92b870a675 Merge branch 'kh/free-commit-list'
Code clean-up.

* kh/free-commit-list:
  commit: remove deprecated functions
  *: replace deprecated free_commit_list
2026-06-07 23:58:24 +09:00
Junio C Hamano
7450009e6f Merge branch 'ds/restore-sparse-index'
'git restore --staged' has been optimized to avoid unnecessarily expanding
the sparse index when operating on paths within the sparse checkout
definition, by handling sparse directory entries at the tree level.

* ds/restore-sparse-index:
  restore: avoid sparse index expansion
  t1092: test 'git restore' with sparse index
2026-06-07 23:58:24 +09:00
Junio C Hamano
17204228cf Merge branch 'ar/receive-pack-worktree-env'
The GIT_WORK_TREE variable prepared to invoke the push-to-checkout
hook was leaking into the environment even when there was no hook
used and broke the default push-to-deploy (i.e., let "git checkout"
update the working tree only when the working tree is clean).

* ar/receive-pack-worktree-env:
  receive-pack: fix updateInstead with core.worktree
2026-06-07 23:58:24 +09:00
Junio C Hamano
ffaa2eddd0 Merge branch 'ds/path-walk-filters'
The "git pack-objects --path-walk" traversal has been integrated
with several object filters, including blobless and sparse filters.

* ds/path-walk-filters:
  path-walk: support `combine` filter
  path-walk: support `object:type` filter
  path-walk: support `tree:0` filter
  t6601: tag otherwise-unreachable trees
  pack-objects: support sparse:oid filter with path-walk
  path-walk: add pl_sparse_trees to control tree pruning
  path-walk: support blob size limit filter
  backfill: die on incompatible filter options
  path-walk: support blobless filter
  path-walk: always emit directly-requested objects
  t/perf: add pack-objects filter and path-walk benchmark
  pack-objects: pass --objects with --path-walk
  t5620: make test work with path-walk var
2026-06-02 16:15:29 +09:00
Junio C Hamano
7b3ab91768 Merge branch 'jk/connect-service-enum'
The "name" argument in git_connect() and related functions has been
converted to a "service" enum to improve type safety and clarify its
purpose.

* jk/connect-service-enum:
  transport-helper: fix typo in BUG() message
  connect: use "service" enum for "name" argument
2026-06-02 16:15:28 +09:00
Patrick Steinhardt
86f7ab5a1f odb/source-loose: drop odb_source_loose_has_object()
The function `odb_source_loose_has_object()` checks whether a specific
object exists as a loose object on disk by using lstat(3p). This
interface is somewhat redundant, as we typically check for object
existence in a generic way via `odb_source_read_object_info()`.

In fact, these two calls are redundant in case the latter is called in a
specific way: when called without an object info request and without the
`OBJECT_INFO_QUICK` flag, then we will end up doing the same call to
lstat(3p) in `read_object_info_from_path()`.

Drop the function and adapt callers to instead use the generic
interface so that its calling conventions align with that of other
sources.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-01 18:47:18 +09:00
Patrick Steinhardt
2ade08ac29 odb/source-loose: wire up count_objects() callback
Move `odb_source_loose_count_objects()` and its associated helpers from
"object-file.c" into "odb/source-loose.c" and wire it up as the
`count_objects()` callback of the loose source.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-01 18:47:18 +09:00
Patrick Steinhardt
e4f1d9ba57 odb/source-loose: wire up for_each_object() callback
Move `odb_source_loose_for_each_object()` and its associated helpers
from "object-file.c" into "odb/source-loose.c" and wire it up as the
`for_each_object()` callback of the loose source.

Again, as in the preceding commit, we are forced to expose a couple of
functions from "object-file.c" that are now used by both subsystems.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-01 18:47:18 +09:00
Junio C Hamano
33da2f4d3b Merge branch 'sa/cat-file-batch-mailmap-switch'
"git cat-file --batch" learns an in-line command "mailmap"
that lets the user toggle use of mailmap.

* sa/cat-file-batch-mailmap-switch:
  cat-file: add mailmap subcommand to --batch-command
2026-05-31 10:00:38 +09:00
Junio C Hamano
4d11b9c218 Merge branch 'pt/fsmonitor-linux'
The fsmonitor daemon has been implemented for Linux.

* pt/fsmonitor-linux:
  fsmonitor: convert shown khash to strset in do_handle_client
  fsmonitor: add tests for Linux
  fsmonitor: add timeout to daemon stop command
  fsmonitor: close inherited file descriptors and detach in daemon
  run-command: add close_fd_above_stderr option
  fsmonitor: implement filesystem change listener for Linux
  fsmonitor: rename fsm-settings-darwin.c to fsm-settings-unix.c
  fsmonitor: rename fsm-ipc-darwin.c to fsm-ipc-unix.c
  fsmonitor: use pthread_cond_timedwait for cookie wait
  compat/win32: add pthread_cond_timedwait
  fsmonitor: fix hashmap memory leak in fsmonitor_run_daemon
  fsmonitor: fix khash memory leak in do_handle_client
  t9210, t9211: disable GIT_TEST_SPLIT_INDEX for scalar clone tests
2026-05-31 10:00:38 +09:00
Junio C Hamano
d2c01318b0 Merge branch 'jr/bisect-custom-terms-in-output'
"git bisect" now uses the selected terms (e.g., old/new) more
consistently in its output.

* jr/bisect-custom-terms-in-output:
  rev-parse: use selected alternate terms to look up refs
  bisect: print bisect terms in single quotes
  bisect: use selected alternate terms in status output
2026-05-31 10:00:37 +09:00
Kristoffer Haugsbakk
7dd898a92d *: replace deprecated free_commit_list
Replace `free_commit_list` with `commit_list_free`. The former was
deprecated in 9f18d089 (commit: rename `free_commit_list()` to conform
to coding guidelines, 2026-01-15).

This allows us to remove all the deprecated functions in the
next commit:

• `copy_commit_list`
• `reverse_commit_list`
• `free_commit_list`

Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-29 05:11:02 +09:00
Junio C Hamano
455ff75d35 Merge branch 'ps/setup-wo-the-repository'
Many uses of the_repository has been updated to use a more
appropriate struct repository instance in setup.c codepath.

* ps/setup-wo-the-repository:
  setup: stop using `the_repository` in `init_db()`
  setup: stop using `the_repository` in `create_reference_database()`
  setup: stop using `the_repository` in `initialize_repository_version()`
  setup: stop using `the_repository` in `check_repository_format()`
  setup: stop using `the_repository` in `upgrade_repository_format()`
  setup: stop using `the_repository` in `setup_git_directory()`
  setup: stop using `the_repository` in `setup_git_directory_gently()`
  setup: stop using `the_repository` in `setup_git_env()`
  setup: stop using `the_repository` in `set_git_work_tree()`
  setup: stop using `the_repository` in `setup_work_tree()`
  setup: stop using `the_repository` in `enter_repo()`
  setup: stop using `the_repository` in `verify_non_filename()`
  setup: stop using `the_repository` in `verify_filename()`
  setup: stop using `the_repository` in `path_inside_repo()`
  setup: stop using `the_repository` in `prefix_path()`
  setup: stop using `the_repository` in `is_inside_work_tree()`
  setup: stop using `the_repository` in `is_inside_git_dir()`
  setup: replace use of `the_repository` in static functions
2026-05-27 14:15:46 +09:00
Junio C Hamano
2f952b81ed Merge branch 'jt/odb-transaction-write'
ODB transaction interface is being reworked to explicitly handle
object writes.

* jt/odb-transaction-write:
  odb/transaction: make `write_object_stream()` pluggable
  object-file: generalize packfile writes to use odb_write_stream
  object-file: avoid fd seekback by checking object size upfront
  object-file: remove flags from transaction packfile writes
  odb: update `struct odb_write_stream` read() callback
  odb/transaction: use pluggable `begin_transaction()`
  odb: split `struct odb_transaction` into separate header
2026-05-27 14:15:45 +09:00
Junio C Hamano
8b5873a1f2 Merge branch 'tb/incremental-midx-part-3.3'
The repacking code has been refactored and compaction of MIDX layers
have been implemented, and incremental strategy that does not require
all-into-one repacking has been introduced.

* tb/incremental-midx-part-3.3:
  repack: allow `--write-midx=incremental` without `--geometric`
  repack: introduce `--write-midx=incremental`
  repack: implement incremental MIDX repacking
  packfile: ensure `close_pack_revindex()` frees in-memory revindex
  builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK`
  repack-geometry: prepare for incremental MIDX repacking
  repack-midx: extract `repack_fill_midx_stdin_packs()`
  repack-midx: factor out `repack_prepare_midx_command()`
  midx: expose `midx_layer_contains_pack()`
  repack: track the ODB source via existing_packs
  midx: support custom `--base` for incremental MIDX writes
  midx: introduce `--no-write-chain-file` for incremental MIDX writes
  midx: use `strvec` for `keep_hashes`
  midx: build `keep_hashes` array in order
  midx: use `strset` for retained MIDX files
  midx-write: handle noop writes when converting incremental chains
2026-05-27 14:15:45 +09:00
Junio C Hamano
1103041f34 Merge branch 'ds/fetch-negotiation-options'
The negotiation tip options in "git fetch" have been reworked to
allow requiring certain refs to be sent as "have" lines, and to
restrict negotiation to a specific set of refs.

* ds/fetch-negotiation-options:
  send-pack: pass negotiation config in push
  remote: add remote.*.negotiationInclude config
  fetch: add --negotiation-include option for negotiation
  negotiator: add have_sent() interface
  remote: add remote.*.negotiationRestrict config
  transport: rename negotiation_tips
  fetch: add --negotiation-restrict option
  t5516: fix test order flakiness
2026-05-27 14:15:45 +09:00
Junio C Hamano
9020a116d6 Merge branch 'kk/merge-octopus-optim'
The logic to determine that branches in an octopus merge are
independent has been optimized.

* kk/merge-octopus-optim:
  merge: use repo_in_merge_bases for octopus up-to-date check
2026-05-27 14:15:44 +09:00
Junio C Hamano
6d2ba7ead7 Merge branch 'en/batch-prefetch'
In a lazy clone, "git cherry" and "git grep" often fetch necessary
blob objects one by one from promisor remotes.  It has been corrected
to collect necessary object names and fetch them in bulk to gain
reasonable performance.

* en/batch-prefetch:
  grep: prefetch necessary blobs
  builtin/log: prefetch necessary blobs for `git cherry`
  patch-ids.h: add missing trailing parenthesis in documentation comment
  promisor-remote: document caller filtering contract
2026-05-27 14:15:44 +09:00
Derrick Stolee
105aacd072 restore: avoid sparse index expansion
Teach update_some() to handle sparse directory entries at the tree
level rather than expanding the entire sparse index. When iterating a
source tree during checkout/restore operations:

 - If a directory matches a sparse directory entry with the same OID,
   skip it entirely (no change needed).

 - If the OID differs and we are in non-overlay mode (e.g., restore
   --staged), update the sparse directory entry's OID in place. This
   is semantically correct because non-overlay mode removes paths not
   in the source tree anyway.

 - In overlay mode (e.g., checkout <tree> -- .), fall through to
   recursive descent so individual file entries are preserved
   correctly.

Also switch from index_name_pos() to index_name_pos_sparse() for
individual file lookups to avoid triggering ensure_full_index() when
the file is already individually tracked in the index.

Update the test expectation in t1092 to assert that 'restore --staged'
no longer expands the sparse index.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-27 13:42:59 +09:00
Alyssa Ross
44d04e4426 receive-pack: fix updateInstead with core.worktree
Before a8cc594333 (hooks: fix an obscure TOCTOU "did we just run a
hook?" race, 2022-03-07), when receive.denyCurrentBranch is set to
updateInstead, only one of push_to_checkout() or push_to_deploy()
was called.  That commit changed to always call push_to_checkout(),
and then to call push_to_deploy() if push_to_checkout() didn't run
anything.

This change didn't take into account that push_to_checkout() had a
side effect of modifying env, and that modified env broke updating
the worktree in push_to_deploy() if core.worktree was configured.
To fix this, only mutate the environment used inside
push_to_commit(), rather than the environment that might later be
passed to push_to_deploy().

Signed-off-by: Alyssa Ross <hi@alyssa.is>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-26 07:54:18 +09:00
Junio C Hamano
0d5b240d73 Merge branch 'kk/paint-down-to-common-optim'
"git merge-base" optimization.

* kk/paint-down-to-common-optim:
  commit-reach: early exit paint_down_to_common for single merge-base
  commit-reach: introduce merge_base_flags enum
2026-05-25 09:40:07 +09:00
Adam Johnson
48513e05e2 stash: reuse cached index entries in --patch temporary index
`git stash -p` prepares the interactive selection by creating a
temporary index at HEAD, switching `GIT_INDEX_FILE` to it, and then
running the `add -p` machinery.

That temporary index was created by running `git read-tree HEAD`.  The
resulting index had no useful cached stat data or fsmonitor-valid bits
from the real index.  When `run_add_p()` refreshed that temporary index
before showing the first prompt, it could end up lstat(2)-ing every
tracked file, even in a repository where `git diff` and `git restore -p`
can use fsmonitor to avoid that work.

Create the temporary index in-process instead.  Use `unpack_trees()` to
reset the real index contents to HEAD while writing the result to the
temporary index path.  For paths whose index entries already match HEAD,
`oneway_merge()` reuses the existing cache entries, preserving their
cached stat data and `CE_FSMONITOR_VALID` state.

This makes the refresh performed by `run_add_p()` behave like the one
used by `git restore -p`: unchanged paths can be skipped via fsmonitor
instead of being scanned again.

In a 206k file repository with `core.fsmonitor` enabled and a one-line
change in one file, time to first prompt dropped from 34.774 seconds to
0.659 seconds. The new perf test file demonstrates similar improvements,
with maen times for without- and with-fsmonitor cases dropping from 6.90
and 6.83 seconds to 0.55 and 0.28 seconds, respectively.

Signed-off-by: Adam Johnson <me@adamj.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-24 18:43:22 +09:00
Derrick Stolee
2dc858e69e pack-objects: support sparse:oid filter with path-walk
The --filter=sparse:<oid> option to 'git pack-objects' allows focusing
an object set to a sparse-checkout definition. This reduces the set of
matching blobs while retaining all reachable trees. No server currently
supports fetching with this filter because it is expensive to compute
and reachability bitmaps do not help without a significant effort to
extend the bitmap feature to store bitmaps for each supported sparse-
checkout definition.

Without focusing on serving fetches and clones with these filters, there
are still benefits that could be realized by making this faster. With
the sparse index, it's more realistic now than ever to be able to
operate a local clone that was bootstrapped by a packfile created with
a sparse filter, because the missing trees are not needed to move a
sparse-checkout from one commit to another or to view the history of any
path in scope. Such clones could perhaps be bootstrapped by partial
bundles.

Previously, constructing these sparse packs has been incredibly
computationally inefficient. The revision walk that explores which
objects are in scope spends a lot of time checking each object to see if
it matches the sparse-checkout patterns, causing quadratic behavior
(number of objects times number of sparse-checkout patterns). This
improves somewhat when using cone-mode sparse-checkout patterns that can
use hashtables and prefix matches to determine containment. However, the
check per object is still too expensive for most cases.

This is where the path-walk feature comes in. We can proceed as normal
by placing objects in bins by path and _then_ check a group of objects
all at once. Since sparse:<oid> only restricts blobs, the path-walk must
include all reachable trees while using the cone-mode patterns to skip
blobs at paths outside the sparse scope. This establishes a baseline for
a potential future "treesparse:<oid>" filter that would also restrict
trees, but introducing such a new filter is deferred to a later change.

The implementation here is focused around loading the sparse-checkout
patterns from the provided object ID and checking that the patterns are
indeed cone-mode patterns. We can then load the correct pattern list
into the path walk context and use the logic that already exists from
bff4555767 (backfill: add --sparse option, 2025-02-03), though that
feature loads sparse-checkout patterns from the worktree's local
settings and also restricts tree objects. We use a combination of errors
and warnings to signal problems during this load. The difference is that
errors are likely fatal for the non-path-walk version while the warnings
are probably just implementation details for the path-walk version and
the 'git pack-objects' command can fall back to the revision walk
version.

Now that the SEEN flag is deferred until after pattern checks (from the
previous commit), handle the case where a tree with a shared OID appears
at both an out-of-cone and in-cone path. When trees are not being pruned
(pl_sparse_trees == 0), the path-walk re-walks the tree at the in-cone
path so that in-cone blobs within it are discovered. The new tests in
t5317 and t6601 demonstrate this behavior and would fail without these
changes.

The performance test p5315 shows the impact of this change when using
sparse filters:

Test                                              HEAD~1     HEAD
----------------------------------------------------------------------
5315.10: repack (sparse:oid)                      77.98    77.47  -0.7%
5315.11: repack size (sparse:oid)                187.5M   187.4M  -0.0%
5315.12: repack (sparse:oid, --path-walk)         77.91    31.41 -59.7%
5315.13: repack size (sparse:oid, --path-walk)   187.5M   161.1M -14.1%

These performance tests were run on the Git repository. The --path-walk
feature shows meaningful space savings (14% smaller for sparse packs)
and dramatic time savings (60% faster) by leveraging the path-walk's
ability to skip blobs outside the sparse scope.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blaue <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-24 18:41:06 +09:00
Derrick Stolee
8ff8de7616 path-walk: add pl_sparse_trees to control tree pruning
The path-walk API prunes trees and blobs when a sparse-checkout pattern
list is provided, which is the correct behavior for 'git backfill
--sparse' since it only needs to fill in objects at paths within the
sparse cone.

However, a future change will use the path-walk API with a sparse:<oid>
filter that restricts only blobs while retaining all reachable trees.
To support both behaviors, add a 'pl_sparse_trees' flag to
path_walk_info. When set (as in 'git backfill --sparse' and the
--stdin-pl test helper mode), the sparse patterns prune both trees and
blobs. When unset, only blobs are filtered and all trees are walked and
reported.

Additionally, move the SEEN flag assignment in add_tree_entries() to
after the sparse pattern and pathspec checks. Previously, SEEN was set
immediately upon discovering an object, before checking whether its path
matched the sparse patterns. When the same object ID appeared at
multiple paths (e.g. sibling directories with identical contents), the
first path to be visited would mark the object as SEEN. If that path was
outside the sparse cone, the object would be skipped there but also
never discovered at its in-cone path.

By deferring the SEEN flag until after the checks pass, objects that are
skipped due to sparse filtering remain discoverable at other paths where
they may be in scope.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-24 18:41:06 +09:00
Derrick Stolee
f1b5d3da16 path-walk: support blob size limit filter
Extend the path-walk API to handle the 'blob:limit=<size>' object
filter natively. This filter omits blobs whose size is equal to or
greater than the given limit, matching the semantics used by the
list-objects-filter machinery.

When revs->filter.choice is LOFC_BLOB_LIMIT, the prepare_filters()
method stores the limit value in info->blob_limit and clears the filter
from revs. If the limit is zero, this degenerates to blob:none (all
blobs excluded), so info->blobs is set to 0 instead.

During walk_path(), blob batches are filtered before being delivered to
the callback: each blob's size is checked via odb_read_object_info(),
and only blobs strictly smaller than the limit are included. Blobs whose
size cannot be determined (e.g. missing in a partial clone) are
conservatively included, matching the existing filter behavior. Empty
batches after filtering are skipped entirely.

The check for inclusion in the path batch looks a little strange at
first glance. We use odb_read_object_info() to read the object's size.
Based on all of the assumptions to this point, this _should_ return
OBJ_BLOB. Since we are focused on the size filter, we use a
short-circuited OR (||) to skip the size check if that method returns a
different object type.

Notice that this inspection of object sizes requires the content to be
present in the repository. The odb_read_object_info() call will download
a missing blob on-demand. This means that the use of the path-walk API
within 'git backfill' would not operate nicely with this filter type.
The intention of that command is to download missing blobs in batches.
Downloading objects one-by-one would go against the point. Update the
validation in 'git backfill' to add its own compatibility check on top
of path_walk_filter_compatible().

Add tests for blob:limit=0 (equivalent to blob:none) and blob:limit=3
(which exercises partial filtering within a batch where some blobs are
kept and others are excluded).

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-24 18:41:06 +09:00
Derrick Stolee
bf24de4b7c backfill: die on incompatible filter options
The 'git backfill' command uses the path-walk API in a critical way: it
uses the objects output from the command to find the batches of missing
objects that should be requested from the server. Unlike 'git
pack-objects', we cannot fall back to another mechanism.

The previous change added the path_walk_filter_compatible() method that
we can reuse here. Use it during argument validation in cmd_backfill().

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-24 18:41:06 +09:00
Derrick Stolee
6d87f0e8a3 path-walk: support blobless filter
The 'git pack-objects' command can opt-in to using the path-walk API for
scanning the objects. Currently, this option is dynamically disabled if
combined with '--filter=<X>', even when using a simple filter such as
'blob:none' to signal a blobless packfile. This is a common scenario for
repos at scale, so is worth integrating.

Also, users can opt-in to the '--path-walk' option by default through
the pack.usePathWalk=true config option. When using that in a blobless
partial clone, the following warning can appear even though the user did
not specify either option directly:

  warning: cannot use --filter with --path-walk

Teach the path-walk API to handle the 'blob:none' object filter
natively. When revs->filter.choice is LOFC_BLOB_NONE, the path-walk
sets info->blobs to 0 (skipping all blob objects) and clears the
filter from revs so that prepare_revision_walk() does not reject the
configuration.

This check is implemented in the static prepare_filters() method, which
will simultaneously check if the input filters are compatible and will
make the appropriate mutations to the path_walk_info and filters if the
path_walk_info is non-NULL. This allows us to use this logic both in the
API method path_walk_filter_compatible() for use in
builtin/pack-objects.c and as a prep step in walk_objects_by_path().

Update the test helper (test-path-walk) to accept --filter=<spec>
as a test-tool option (before '--'), applying it to revs after
setup_revisions() to avoid the --objects requirement check. We can also
revert recent GIT_TEST_PACK_PATH_WALK overrides in t5620.

Also switch test-path-walk from REV_INFO_INIT with manual repo
assignment to repo_init_revisions(), which properly initializes
the filter_spec strbuf needed for filter parsing.

Add tests for blob:none with --all and with a single branch.

The performance test p5315 shows the impact of this change when using
blobless filters:

Test                                           HEAD~1     HEAD
---------------------------------------------------------------------
5315.6: repack (blob:none)                      13.53   13.87  +2.5%
5315.7: repack size (blob:none)                137.7M  137.8M  +0.1%
5315.8: repack (blob:none, --path-walk)         13.51   23.43 +73.4%
5315.9: repack size (blob:none, --path-walk)   137.7M  115.2M -16.3%

These performance tests were run on the Git repository. The --path-walk
feature shows meaningful space savings (16% smaller for blobless packs)
at the cost of increased computation time due to the two compression
passes. This data demonstrates that the feature is engaged and provides
real compression benefits when --no-reuse-delta forces fresh deltas.

Co-Authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-24 18:41:06 +09:00
Derrick Stolee
35567889ef pack-objects: pass --objects with --path-walk
When 'git pack-objects' has the --path-walk option enabled, it uses a
different set of revision walk parameters than normal. For one,
--objects was previously assumed by the path-walk API and could be
omitted. We also needed --boundary to allow discovering UNINTERESTING
objects to use as delta bases.

We will be updating the path-walk API soon to work with some filter
options. However, the revision machinery will trigger a fatal error:

  fatal: object filtering requires --objects

The fix is easy: add the --objects option as an argument. This has no
effect on the path-walk API but does simplify the revision option
parsing for the objects filter.

We can remove the comment about "removing" the options because they were
never removed and instead not added. We still need to disable using
bitmaps.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-24 18:41:06 +09:00
Junio C Hamano
9ebc19b760 Merge branch 'ps/odb-in-memory' into ps/odb-source-loose
* ps/odb-in-memory: (24 commits)
  t/unit-tests: add tests for the in-memory object source
  odb: generic in-memory source
  odb/source-inmemory: stub out remaining functions
  odb/source-inmemory: implement `freshen_object()` callback
  odb/source-inmemory: implement `count_objects()` callback
  odb/source-inmemory: implement `find_abbrev_len()` callback
  odb/source-inmemory: implement `for_each_object()` callback
  odb/source-inmemory: convert to use oidtree
  oidtree: add ability to store data
  cbtree: allow using arbitrary wrapper structures for nodes
  odb/source-inmemory: implement `write_object_stream()` callback
  odb/source-inmemory: implement `write_object()` callback
  odb/source-inmemory: implement `read_object_stream()` callback
  odb/source-inmemory: implement `read_object_info()` callback
  odb: fix unnecessary call to `find_cached_object()`
  odb/source-inmemory: implement `free()` callback
  odb: introduce "in-memory" source
  odb/transaction: make `write_object_stream()` pluggable
  object-file: generalize packfile writes to use odb_write_stream
  object-file: avoid fd seekback by checking object size upfront
  ...
2026-05-21 22:34:55 +09:00
Junio C Hamano
686213114e Merge branch 'mm/git-url-parse'
The internal URL parsing logic has been made accessible via a new
subcommand "git url-parse".

* mm/git-url-parse:
  t9904: add tests for the new url-parse builtin
  doc: describe the url-parse builtin
  builtin: create url-parse command
  urlmatch: define url_parse function
  url: return URL_SCHEME_UNKNOWN instead of dying
  url: move scheme detection to URL header/source
  url: move url_is_local_not_ssh to url.h
  connect: rename enum protocol to url_scheme
2026-05-21 12:06:48 +09:00
Junio C Hamano
2a098fd2f6 Merge branch 'kn/refs-generic-helpers'
Refactor service routines in the ref subsystem backends.

* kn/refs-generic-helpers:
  refs: use peeled tag values in reference backends
  refs: add peeled object ID to the `ref_update` struct
  refs: move object parsing to the generic layer
  update-ref: handle rejections while adding updates
  update-ref: move `print_rejected_refs()` up
  refs: return `ref_transaction_error` from `ref_transaction_update()`
  refs: extract out reflog config to generic layer
  refs: introduce `ref_store_init_options`
  refs: remove unused typedef 'ref_transaction_commit_fn'
2026-05-21 12:06:47 +09:00
Derrick Stolee
6f37fecfed remote: add remote.*.negotiationInclude config
Add a new 'remote.<name>.negotiationInclude' multi-valued config option that
provides default values for --negotiation-include when no
--negotiation-include arguments are specified over the command line.  This
is a mirror of how 'remote.<name>.negotiationRestrict' specifies defaults
for the --negotiation-restrict arguments.

Each value is either an exact ref name or a glob pattern whose tips should
always be sent as 'have' lines during negotiation. The config values are
resolved through the same resolve_negotiation_include() codepath as the CLI
options.

This option is additive with the normal negotiation process: the negotiation
algorithm still runs and advertises its own selected commits, but the refs
matching the config are sent unconditionally on top of those heuristically
selected commits.

Similar to the negotiationRestrict config, an empty value resets the value
list to allow ignoring earlier config values, such as those that might be
set in system or global config.

Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-20 11:33:24 +09:00
Derrick Stolee
e2164742c9 fetch: add --negotiation-include option for negotiation
Add a new --negotiation-include option to 'git fetch', which ensures
that certain ref tips are always sent as 'have' lines during fetch
negotiation, regardless of what the negotiation algorithm selects.

This is useful when the repository has a large number of references, so
the normal negotiation algorithm truncates the list. This is especially
important in repositories with long parallel commit histories. For
example, a repo could have a 'dev' branch for development and a
'release' branch for released versions. If the 'dev' branch isn't
selected for negotiation, then it's not a big deal because there are
many in-progress development branches with a shared history. However, if
'release' is not selected for negotiation, then the server may think
that this is the first time the client has asked for that reference,
causing a full download of its parallel commit history (and any extra
data that may be unique to that branch). This is based on a real example
where certain fetches would grow to 60+ GB when a release branch
updated.

This option is a complement to --negotiation-restrict, which reduces the
negotiation ref set to a specific list. In the earlier example, using
--negotiation-restrict to focus the negotiation to 'dev' and 'release'
would avoid those problematic downloads, but would still not allow
advertising potentially-relevant user branches. In this way, the
'include' version solves the problem I mention while allowing
negotiation to pick other references opportunistically. The two options
can also be combined to allow the best of both worlds.

The argument may be an exact ref name or a glob pattern. Non-existent
refs are silently ignored. This behavior is also updated in the ref matching
logic for the related --negotiation-restrict option to match.

The implementation outputs the requested objects as haves before the
negotiator performs its own algorithm to choose the next haves. Use the new
have_sent() interface to signal these have commits were sent before engaging
with the negotiator's next() iterator.

Also add --negotiation-include to 'git pull' passthrough options.

Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-20 11:33:24 +09:00
Derrick Stolee
8bb252f86c remote: add remote.*.negotiationRestrict config
In a previous change, the --negotiation-restrict command-line option of 'git
fetch' was added as a synonym of --negotiation-tip. Both of these options
restrict the set of 'haves' the client can send as part of negotiation.

This was previously not available via a configuration option. Add a new
'remote.<name>.negotiationRestrict' multi-valued config option that updates
'git fetch <name>' to use these restrictions by default.

If the user provides even one --negotiation-restrict argument, then the
config is ignored.

An empty value resets the value list to allow ignoring earlier config
values, such as those that might be set in system or global config.

Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-20 11:33:24 +09:00
Derrick Stolee
4aef7dbb06 transport: rename negotiation_tips
The previous change added the --negotiation-restrict synonym for the
--negotiation-tip option for 'git fetch'. In anticipation of adding a new
option that behaves similarly but with distinct changes to its behavior,
rename the internal representation of this data from 'negotiation_tips' to
'negotiation_restrict_tips'.

The 'tips' part is kept because this is an oid_array in the transport layer.
This requires the builtin to handle parsing refs into collections of oids so
the transport layer can handle this cleaner form of the data.

Also update the string_list used to store the inputs from command-line
options.

Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-20 11:33:23 +09:00
Derrick Stolee
1a445fc60b fetch: add --negotiation-restrict option
The --negotiation-tip option to 'git fetch' and 'git pull' allows users
to specify that they want to focus negotiation on a small set of
references. This is a _restriction_ on the negotiation set, helping to
focus the negotiation when the ref count is high. However, it doesn't
allow for the ability to opportunistically select references beyond that
list.

This subtle detail that this is a 'maximum set' and not a 'minimum set'
is not immediately clear from the option name. This makes it more
complicated to add a new option that provides the complementary behavior
of a minimum set.

For now, create a new synonym option, --negotiation-restrict, that
behaves identically to --negotiation-tip. Update the documentation to
make it clear that this new name is the preferred option, but we keep
the old name for compatibility. Mark --negotiation-tip as an alias of the
new, preferred option.

Update a few warning messages with the new option, but also make them
translatable with the option name inserted by formatting. At least one
of these messages will be reused later for a new option.

Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-20 11:33:23 +09:00
Taylor Blau
06733a50ee repack: allow --write-midx=incremental without --geometric
Previously, `--write-midx=incremental` required `--geometric` and would
die() without it. Relax this restriction so that incremental MIDX
repacking can be used independently.

Without `--geometric`, the behavior is append-only: a single new MIDX
layer is created containing whatever packs were written by the repack
and appended to the existing chain (or a new chain is started). Existing
layers are preserved as-is with no compaction or merging.

Implement this via a new repack_make_midx_append_plan() that builds a
plan consisting of a WRITE step for the freshly written packs followed
by COPY steps for every existing MIDX layer. The existing compaction
plan (repack_make_midx_compaction_plan) is used only when `--geometric`
is active.

Update the documentation to describe the behavior with and without
`--geometric`, and replace the test that enforced the old restriction
with one exercising append-only incremental MIDX repacking.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-20 11:31:14 +09:00
Taylor Blau
938af89260 repack: introduce --write-midx=incremental
Expose the incremental MIDX repacking mode (implemented in an earlier
commit) via a new --write-midx=incremental option for `git repack`.

Add "incremental" as a recognized argument to the --write-midx
OPT_CALLBACK, mapping it to REPACK_WRITE_MIDX_INCREMENTAL. When this
mode is active and --geometric is in use, set the midx_layer_threshold
on the pack geometry so that only packs in sufficiently large tip layers
are considered for repacking.

Two new configuration options control the compaction behavior:

 - repack.midxSplitFactor (default: 2): the factor used in the
   geometric merging condition for MIDX layers.

 - repack.midxNewLayerThreshold (default: 8): the minimum number of
   packs in the tip MIDX layer before its packs are considered as
   candidates for geometric repacking.

Add tests exercising the new mode across a variety of scenarios
including basic geometric violations, multi-round chain integrity,
branching and merging histories, cross-layer object uniqueness, and
threshold-based compaction.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-20 11:31:14 +09:00
Taylor Blau
1da62fb5c8 repack: implement incremental MIDX repacking
Implement the `write_midx_incremental()` function, which builds and
maintains an incremental MIDX chain as part of the geometric repacking
process.

Unlike the default mode which writes a single flat MIDX, the incremental
mode constructs a compaction plan that determines which MIDX layers to
write, compact, or copy, and then executes each step using `git
multi-pack-index` subcommands with the --no-write-chain-file flag.

The repacking strategy works as follows:

 * Acquire the lock guarding the multi-pack-index-chain.

 * A new MIDX layer is always written containing the newly created
   pack(s). If the tip MIDX layer was rewritten during geometric
   repacking, any surviving packs from that layer are also included.

 * Starting from the new layer, adjacent MIDX layers are merged together
   as long as the accumulated object count exceeds half the object count
   of the next deeper layer (controlled by 'repack.midxSplitFactor').

 * Remaining layers in the chain are evaluated pairwise and either
   compacted or copied as-is, following the same merging condition.

 * Write the contents of the new multi-pack-index chain, atomically move
   it into place, and then release the lock.

 * Delete any now-unused MIDX layers.

After writing the new layer, the strategy is evaluated among the
existing MIDX layers in order from oldest to newest. Each step that
writes a new MIDX layer uses "--no-write-chain-file" to avoid updating
the multi-pack-index-chain file. After all steps are complete, the new
chain file is written and then atomically moved into place.

At present, this functionality is exposed behind a new enum value,
`REPACK_WRITE_MIDX_INCREMENTAL`, but has no external callers. A
subsequent commit will expose this mode via `git repack
--write-midx=incremental`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-20 11:31:14 +09:00
Taylor Blau
d376967fbf builtin/repack.c: convert --write-midx to an OPT_CALLBACK
Change the --write-midx (-m) flag from an OPT_BOOL to an OPT_CALLBACK
that accepts an optional mode argument. Introduce an enum with
REPACK_WRITE_MIDX_NONE and REPACK_WRITE_MIDX_DEFAULT to distinguish
between the two states, and update all existing boolean checks
accordingly.

For now, passing no argument (or just `-m`) selects the default mode,
preserving existing behavior. A subsequent commit will add a new mode
for writing incremental MIDXs.

Extract repack_write_midx() as a dispatcher that selects the
appropriate MIDX-writing implementation based on the mode.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-20 11:31:14 +09:00
Taylor Blau
f0ef2afb8b repack: track the ODB source via existing_packs
Store the ODB source in the `existing_packs` struct and use that in
place of the raw `repo->objects->sources` access within `cmd_repack()`.

The source used is still assigned from the first source in the list, so
there are no functional changes in this commit. The changes instead
serve two purposes (one immediate, one not):

 - The incremental MIDX-based repacking machinery will need to know what
   source is being used to read the existing MIDX/chain (should one
   exist).

 - In the future, if "git repack" is taught how to operate on other
   object sources, this field will serve as the authoritative value for
   that source.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-20 11:31:13 +09:00
Taylor Blau
0cd2255e64 midx: support custom --base for incremental MIDX writes
Both `compact` and `write --incremental` fix the base of the resulting
MIDX layer: `compact` always places the compacted result on top of
"from's" immediate parent in the chain, and `write --incremental` always
appends a new layer to the existing tip. In both cases the base is not
configurable.

Future callers need additional flexibility. For instance, the incremental
MIDX-based repacking code may wish to write a layer based on some
intermediate ancestor rather than the current tip, or produce a root
layer when replacing the bottommost entries in the chain.

Introduce a new `--base` option for both subcommands to specify the
checksum of the MIDX layer to use as the base. The given checksum must
refer to a valid layer in the MIDX chain that is an ancestor of the
topmost layer being written or compacted.

The special value "none" is accepted to produce a root layer with no
parent. This will be needed when the incremental repacking machinery
determines that the bottommost layers of the chain should be replaced.

If no `--base` is given, behavior is unchanged: `compact` uses "from's"
immediate parent in the chain, and `write` appends to the existing tip.

For the `write` subcommand, `--base` requires `--no-write-chain-file`. A plain
`write --incremental` appends a new layer to the live chain tip with no
mechanism to atomically replace it; overriding the base would produce a
layer that does not extend the tip, breaking chain invariants. With
`--no-write-chain-file` the chain is left unmodified and the caller is
responsible for assembling a valid chain.

For `compact`, no such restriction applies. The compaction operation
atomically replaces the compacted range in the chain file, so writing
the result on top of any valid ancestor preserves chain invariants.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-20 11:31:13 +09:00
Taylor Blau
8d342ed4b5 midx: introduce --no-write-chain-file for incremental MIDX writes
When writing an incremental MIDX layer, the MIDX machinery writes the
new layer into the multi-pack-index.d directory and then updates the
multi-pack-index-chain file to include the freshly written layer.

Future callers however may not wish to immediately update the MIDX chain
itself, preferring instead to write out new layer(s) themselves before
atomically updating the chain. Concretely, the new incremental
MIDX-based repacking strategy will want to do exactly this (that is,
assemble the new MIDX chain itself before writing a new chain file and
atomically linking it into place).

Introduce a `--no-write-chain-file` flag that:

 * writes the new MIDX layer into the multi-pack-index.d directory

 * prints its checksum

 * does not update the multi-pack-index-chain file.

The MIDX chain file (and thus, the lock protecting it) remain untouched,
allowing callers to assemble the chain themselves. This flag requires
`--incremental`, since the notion of a separate layer only makes sense
for incremental MIDXs.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-20 11:31:13 +09:00
Junio C Hamano
f5fc0f53de Merge branch 'sb/unpack-index-pack-buffer-resize'
Use a larger buffer size in the code paths to ingest pack stream.

* sb/unpack-index-pack-buffer-resize:
  index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
2026-05-20 10:30:58 +09:00
Junio C Hamano
ca7d7d6424 Merge branch 'ps/history-fixup'
"git history" learned "fixup" command.

* ps/history-fixup:
  builtin/history: introduce "fixup" subcommand
  builtin/history: generalize function to commit trees
  replay: allow callers to control what happens with empty commits
2026-05-20 10:30:57 +09:00