The experimental "git history" command has been taught a new
"squash" subcommand to fold a range of commits into a single commit,
replaying any descendants on top.
* hn/history-squash:
history: re-edit a squash with every message
history: add squash subcommand to fold a range
history: give commit_tree_ext a message template
history: extract helper for a commit's parent tree
A new `diff.<driver>.process` configuration has been introduced to
allow a long-running external process to act as a hunk provider to
allows external tools to control which lines Git considers changed
while leaving all output formatting (word diff, color, blame, etc.) to
Git's standard pipeline.
* mm/diff-process-hunks:
blame: consult diff process for no-hunk detection
diff: bypass diff process with --no-ext-diff and in format-patch
diff: add long-running diff process via diff.<driver>.process
sub-process: separate process lifecycle from hashmap management
userdiff: add diff.<driver>.process config
xdiff: support external hunks via xpparam_t
The `git multi-pack-index write --incremental` command has been
corrected to properly honor the `--base` option. Previously, the
custom base was ignored by the normal write path, and the pack
exclusion logic incorrectly skipped packs from layers above the
selected base, breaking reachability closure for bitmaps.
* tb/midx-incremental-custom-base:
midx-write: include packs above custom incremental base
midx: pass custom '--base' through incremental writes
t5334: expose shared `nth_line()` helper
git replay learns --linearize option to drop merge commits and
linearize the replayed history, mimicking git rebase
--no-rebase-merges.
* tc/replay-linearize:
replay: offer an option to linearize the commit topology
replay: add helper to put entry into mapped_commits
replay: refactor enum replay_mode into a bool
"git branch" command learned "--delete-merged" option to remove
local branches that have already been merged to the remote-tracking
branches they track.
* hn/branch-delete-merged:
branch: add --dry-run for --delete-merged
branch: add branch.<name>.deleteMerged opt-out
branch: add --delete-merged <branch>
branch: prepare delete_branches for a bulk caller
branch: let delete_branches skip unmerged branches on bulk refusal
branch: convert delete_branches() to a flags argument
branch: add --forked filter for --list mode
The lazy priority queue optimization pattern (deferring actual removal
in prio_queue_get() to allow get+put fusion) has been folded directly
into prio_queue itself, speeding up commit traversal workflows and
simplifying callers.
* kk/prio-queue-get-put-fusion:
prio-queue: fold lazy_queue into prio_queue for automatic get+put fusion
prio-queue: rename .nr to .nr_ and add accessor helpers
The `remote-object-info` command has been added to `git cat-file
--batch-command`, allowing clients to request object metadata
(currently size) from a remote server via protocol v2 without
downloading the entire object.
The client dynamically filters format placeholders based on
server-advertised capabilities and safely returns empty strings for
inapplicable or unsupported fields.
* ps/cat-file-remote-object-info:
cat-file: make remote-object-info allow-list dynamic
cat-file: validate remote atoms with allow_list
cat-file: add remote-object-info to batch-command
transport: add client support for object-info
serve: advertise object-info feature
fetch-pack: move fetch initialization
connect: refactor packet writing
fetch-pack: move function to connect.c
fetch-pack: prepare function to be moved
t1006: split test utility functions into new "lib-cat-file.sh"
cat-file: declare loop counter inside for()
git-compat-util: add strtoul_szt() with error handling
transport-helper: fix memory leak of helper on disconnect
The experimental "git history" command has been taught a new "drop"
subcommand to remove a commit and replay its descendants onto its
parent.
* ps/history-drop:
builtin/history: implement "drop" subcommand
builtin/history: split handling of ref updates into two phases
reset: stop assuming that the caller passes in a clean index
reset: allow the caller to specify the current HEAD object
reset: introduce ability to skip updating HEAD
reset: introduce dry-run mode
reset: modernize flags passed to `reset_working_tree()`
reset: rename `reset_head()`
reset: drop `USE_THE_REPOSITORY_VARIABLE`
read-cache: split out function to drop unmerged entries to stage 0
The -m/-F/-c/-C options to supply commit log message from outside the
editor are now supported for all "git commit --fixup" variations.
* ec/commit-fixup-options:
commit: allow -c/-C for all kinds of --fixup
commit: allow -m/-F for all kinds of --fixup
"git checkout --track=..." learned to optionally fetch the branch
from the remote the new branch will work with.
* hn/checkout-track-fetch:
checkout: extend --track with a "fetch" mode to refresh start-point
branch: expose helpers for finding the remote owning a tracking ref
The parse-options library learned to auto-correct misspelled
subcommand names.
* js/parseopt-subcommand-autocorrection:
SQUASH???
doc: document autocorrect API
parseopt: add tests for subcommand autocorrection
parseopt: enable subcommand autocorrection for git-remote and git-notes
parseopt: autocorrect mistyped subcommands
autocorrect: provide config resolution API
autocorrect: rename AUTOCORRECT_SHOW to AUTOCORRECT_HINT
autocorrect: use mode and delay instead of magic numbers
help: move tty check for autocorrection to autocorrect.c
help: make autocorrect handling reusable
parseopt: extract subcommand handling from parse_options_step()
The whence field in struct object_info has been removed,
refactoring backend-specific object information retrieval into an
opt-in struct object_info_source structure.
* ps/odb-drop-whence:
odb: document object info fields
odb: drop `whence` field from object info
treewide: convert users of `whence` to the new source field
odb: add `source` field to struct object_info_source
odb: make backend-specific fields optional
packfile: thread odb_source_packed through packed_object_info()
The global configuration variable ignore_case (representing the
core.ignorecase configuration) has been migrated into struct
repo_config_values to tie it to a specific repository instance.
* ty/migrate-ignorecase:
config: use repo_ignore_case() to access core.ignorecase
environment: move ignore_case into repo_config_values
The "git refs" toolbox has been extended with new "create", "delete",
"update", and "rename" subcommands to create, delete, update, and
rename references, respectively.
* ps/refs-writing-subcommands:
builtin/refs: add "rename" subcommand
builtin/refs: add "create" subcommand
builtin/refs: add "update" subcommand
builtin/refs: add "delete" subcommand
builtin/refs: drop `the_repository`
The reference backends have been converted to always use absolute
paths internally. This allows dropping the calls to
`chdir_notify_reparent()` and fixes a memory leak in how the
reference database is constructed with an "onbranch" condition.
* ps/refs-avoid-chdir-notify-reparent:
refs: protect against chicken-and-egg recursion
refs/reftable: lazy-load configuration to fix chicken-and-egg
reftable: split up write options
refs/files: lazy-load configuration to fix chicken-and-egg
refs: move parsing of "core.logAllRefUpdates" back into ref stores
repository: free main reference database
chdir-notify: drop unused `chdir_notify_reparent()`
refs: unregister reference stores from "chdir_notify"
setup: don't apply "GIT_REFERENCE_BACKEND" without a repository
setup: stop applying repository format twice
setup: inline `check_and_apply_repository_format()`
The "git repo info" command has been taught new keys to output both
absolute and relative paths for "gitdir" and "commondir", supported by
a new path-formatting helper extracted from "git rev-parse".
* jk/repo-info-path-keys:
repo: add path.gitdir with absolute and relative suffix formatting
repo: add path.commondir with absolute and relative suffix formatting
path: extract format_path() and use in rev-parse
The pack-objects command now supports using reachability bitmaps and
delta-islands concurrently with the `--path-walk` option, allowing
faster packaging by falling back to path-walk when bitmaps cannot
fully satisfy the request.
* tb/pack-path-walk-bitmap-delta-islands:
pack-objects: support `--delta-islands` with `--path-walk`
pack-objects: extract `record_tree_depth()` helper
pack-objects: support reachability bitmaps with `--path-walk`
t/perf: drop p5311's lookup-table permutation
The `fetch.followRemoteHEAD` configuration variable has been added to
provide a default for the per-remote `remote.<name>.followRemoteHEAD`
setting.
* mh/fetch-follow-remote-head-config:
fetch: fixup a misaligned comment
fetch: add configuration variable fetch.followRemoteHEAD
fetch: refactor do_fetch handling of followRemoteHEAD
fetch: return 0 on known git_fetch_config
fetch: rename function report_set_head
t5510: cleanup remote in followRemoteHEAD dangling ref test
doc: explain fetchRemoteHEADWarn advice
fetch: fixup set_head advice for warn-if-not-branch
The packed object source has been refactored into a proper struct
odb_source.
* ps/odb-source-packed:
odb/source-packed: drop pointer to "files" parent source
midx: refactor interfaces to work on "packed" source
odb/source-packed: stub out remaining functions
odb/source-packed: wire up `freshen_object()` callback
odb/source-packed: wire up `find_abbrev_len()` callback
odb/source-packed: wire up `count_objects()` callback
odb/source-packed: wire up `for_each_object()` callback
odb/source-packed: wire up `read_object_stream()` callback
odb/source-packed: wire up `read_object_info()` callback
packfile: use higher-level interface to implement `has_object_pack()`
odb/source-packed: wire up `reprepare()` callback
odb/source-packed: wire up `close()` callback
odb/source-packed: start converting to a proper `struct odb_source`
odb/source-packed: store pointer to "files" instead of generic source
packfile: move packed source into "odb/" subsystem
packfile: split out packfile list logic
packfile: rename `struct packfile_store` to `odb_source_packed`
Continuation of "setup.c" refactoring to drop remaining global state
(`git_work_tree_cfg`, `is_bare_repository_cfg`). The most notable
outcome is that `is_bare_repository()` has been updated to no longer
implicitly rely on `the_repository`.
* ps/setup-drop-global-state:
treewide: drop USE_THE_REPOSITORY_VARIABLE
environment: stop using `the_repository` in `is_bare_repository()`
environment: split up concerns of `is_bare_repository_cfg`
builtin/init: stop modifying `is_bare_repository_cfg`
setup: remove global `git_work_tree_cfg` variable
builtin/init: simplify logic to configure worktree
builtin/init: stop modifying global `git_work_tree_cfg` variable
The static allow-list in expand_atom() is hardcoded to only allow
"objectname" and "objectsize" for remote queries. This works because
up to this point all servers will either support object-info with name
and size or they do not support them at all, but we cannot expect that
in a future different servers with different git versions to have the
same object-info capabilities. Therefore, the allow_list needs to be
dynamic depending on what the server advertises.
The client will now:
1. Request the protocol option that the placeholder refers to (i.e.
"size" when "%(objectsize)").
2. Filters the request in fetch_object_info() dropping any option that
the server does not advertise.
3. After the fetching, the options that haven't been dropped are the ones
fetched and supported by the server, these supported options are
mapped and remote_allowed_atoms is populated with the placeholders.
4. expand_atom() checks remote_allowed_atoms with the same behaviour as
the static allow_list had.
Move object_info_options out of get_remote_info so the caller which has
data can select what options will be requested instead of requesting
always size.
Move batch_object_write() out so there will always be an output even if
all the placeholders are not supported by the server (returns an empty
line).
Include "type" in the object_info_options so once the server supports
it, the clients know already how to request it.
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
strstr() is not enough to validate the format placeholders in
remote-object-info causing two errors:
- Atoms recognized by expand_atom() but the remote doesn't returns 1, but
data->type contains garbage causing segfault.
- expand_atom() returns 0 for unknown atoms, calling
strbuf_expand_bad_format() which ends in die() blocking local queries
if the same format is shared.
Add an allow_list with the supported atoms at the top of expand_atom().
In remote mode, unsupported atoms return 1 leaving the sb empty,
honoring how for-each-ref handles known but inapplicable atoms.
As extra safety, initialize data->type to OBJ_BAD and add a NULL check
for type_name() so uninitialized data doesn't cause segfault.
Update tests that expect previous die() behaviour to expect an empty
string and add an explicit test for empty string return on unknown
placeholder.
Update caveat behaviour documentation.
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the `info` command in `cat-file --batch-command` prints object
info for a given object, it is natural to add another command in
`cat-file --batch-command` to print object info for a given object
from a remote.
Add `remote-object-info` to `cat-file --batch-command`.
While `info` takes object ids one at a time, this creates
overhead when making requests to a server. So `remote-object-info`
instead can take multiple object ids at once.
The `cat-file --batch-command` command is generally implemented in
the following manner:
- Receive and parse input from user
- Call respective function attached to command
- Get object info, print object info
In --buffer mode, this changes to:
- Receive and parse input from user
- Store respective function attached to command in a queue
- After flush, loop through commands in queue
- Call respective function attached to command
- Get object info, print object info
Notice how the getting and printing of object info is accomplished one
at a time. As described above, this creates a problem for making
requests to a server. Therefore, `remote-object-info` is implemented in
the following manner:
- Receive and parse input from user
If command is `remote-object-info`:
- Get object info from remote
- Loop through and print each object info
Else:
- Call respective function attached to command
- Parse input, get object info, print object info
And finally for --buffer mode `remote-object-info`:
- Receive and parse input from user
- Store respective function attached to command in a queue
- After flush, loop through commands in queue:
If command is `remote-object-info`:
- Get object info from remote
- Loop through and print each object info
Else:
- Call respective function attached to command
- Get object info, print object info
To summarize, `remote-object-info` gets object info from the remote and
then loops through the object info passed in, printing the info.
In order for `remote-object-info` to avoid remote communication
overhead in the non-buffer mode, the objects are passed in as such:
remote-object-info <remote> <oid> <oid> ... <oid>
rather than
remote-object-info <remote> <oid>
remote-object-info <remote> <oid>
...
remote-object-info <remote> <oid>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some code used in this series declares variable i and only uses it
in a for loop, not in any other logic outside the loop.
Change the declaration of i to be inside the for loop for readability.
While at it, we also change its type from "int" to "size_t" where the
latter makes more sense.
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In cc42c88945 (refs: extract out reflog config to generic layer,
2026-05-04) we have refactored how we parse "core.logAllRefUpdates" so
that it happens in the generic layer. Unfortunately, this has worsened a
preexisting issue where we may recurse when creating the reference store
because of a chicken-and-egg problem between parsing the configuration
and evaluating "onbranch" conditions.
Prepare for a fix by essentially reverting that change so that we handle
this setting in the respective backends again. The backends are already
parsing other configuration anyway, so by moving the logic back in there
we can ensure that all backend configuration is parsed the same way.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With --dry-run, --delete-merged prints the local branches it would
delete, one "Would delete branch <name>" line each, and exits
without touching any ref. The same filtering applies, so the output
is exactly the set that the real run would delete.
--dry-run is only meaningful together with --delete-merged and is
rejected otherwise.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Setting branch.<name>.deleteMerged=false exempts that branch from
"git branch --delete-merged", which is useful for a topic you want
to keep developing after an early round of it has been merged
upstream. Unless --quiet is given, each skip is reported so the
user knows why their topic was kept.
Explicit deletion with "git branch -d" still uses the normal merge
check and ignores this setting.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git branch --delete-merged <branch>...
deletes the local branches that "--forked <branch>" would list,
keeping only those whose tip is reachable from their configured
upstream. The work has already landed on the upstream they track,
so the local copy is no longer needed.
A branch is not deleted when:
* it is checked out in any worktree
* its upstream remote-tracking branch no longer exists, since a
missing upstream is not by itself a sign of integration
* its push destination equals its upstream (<branch>@{push} is
the same as <branch>@{upstream}), such as a local "main" that
tracks and pushes to "origin/main". Right after a pull it just
looks "fully merged", so it is kept. Only branches that push
somewhere other than their upstream, typically topics in a fork
workflow, are candidates.
A branch whose work is not yet merged into its upstream is silently
skipped, so one unmerged topic does not abort the whole sweep.
A branch that another, surviving branch tracks as its upstream is
also kept, so a branch is never deleted out from under one stacked
on top of it. Such a kept branch is itself merged, so when its own
upstream is being deleted, clear its now-stale upstream config.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach delete_branches() two new modes for the upcoming
--delete-merged: one that asks only whether a branch is merged into
its upstream, without falling back to HEAD when there is no
upstream, and one that rehearses the deletions without removing any
ref. Existing callers keep their current behavior.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a skip-unmerged mode to delete_branches() and check_branch_commit()
so a bulk caller can silently skip branches that are not fully merged
and carry on, rather than erroring with the "use 'git branch -D'"
advice that the plain "git branch -d" path emits. Existing callers are
unaffected.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
delete_branches() and check_branch_commit() take a pair of int
booleans (force and quiet) that the next commits would grow further.
Replace them with a single "unsigned int flags" argument and an
enum, splitting the bits back into named bool locals so the body
keeps reading the same named values.
No change in behavior.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a --forked option to "git branch" list mode that lists only
branches whose configured upstream matches <branch>. The argument
can be a ref (e.g. "origin/main", "master"), a remote name like
"origin" for the branch its origin/HEAD points at, or a shell glob
(e.g. "origin/*"), and may be repeated to widen the filter.
It is an ordinary list filter, so it combines with the others:
git branch --merged origin/main --forked 'origin/*'
lists branches forked from origin that are already merged into
origin/main, and --no-merged inverts the question.
This is the building block for --delete-merged, which deletes the
listed branches once they have landed on their upstream.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default "git history squash" reuses the oldest commit's message.
When --reedit-message is given it only reopened that one message, so the
messages of the folded-in commits were lost.
Gather the messages of every commit in the range, oldest first, and use
them as the editor template when re-editing, mirroring how "git rebase
-i" presents a squash. The combined message is built before the
descendant walk so it is not disturbed by the flags that walk leaves on
the commits.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Folding a series of commits into one required either an interactive
rebase where each commit after the first was hand-edited to "fixup", or
a "git reset --soft" to the merge base followed by "git commit --amend".
Add "git history squash <revision-range>" to do this directly. It folds
every commit in the range into the oldest one, keeping that commit's
message and authorship and taking the tree of the newest commit, so the
range collapses into a single commit. Commits above the range are
replayed on top of the result.
The range is given as <base>..<tip>, so "git history squash @~3.."
folds the three most recent commits and "git history squash @~5..@~2"
squashes an interior range. A merge inside the range is folded like any
other commit, but the range must have a single base, so a range with
more than one entry point is rejected.
The folded commits leave the history, so by default the command refuses
when another ref points at one of them. Use "--update-refs=head" to
rewrite only the current branch and leave those refs untouched.
Inspired-by: Sergey Chernov <serega.morph@gmail.com>
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit_tree_ext() reuses the message of the commit it is handed. A
caller that folds several commits together wants to seed the message
from more than that single commit, so add an optional message_template
parameter. When NULL, the behavior is unchanged.
Pass NULL from the existing fixup and split callers.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Three places resolve the tree of a commit's first parent, falling back
to the empty tree for a root commit, each repeating the same parse and
oidcpy dance. Extract a first_parent_tree_oid() helper and route the
existing callers through it.
No change in behavior.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Forking from an existing remote branch without refreshing first often
has consequences: you start work that has already been done, or you
build on an old version of the code which causes big conflicts later
when you pull. The workaround is two commands ("git fetch <remote>
<branch> && git checkout -b <topic> <remote>/<branch>"), and when
the fetch is skipped the checkout silently starts from a stale tip.
Users may already expect "<remote>/<branch>" to refer to the latest
tip on the remote. While this blurs the line between fetch and
checkout, git already does this in places where it pays off: "git
clone" fetches and checks out, and "git pull" fetches and merges.
Add a "fetch" mode to "--track" that refreshes <start-point> before
checking it out:
git checkout -b new_branch --track=fetch origin/some-branch
Only the requested branch is fetched so other remote-tracking
branches are left untouched. When <start-point> is a bare <remote>
(e.g. "origin"), follow refs/remotes/<remote>/HEAD to learn which
branch to refresh. If "git fetch" fails but the remote-tracking ref
already exists locally, warn and proceed from the existing tip,
otherwise abort.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `whence` field has become redundant now that callers can learn about
the exact source an object has been looked up from via the `struct
object_info_source::source` field.
Adapt callers to use the new field. Note that all callsites already set
up the `info.sourcep` request pointer, so the conversion is rather
straight-forward.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `struct object_info` carries two pieces of information
about how an object was looked up:
- The `whence` enum identifying the backend.
- The backend-tagged union `u` exposing backend-specific details
(currently only the packed-source case, which records the owning
pack, offset and packed object type).
The union is populated unconditionally, even though most callers don't
care about provenance at all.
Split the backend-specific union out into a new public type, `struct
object_info_source`, and make the object info structure carry it via
just another opt-in request pointer. As with all the other requestable
information, callers that need source info allocate a `struct
object_info_source` on the stack and point `sourcep` at it; callers that
don't care about it simply leave the field as a `NULL` pointer. Adapt
callers accordingly.
Note that the `whence` enum is strictly-speaking also backend-specific
information, so it would be another good candidate to be moved into the
`struct object_info_source`. For now though it is left alone, as it will
be replaced by a `struct odb_source` pointer in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an optional `struct odb_source_packed *source` parameter to
`packed_object_info()` and `packed_object_info_with_index_pos()`. This
parameter is unused at this point in time, but it will be used in a
follow-up commit so that we can record the source of a specific object.
Note that callers in "odb/source-packed.c" pass the already-available
source, but all other callers pass `NULL` instead. This is fine though,
as we only care about populating this info when called via the packed
store.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/odb-source-packed:
odb/source-packed: drop pointer to "files" parent source
midx: refactor interfaces to work on "packed" source
odb/source-packed: stub out remaining functions
odb/source-packed: wire up `freshen_object()` callback
odb/source-packed: wire up `find_abbrev_len()` callback
odb/source-packed: wire up `count_objects()` callback
odb/source-packed: wire up `for_each_object()` callback
odb/source-packed: wire up `read_object_stream()` callback
odb/source-packed: wire up `read_object_info()` callback
packfile: use higher-level interface to implement `has_object_pack()`
odb/source-packed: wire up `reprepare()` callback
odb/source-packed: wire up `close()` callback
odb/source-packed: start converting to a proper `struct odb_source`
odb/source-packed: store pointer to "files" instead of generic source
packfile: move packed source into "odb/" subsystem
packfile: split out packfile list logic
packfile: rename `struct packfile_store` to `odb_source_packed`
Scripts need a stable way to locate the git directory without
parsing rev-parse output or relying on its flag-driven path format
selection. There is no way to retrieve this path from git repo info
today.
Introduce path.gitdir.absolute and path.gitdir.relative keys,
consistent with the path.commondir keys added in the previous patch.
Reuse the test_repo_info_path helper introduced there to validate
both variants.
Mentored-by: Justin Tobler <jltobler@gmail.com>
Mentored-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Scripts working with worktree setups need a reliable way to discover
the common directory, which diverges from the git directory when
multiple worktrees are in use. There is no way to retrieve this path
from git repo info today.
Introduce path.commondir.absolute and path.commondir.relative keys.
Exposing explicit format variants rather than a single key with a
default avoids ambiguity for scripts that require predictable output.
Mentored-by: Justin Tobler <jltobler@gmail.com>
Mentored-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Path formatting logic in builtin/rev-parse.c writes directly to
stdout. Other builtins cannot reuse it.
Extract this logic into format_path() in path.c and expose
a path_format enum in path.h.
Convert rev-parse to use the new helper in the same step to validate
the API against existing tests and avoid introducing dead code.
Mentored-by: Justin Tobler <jltobler@gmail.com>
Mentored-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the stated goals of git-replay(1) is to allow implementing the
git-rebase(1) functionality on the server side.
The default mode of git-rebase(1) is to act as if `--no-rebase-merges`
was given. This mode drops merge commits instead of replaying them, and
linearizes the commit history into a sequence of the
regular (single-parent) commits.
Add option `--linearize` to git-replay(1) to do the same.
Co-authored-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* js/objects-larger-than-4gb-on-windows-more:
odb: use size_t for object_info.sizep and the size APIs
packfile,delta: drop the `cast_size_t_to_ulong()` wrappers
pack-objects: use size_t for in-core object sizes
packfile: widen unpack_entry()'s size out-parameter to size_t
pack-objects(check_pack_inflate()): use size_t instead of unsigned long
patch-delta: use size_t for sizes
compat/msvc: use _chsize_s for ftruncate
Since the inception of `--path-walk`, this option has had a documented
incompatibility with `--delta-islands`.
When discussing those original patches on the list, a message from
Stolee in [1] noted the following:
this could be remedied by [...] doing a separate walk to identify
islands using the normal method
In a related portion of the thread, Peff explains[2]:
The delta islands code already does its own tree walk to propagate
the bits down (it does rely on the base walk's show_commit() to
propagate through the commits).
Once each object has its island bitmaps, I think however you
choose to come up with delta candidates [...] you should be able
to use it. It's fundamentally just answering the question of "am
I allowed to delta between these two objects".
That is similar to what this patch does, and it turns out the cheaper
option is sufficient: perform the same island side effects from the
path-walk callback rather than doing a second walk.
Recall how delta-islands are computed during a normal repack:
- `show_commit()` calls `propagate_island_marks()` for each commit,
which merges the commit's island bitset onto its root tree object and
onto each of its parent commits.
- `show_object()` for a tree records the tree's depth derived from the
slash-separated pathname. Subsequent `resolve_tree_islands()` uses
that depth to walk trees in increasing-depth order, propagating each
tree's marks to its children.
- At delta-search time, `in_same_island()` enforces that a delta
target's island bitmap is a subset of its base's: every island that
reaches the target must also reach the base.
Path-walk's enumeration callback is `add_objects_by_path()`. It already
adds objects to `to_pack`, but until now did not perform the
island-related side effects. Two things are needed:
- For each commit batch, call `propagate_island_marks()` on commits,
exactly as `show_commit()` does.
We have to be careful about the order in which we call this function,
and we must see a commit before its parents in order to have
island marks to propagate.
The path-walk batch preserves that order. Path-walk appends commits
to its `OBJ_COMMIT` batch as they come back from the same
`get_revision()` loop the regular traversal uses, and
`add_objects_by_path()` iterates the batch in array order. So every
commit reaches `propagate_island_marks()` in the same sequence that
`show_commit()` would have seen it, and the descendant-first chain
that the algorithm relies on is intact.
Skip island propagation for excluded commits to match the regular
traversal, whose `show_commit()` callback is only invoked for
interesting commits. Boundary commits may still be present in
path-walk's callback so they can serve as thin-pack bases, but they
should not contribute island marks.
- For each tree batch, record the tree's depth from the path. Use the
`record_tree_depth()` helper from the previous commit so both
callbacks behave identically, including the max-depth-wins behavior
when a tree is reached via more than one path. The helper accepts
both the `show_object()` path shape ("foo", "foo/bar") and the
path-walk shape with a trailing slash ("foo/", "foo/bar/"), so depths
recorded from either traversal mode are directly comparable.
This is implicit in the implementation sketch from Peff above.
`resolve_tree_islands()` sorts trees by `oe->tree_depth` in
increasing-depth order before propagating marks down, so that a
parent tree's marks are finalized before its children inherit them.
Without recording the depth at path-walk time, every
path-walk-discovered tree would land at depth 0 in `to_pack`, the
sort would lose its ordering, and children could inherit marks from
parents whose own contributions had not yet been merged in.
With those two pieces in place, `resolve_tree_islands()` receives the
same island inputs from path-walk as it would from the regular
traversal, so the existing island checks can be reused unchanged.
Drop the documented incompatibility between `--path-walk` and
`--delta-islands`, and add t5320 coverage for path-walk island repacks
with and without bitmap writing, as well as the same-island case where a
delta remains allowed.
[1]: https://lore.kernel.org/git/9aa2471b-0850-4707-9733-d3b33609f5f2@gmail.com/
[2]: https://lore.kernel.org/git/20240911063203.GA1538586@coredump.intra.peff.net/
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare for a subsequent change that needs to record tree depths from a
second call site by factoring the delta-islands tree-depth bookkeeping
out of `show_object()` and into a helper, `record_tree_depth()`.
The helper looks up the object in `to_pack`, returns early when the
object was not added there, computes the depth from the slash count in
the supplied name, and preserves the existing max-depth-wins behavior
when a tree is reached by more than one path.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 'pack-objects' is invoked with '--path-walk', it prevents us from
using reachability bitmaps.
This behavior dates back to 70664d2865 (pack-objects: add --path-walk
option, 2025-05-16), which included a comment in the relevant portion of
the command-line arguments handling that read as follows:
/*
* We must disable the bitmaps because we are removing
* the --objects / --objects-edge[-aggressive] options.
*/
In fb2c309b7d (pack-objects: pass --objects with --path-walk,
2026-05-02), path-walk learned to pass '--objects' again, but still
kept bitmap traversal disabled. That leaves two useful cases
unsupported:
* A path-walk repack that writes bitmaps does not give the bitmap
selector any commits, because path-walk reveals commits through
`add_objects_by_path()` rather than through `show_commit()`, where
`index_commit_for_bitmap()` is normally called.
* An invocation like "git pack-objects --use-bitmap-index --path-walk"
never tries an existing bitmap, even when one is available and could
answer the request.
Fortunately for us, neither restriction is required.
* On the writing side: teach the path-walk object callback to call
`index_commit_for_bitmap()` for commits that it adds to the pack.
That gives the bitmap selector the commit candidates it would have
seen from the regular traversal.
* For bitmap reading, keep passing '--objects' to the internal rev_list
machinery, but stop clearing `use_bitmap_index`. If an existing
bitmap can answer the request, use it; otherwise fall back to
path-walk's own enumeration.
As a result, we can see significantly reduced pack generation times from
p5311 (with our `GIT_PERF_REPO` set to a recent clone of the fluentui
repository) before this commit:
Test HEAD^ HEAD
----------------------------------------------------------------------------------------
5311.40: server (1 days, --path-walk) 1.43(1.39+0.04) 0.01(0.01+0.00) -99.3%
5311.41: size (1 days, --path-walk) 139.6K 139.7K +0.0%
5311.42: client (1 days, --path-walk) 0.02(0.02+0.00) 0.02(0.02+0.00) +0.0%
5311.44: server (2 days, --path-walk) 1.43(1.39+0.04) 0.01(0.00+0.00) -99.3%
5311.45: size (2 days, --path-walk) 139.6K 139.7K +0.0%
5311.46: client (2 days, --path-walk) 0.02(0.02+0.00) 0.02(0.02+0.00) +0.0%
5311.48: server (4 days, --path-walk) 1.44(1.39+0.04) 0.01(0.01+0.00) -99.3%
5311.49: size (4 days, --path-walk) 238.1K 238.1K +0.0%
5311.50: client (4 days, --path-walk) 0.03(0.03+0.00) 0.03(0.03+0.00) +0.0%
5311.52: server (8 days, --path-walk) 1.43(1.39+0.03) 0.01(0.00+0.00) -99.3%
5311.53: size (8 days, --path-walk) 344.9K 344.9K +0.0%
5311.54: client (8 days, --path-walk) 0.07(0.07+0.00) 0.07(0.08+0.00) +0.0%
5311.56: server (16 days, --path-walk) 1.47(1.44+0.03) 0.10(0.08+0.01) -93.2%
5311.57: size (16 days, --path-walk) 844.0K 844.0K +0.0%
5311.58: client (16 days, --path-walk) 0.09(0.09+0.00) 0.09(0.09+0.00) +0.0%
5311.60: server (32 days, --path-walk) 1.52(1.50+0.05) 0.14(0.15+0.02) -90.8%
5311.61: size (32 days, --path-walk) 4.2M 4.2M +0.1%
5311.62: client (32 days, --path-walk) 0.34(0.48+0.02) 0.34(0.45+0.05) +0.0%
5311.64: server (64 days, --path-walk) 1.55(1.52+0.06) 0.15(0.15+0.04) -90.3%
5311.65: size (64 days, --path-walk) 6.4M 6.4M -0.0%
5311.66: client (64 days, --path-walk) 0.51(0.79+0.05) 0.51(0.80+0.06) +0.0%
5311.68: server (128 days, --path-walk) 1.59(1.57+0.06) 0.16(0.21+0.01) -89.9%
5311.69: size (128 days, --path-walk) 8.4M 8.4M -0.0%
5311.70: client (128 days, --path-walk) 0.72(1.44+0.08) 0.71(1.47+0.09) -1.4%
We get the same size of output pack, but this commit allows us to do so
in a significantly shorter amount of time. Intuitively, we're generating
the same pack (hence the unchanged 'test_size' output from run to run),
but varying how we get there. Before this commit, pack-objects prefers
'--path-walk' to '--use-bitmap-index', so we generate the output pack by
performing a normal '--path-walk' traversal. With this commit, we are
operating over a *repacked* state (that itself was done with a
'--path-walk' traversal), but are able to perform pack-reuse on that
repacked state via bitmaps.
When comparing the size of the repacked pack with/without '--path-walk'
on the previous commit versus this one, we see that (a) the repacked size
improves significantly with '--path-walk', and that (b) writing bitmaps
during repacking does not regress this improvement:
Test HEAD^ HEAD
----------------------------------------------------------------------------------------
5311.3: size of bitmapped pack 558.4M 558.5M +0.0%
5311.38: size of bitmapped pack (--path-walk) 164.4M 164.4M +0.0%
(Note that to observe an improvement here, we must repack with '-F' in
order to avoid reusing non-'--path-walk' deltas, which would otherwise
skew our results.)
There is one wrinkle when it comes to '--boundary', which we must not
pass into the bitmap walk in the presence of both '--path-walk' and
'--use-bitmap-index'. Path-walk needs boundary commits when it performs
its own traversal, in order to discover bases for thin packs, but the
bitmap traversal does not expect this. Work around this by setting
`revs->boundary` as late as possible within the '--path-walk' traversal,
after any bitmap attempt has either succeeded or declined to answer the
request.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>