Files
git/Documentation/git-pack-objects.adoc
Taylor Blau d88060561e pack-objects: support --delta-islands with --path-walk
Since the inception of `--path-walk`, this option has had a documented
incompatibility with `--delta-islands`.

When discussing those original patches on the list, a message from
Stolee in [1] noted the following:

    this could be remedied by [...] doing a separate walk to identify
    islands using the normal method

In a related portion of the thread, Peff explains[2]:

    The delta islands code already does its own tree walk to propagate
    the bits down (it does rely on the base walk's show_commit() to
    propagate through the commits).

    Once each object has its island bitmaps, I think however you
    choose to come up with delta candidates [...] you should be able
    to use it. It's fundamentally just answering the question of "am
    I allowed to delta between these two objects".

That is similar to what this patch does, and it turns out the cheaper
option is sufficient: perform the same island side effects from the
path-walk callback rather than doing a second walk.

Recall how delta-islands are computed during a normal repack:

 - `show_commit()` calls `propagate_island_marks()` for each commit,
   which merges the commit's island bitset onto its root tree object and
   onto each of its parent commits.

 - `show_object()` for a tree records the tree's depth derived from the
   slash-separated pathname. Subsequent `resolve_tree_islands()` uses
   that depth to walk trees in increasing-depth order, propagating each
   tree's marks to its children.

 - At delta-search time, `in_same_island()` enforces that a delta
   target's island bitmap is a subset of its base's: every island that
   reaches the target must also reach the base.

Path-walk's enumeration callback is `add_objects_by_path()`. It already
adds objects to `to_pack`, but until now did not perform the
island-related side effects. Two things are needed:

 - For each commit batch, call `propagate_island_marks()` on commits,
   exactly as `show_commit()` does.

   We have to be careful about the order in which we call this function,
   and we must see a commit before its parents in order to have
   island marks to propagate.

   The path-walk batch preserves that order. Path-walk appends commits
   to its `OBJ_COMMIT` batch as they come back from the same
   `get_revision()` loop the regular traversal uses, and
   `add_objects_by_path()` iterates the batch in array order. So every
   commit reaches `propagate_island_marks()` in the same sequence that
   `show_commit()` would have seen it, and the descendant-first chain
   that the algorithm relies on is intact.

   Skip island propagation for excluded commits to match the regular
   traversal, whose `show_commit()` callback is only invoked for
   interesting commits. Boundary commits may still be present in
   path-walk's callback so they can serve as thin-pack bases, but they
   should not contribute island marks.

 - For each tree batch, record the tree's depth from the path. Use the
   `record_tree_depth()` helper from the previous commit so both
   callbacks behave identically, including the max-depth-wins behavior
   when a tree is reached via more than one path. The helper accepts
   both the `show_object()` path shape ("foo", "foo/bar") and the
   path-walk shape with a trailing slash ("foo/", "foo/bar/"), so depths
   recorded from either traversal mode are directly comparable.

   This is implicit in the implementation sketch from Peff above.
   `resolve_tree_islands()` sorts trees by `oe->tree_depth` in
   increasing-depth order before propagating marks down, so that a
   parent tree's marks are finalized before its children inherit them.
   Without recording the depth at path-walk time, every
   path-walk-discovered tree would land at depth 0 in `to_pack`, the
   sort would lose its ordering, and children could inherit marks from
   parents whose own contributions had not yet been merged in.

With those two pieces in place, `resolve_tree_islands()` receives the
same island inputs from path-walk as it would from the regular
traversal, so the existing island checks can be reused unchanged.

Drop the documented incompatibility between `--path-walk` and
`--delta-islands`, and add t5320 coverage for path-walk island repacks
with and without bitmap writing, as well as the same-island case where a
delta remains allowed.

[1]: https://lore.kernel.org/git/9aa2471b-0850-4707-9733-d3b33609f5f2@gmail.com/
[2]: https://lore.kernel.org/git/20240911063203.GA1538586@coredump.intra.peff.net/

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-03 08:52:59 +09:00

22 KiB