Commit Graph

11 Commits

Author SHA1 Message Date
Taylor Blau
1b17f64d40 repack: allow --write-midx=incremental without --geometric
Previously, `--write-midx=incremental` required `--geometric` and would
die() without it. Relax this restriction so that incremental MIDX
repacking can be used independently.

Without `--geometric`, the behavior is append-only: a single new MIDX
layer is created containing whatever packs were written by the repack
and appended to the existing chain (or a new chain is started). Existing
layers are preserved as-is with no compaction or merging.

Implement this via a new repack_make_midx_append_plan() that builds a
plan consisting of a WRITE step for the freshly written packs followed
by COPY steps for every existing MIDX layer. The existing compaction
plan (repack_make_midx_compaction_plan) is used only when `--geometric`
is active.

Update the documentation to describe the behavior with and without
`--geometric`, and replace the test that enforced the old restriction
with one exercising append-only incremental MIDX repacking.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-29 19:13:14 -07:00
Taylor Blau
a00b0db47f repack: introduce --write-midx=incremental
Expose the incremental MIDX repacking mode (implemented in an earlier
commit) via a new --write-midx=incremental option for `git repack`.

Add "incremental" as a recognized argument to the --write-midx
OPT_CALLBACK, mapping it to REPACK_WRITE_MIDX_INCREMENTAL. When this
mode is active and --geometric is in use, set the midx_layer_threshold
on the pack geometry so that only packs in sufficiently large tip layers
are considered for repacking.

Two new configuration options control the compaction behavior:

 - repack.midxSplitFactor (default: 2): the factor used in the
   geometric merging condition for MIDX layers.

 - repack.midxNewLayerThreshold (default: 8): the minimum number of
   packs in the tip MIDX layer before its packs are considered as
   candidates for geometric repacking.

Add tests exercising the new mode across a variety of scenarios
including basic geometric violations, multi-round chain integrity,
branching and merging histories, cross-layer object uniqueness, and
threshold-based compaction.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-29 19:13:14 -07:00
Taylor Blau
e96ff36dfb repack: implement incremental MIDX repacking
Implement the `write_midx_incremental()` function, which builds and
maintains an incremental MIDX chain as part of the geometric repacking
process.

Unlike the default mode which writes a single flat MIDX, the incremental
mode constructs a compaction plan that determines which MIDX layers to
write, compact, or copy, and then executes each step using `git
multi-pack-index` subcommands with the --checksum-only flag.

The repacking strategy works as follows:

 * Acquire the lock guarding the multi-pack-index-chain.

 * A new MIDX layer is always written containing the newly created
   pack(s). If the tip MIDX layer was rewritten during geometric
   repacking, any surviving packs from that layer are also included.

 * Starting from the new layer, adjacent MIDX layers are merged together
   as long as the accumulated object count exceeds half the object count
   of the next deeper layer (controlled by 'repack.midxSplitFactor').

 * Remaining layers in the chain are evaluated pairwise and either
   compacted or copied as-is, following the same merging condition.

 * Write the contents of the new multi-pack-index chain, atomically move
   it into place, and then release the lock.

 * Delete any now-unused MIDX layers.

After writing the new layer, the strategy is evaluated among the
existing MIDX layers in order from oldest to newest. Each step that
writes a new MIDX layer uses "--checksum-only" to avoid updating the
multi-pack-index-chain file. After all steps are complete, the new chain
file is written and then atomically moved into place.

At present, this functionality is exposed behind a new enum value,
`REPACK_WRITE_MIDX_INCREMENTAL`, but has no external callers. A
subsequent commit will expose this mode via `git repack
--write-midx=incremental`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-29 19:13:14 -07:00
Taylor Blau
70226c8c30 builtin/repack.c: convert --write-midx to an OPT_CALLBACK
Change the --write-midx (-m) flag from an OPT_BOOL to an OPT_CALLBACK
that accepts an optional mode argument. Introduce an enum with
REPACK_WRITE_MIDX_NONE and REPACK_WRITE_MIDX_DEFAULT to distinguish
between the two states, and update all existing boolean checks
accordingly.

For now, passing no argument (or just `-m`) selects the default mode,
preserving existing behavior. A subsequent commit will add a new mode
for writing incremental MIDXs.

Extract repack_write_midx() as a dispatcher that selects the
appropriate MIDX-writing implementation based on the mode.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-29 19:13:14 -07:00
Taylor Blau
16a20d113c repack-midx: extract repack_fill_midx_stdin_packs()
The function `write_midx_included_packs()` manages the lifecycle of
writing packs to stdin when running `git multi-pack-index write` as a
child process.

Extract a standalone `repack_fill_midx_stdin_packs()` helper, which
handles `--stdin-packs` argument setup, starting the command, writing
pack names to its standard input, and finishing the command.

This simplifies `write_midx_included_packs()` and prepares for a
subsequent commit where the same helper is called with `cmd->out = -1`
to capture the MIDX's checksum from the command's standard output,
which is needed when writing MIDX layers with `--checksum-only`.

No functional changes are included in this patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-29 19:13:13 -07:00
Taylor Blau
f702566e54 repack-midx: factor out repack_prepare_midx_command()
The `write_midx_included_packs()` function assembles and executes a
`git multi-pack-index write` command, constructing the argument list
inline.

Future commits will introduce additional callers that need to construct
similar `git multi-pack-index` commands (for both `write` and `compact`
subcommands), so extract the common portions of the command setup into a
reusable `repack_prepare_midx_command()` helper.

The extracted helper sets `git_cmd`, pushes the `multi-pack-index`
subcommand and verb, and handles `--progress`/`--no-progress` and
`--bitmap` flags. The remaining arguments that are specific to the
`write` subcommand (such as `--stdin-packs`) are left to the caller.

No functional changes are included in this patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-29 19:13:13 -07:00
Patrick Steinhardt
ed693078e9 pack-bitmap: deduplicate logic to iterate over preferred bitmap tips
We have two locations that iterate over the preferred bitmap tips as
configured by the user via "pack.preferBitmapTips". Both of these
callsites are subtly wrong: when the preferred bitmap tips contain an
exact refname match, then we will hit a `BUG()`.

Prepare for the fix by unifying the two callsites into a new
`for_each_preferred_bitmap_tip()` function.

This removes the last callsite of `bitmap_preferred_tips()` outside of
"pack-bitmap.c". As such, convert the function to be local to that file
only. Note that the function is still used by a second caller, so we
cannot just inline it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-19 10:41:18 -08:00
Patrick Steinhardt
f898661637 refs: expose peeled object ID via the iterator
Both the "files" and "reftable" backend are able to store peeled values
for tags in the respective formats. This allows for a more efficient
lookup of the target object of such a tag without having to manually
peel via the object database.

The infrastructure to access these peeled object IDs is somewhat funky
though. When iterating through objects, we store a pointer reference to
the current iterator in a global variable. The callbacks invoked by that
iterator are then expected to call `peel_iterated_oid()`, which checks
whether the globally-stored iterator's current reference refers to the
one handed into that function. If so, we ask the iterator to peel the
object, otherwise we manually peel the object via the object database.
Depending on global state like this is somewhat weird and also quite
fragile.

Introduce a new `struct reference::peeled_oid` field that can be
populated by the reference backends. This field can be accessed via a
new function `reference_get_peeled_oid()` that either uses that value,
if set, or alternatively peels via the ODB. With this change we don't
have to rely on global state anymore, but make the peeled object ID
available to the callback functions directly.

Adjust trivial callers that already have a `struct reference` available.
Remaining callers will be adjusted in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:25 -08:00
Patrick Steinhardt
bdbebe5714 refs: introduce wrapper struct for each_ref_fn
The `each_ref_fn` callback function type is used across our code base
for several different functions that iterate through reference. There's
a bunch of callbacks implementing this type, which makes any changes to
the callback signature extremely noisy. An example of the required churn
is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a
single argument required us to change 48 files.

It was already proposed back then [1] that we might want to introduce a
wrapper structure to alleviate the pain going forward. While this of
course requires the same kind of global refactoring as just introducing
a new parameter, it at least allows us to more change the callback type
afterwards by just extending the wrapper structure.

One counterargument to this refactoring is that it makes the structure
more opaque. While it is obvious which callsites need to be fixed up
when we change the function type, it's not obvious anymore once we use
a structure. That being said, we only have a handful of sites that
actually need to populate this wrapper structure: our ref backends,
"refs/iterator.c" as well as very few sites that invoke the iterator
callback functions directly.

Introduce this wrapper structure so that we can adapt the iterator
interfaces more readily.

[1]: <ZmarVcF5JjsZx0dl@tanuki>

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04 07:32:24 -08:00
Taylor Blau
6d05eb135f repack: 'write_midx_included_packs' API from the builtin
Now that we have sufficiently cleaned up the write_midx_included_packs()
function, we can move it (along with the struct repack_write_midx_opts)
out of the builtin, and into the repack.h header.

Since this function (and the static ones that it depends on) are
MIDX-specific details of the repacking process, move them to the
repack-midx.c compilation unit instead of the general repack.c one.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:56 -07:00
Taylor Blau
ccb7f822d5 builtin/repack.c: remove ref snapshotting from builtin
When writing a MIDX, 'git repack' takes a snapshot of the repository's
references and writes the result out to a file, which it then passes to
'git multi-pack-index write' via the '--refs-snapshot'.

This is done in order to make bitmap selections with respect to what we
are packing, thus avoiding a race where an incoming reference update
causes us to try and write a bitmap for a commit not present in the
MIDX.

Extract this functionality out into a new repack-midx.c compilation
unit, and expose the necessary functions via the repack.h API.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16 10:08:55 -07:00