git-for-windows/git - git - Gitea: Self-hosted GitHub

mirror of https://github.com/git-for-windows/git.git synced 2026-06-28 06:35:27 -05:00

Author	SHA1	Message	Date
Johannes Schindelin	624ede0fb4	pack-objects: drop the last size shim in write_no_reuse_object() Continue the size_t evacuation that this series and the merged js/objects-larger-than-4gb-on-windows topic are advancing for >4 GiB objects on Windows: with the odb readers and the zlib helpers reached from do_compress() now widened end-to-end, the last cast_size_t_to_ulong() shim in this function can be removed, and do_compress() itself can carry the new size type through. Two cast_size_t_to_ulong() shims remain in this file; they feed the tree-walk API, which is still narrow and is a separate widening topic. write_no_reuse_object()'s return type and the hashfile API are still narrow but unchanged in observable behaviour: on 64-bit Linux ulong coincides with size_t, and on Windows these were the narrow fenceposts the prior topics deliberately left in place. Their widening is left to follow-ups touching the hashfile API and the write_object() caller chain. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:55 +00:00
Johannes Schindelin	4a47d0787c	pack-objects: drop cast_size_t_to_ulong shims in try_delta() Companion to the prior get_delta() cleanup, and the last try_delta() piece of the >4 GiB delta-path topic. Every consumer that the function's locals fed has now been widened: SIZE() / DELTA_SIZE() to size_t (prior topic), the mem_usage out-parameter and delta_cacheable() earlier in this series, and create_delta() / create_delta_index() in the immediately preceding commits. Widen the declaration of trg_size, src_size, sizediff, max_size and sz to size_t (delta_size joins them on the same line, removing the size_t delta_size line that the create_delta() widening commit added as a stop-gap), and drop the two sz_st bridge variables together with the surrounding cast_size_t_to_ulong() calls. The result is just "odb_read_object(&sz)" on both reads. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:55 +00:00
Johannes Schindelin	da17ddcf28	pack-objects: drop cast_size_t_to_ulong shims in get_delta() The two shims that `606c192380` (odb, packfile: use size_t for streaming object sizes, 2026-05-08) and the subsequent odb_read_object() widening introduced as scaffolding around get_delta()'s reads can now disappear: the previous commit widened diff_delta() to size_t, which was the last narrow consumer in this function. Widen size and base_size to size_t outright, drop the size_st / base_size_st bridging temporaries, and drop the two cast_size_t_to_ulong() calls. Net change is 4 lines smaller and one read-then-cast indirection gone from each odb read. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:55 +00:00
Johannes Schindelin	9a9bc5c853	Merge branch 'size-t/tree'	2026-06-26 08:57:54 +00:00
Johannes Schindelin	824c02f12f	pack-objects: drop the two tree-walk casts in the preferred-base path With init_tree_desc() widened in the prior commit, the size_t-returning odb_read_object_peeled() call in add_preferred_base() and odb_read_object() call in pbase_tree_get() can both flow straight through to init_tree_desc() and into the pbase_tree_cache. Widen pbase_tree_cache.tree_size and the two local size variables to size_t, drop the size_st bridges, and drop the two cast_size_t_to_ulong() shims. This was the last pair of cast_size_t_to_ulong() call sites in builtin/pack-objects.c, completing the >4 GiB-objects work in that file that this branch and its predecessors have been pursuing. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	1d0e6f9746	packfile, git-zlib: widen use_pack() and zstream avail fields to size_t Bundling the two widenings: four call sites pass &stream.avail_in directly to use_pack(), and widening either type fencepost alone would force a bridge variable at each. Doing both together is the simpler end state and is the prerequisite for the do_compress() widening in the next commit, which is what lets write_no_reuse_object() lose its last cast_size_t_to_ulong() shim. The unsigned-long locals widened at the other use_pack() callers (avail / remaining / left) hold pack-window sizes bounded by core.packedGitWindowSize, so the change is type consistency rather than a new >4GB capability. git_zstream.avail_in / avail_out likewise reach zlib's uInt fields only after zlib_buf_cap()'s 1 GiB cap, so the wrapper already accepted size_t-shaped inputs in practice. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	2f8a320527	delta: widen create_delta() and diff_delta() to size_t Last stop in the delta-encoding API widening for >4 GiB blobs on Windows: with create_delta_index() done in the prior commit and create_delta()/diff_delta() finished here, every byte count that crosses delta.h is now size_t. The struct fields they store into have been size_t since the diff-delta struct widening. The API change must move with all callers in the same commit (the build only passes when every &delta_size matches the new size_t). Caller updates are kept minimal:
* builtin/pack-objects.c get_delta() and try_delta(): widen only the local delta_size variable; the surrounding unsigned-long locals and their cast_size_t_to_ulong() shims are out of scope here and will be cleaned up in their own commits.
* builtin/fast-import.c, diff.c, t/helper/test-pack-deltas.c: keep the local unsigned-long delta size (each feeds a still-unsigned-long downstream consumer: zlib's avail_in, deflate_it(), the test helper's own do_compress()), and bridge via a temporary size_t plus cast_size_t_to_ulong(). The new casts are paid back in later topics that widen those consumers.
* t/helper/test-delta.c: widen the local outright (no downstream consumer beyond the test's own out_size, which is already size_t).
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	f9a6df6aba	pack-objects: widen mem_usage and try_delta out-param to size_t The pair must move together because find_deltas() passes &mem_usage to try_delta(): widening either alone breaks the type match. mem_usage accumulates per-object byte counts already computed in size_t (SIZE() and sizeof_delta_index() reach here through free_unpacked(), now size_t), and was the last 32-bit-on-Windows narrowing point in the delta-window memory accounting chain. With this commit, that chain is internally size_t end-to-end except for sizeof_delta_index()'s still-narrow return, whose value is bounded by create_delta_index()'s entries cap. window_memory_limit (config-driven via git_config_ulong()) stays unsigned long: it is only compared against mem_usage and promotes. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	819d8a1cfd	pack-objects: widen free_unpacked() return to size_t free_unpacked() sums two byte counts: sizeof_delta_index() and SIZE(n->entry). The latter has been size_t since the prior topic "More work supporting objects larger than 4GB on Windows" widened SIZE() / oe_size() to size_t, so accumulating it into an unsigned long return was a silent Windows-only truncation on a packing run with many large objects. The sole caller (find_deltas()) holds its own mem_usage in an unsigned long for now and subtracts the return into it, so the new narrowing happens at that subtraction. find_deltas() and the matching try_delta() out-parameter are widened next. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
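The silent truncation described above can be reproduced with a small sketch. It is not Git's code: `sum_narrow()` and `sum_wide()` are hypothetical names, and `uint32_t` again stands in for the 32-bit `unsigned long` of 64-bit Windows, while `uint64_t` plays the role of `size_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulate byte counts in a 32-bit total: wraps modulo 2^32. */
static uint32_t sum_narrow(const uint64_t *sizes, size_t n)
{
	uint32_t total = 0;
	size_t i;

	assert(sizes != NULL);
	for (i = 0; i < n; i++)
		total += (uint32_t)sizes[i];
	return total;
}

/* Accumulate the same byte counts in a 64-bit total: no wrap here. */
static uint64_t sum_wide(const uint64_t *sizes, size_t n)
{
	uint64_t total = 0;
	size_t i;

	assert(sizes != NULL);
	for (i = 0; i < n; i++)
		total += sizes[i];
	return total;
}
```

With three 2 GiB entries the wide sum is 6 GiB, whereas the narrow sum wraps around to 2 GiB without any error being reported.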
Johannes Schindelin	848acf183c	pack-objects: widen delta-cache accounting to size_t These three are a single accounting tuple (the globals tracking cumulative cached-delta bytes, plus the helper that compares them against an incoming delta size) and are latently 32-bit on Windows where unsigned long != size_t: a pack with many large cached deltas could wrap silently. The widening is internally consistent on its own: the additions and subtractions against delta_cache_size already come from size_t sources (DELTA_SIZE() returns size_t), and delta_cacheable()'s sole caller in try_delta() still passes unsigned long, which promotes. Prerequisite for dropping try_delta()'s cast_size_t_to_ulong() shims, which becomes possible once create_delta() and diff_delta() are widened in a later commit. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Junio C Hamano	2fb57b8177	Merge branch 'ps/odb-drop-whence' into jch The whence field in struct object_info has been removed, refactoring backend-specific object information retrieval into an opt-in struct object_info_source structure. * ps/odb-drop-whence: odb: document object info fields odb: drop `whence` field from object info treewide: convert users of `whence` to the new source field odb: add `source` field to struct object_info_source odb: make backend-specific fields optional packfile: thread odb_source_packed through packed_object_info()	2026-06-25 19:49:27 -07:00
Junio C Hamano	8811ac8af4	Merge branch 'tb/pack-path-walk-bitmap-delta-islands' into jch The pack-objects command now supports using reachability bitmaps and delta-islands concurrently with the `--path-walk` option, allowing faster packaging by falling back to path-walk when bitmaps cannot fully satisfy the request. * tb/pack-path-walk-bitmap-delta-islands: pack-objects: support `--delta-islands` with `--path-walk` pack-objects: extract `record_tree_depth()` helper pack-objects: support reachability bitmaps with `--path-walk` t/perf: drop p5311's lookup-table permutation	2026-06-25 19:49:19 -07:00
Junio C Hamano	2a8c778710	Merge branch 'ps/odb-source-packed' into jch The packed object source has been refactored into a proper struct odb_source. * ps/odb-source-packed: odb/source-packed: drop pointer to "files" parent source midx: refactor interfaces to work on "packed" source odb/source-packed: stub out remaining functions odb/source-packed: wire up `freshen_object()` callback odb/source-packed: wire up `find_abbrev_len()` callback odb/source-packed: wire up `count_objects()` callback odb/source-packed: wire up `for_each_object()` callback odb/source-packed: wire up `read_object_stream()` callback odb/source-packed: wire up `read_object_info()` callback packfile: use higher-level interface to implement `has_object_pack()` odb/source-packed: wire up `reprepare()` callback odb/source-packed: wire up `close()` callback odb/source-packed: start converting to a proper `struct odb_source` odb/source-packed: store pointer to "files" instead of generic source packfile: move packed source into "odb/" subsystem packfile: split out packfile list logic packfile: rename `struct packfile_store` to `odb_source_packed`	2026-06-25 19:49:17 -07:00
Patrick Steinhardt	4f48d2a241	treewide: convert users of `whence` to the new source field The `whence` field has become redundant now that callers can learn about the exact source an object has been looked up from via the `struct object_info_source::source` field. Adapt callers to use the new field. Note that all callsites already set up the `info.sourcep` request pointer, so the conversion is rather straight-forward. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-24 10:12:35 -07:00
Patrick Steinhardt	695797490e	odb: make backend-specific fields optional The `struct object_info` carries two pieces of information about how an object was looked up: - The `whence` enum identifying the backend. - The backend-tagged union `u` exposing backend-specific details (currently only the packed-source case, which records the owning pack, offset and packed object type). The union is populated unconditionally, even though most callers don't care about provenance at all. Split the backend-specific union out into a new public type, `struct object_info_source`, and make the object info structure carry it via just another opt-in request pointer. As with all the other requestable information, callers that need source info allocate a `struct object_info_source` on the stack and point `sourcep` at it; callers that don't care about it simply leave the field as a `NULL` pointer. Adapt callers accordingly. Note that the `whence` enum is strictly-speaking also backend-specific information, so it would be another good candidate to be moved into the `struct object_info_source`. For now though it is left alone, as it will be replaced by a `struct odb_source` pointer in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-24 10:12:35 -07:00
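The opt-in request pointer convention described above can be sketched like this. All names (`demo_info`, `demo_info_source`, `demo_read_info()`) and the values written are hypothetical; the sketch shows only the general shape, in which the callee fills a piece of information only when the caller supplied a non-NULL pointer for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend-specific details, requested only on demand. */
struct demo_info_source {
	int pack_id;
	size_t offset;
};

/* Hypothetical request structure: every member is an opt-in pointer. */
struct demo_info {
	size_t *sizep;
	struct demo_info_source *sourcep;
};

/* Fill only what the caller asked for; made-up values for illustration. */
static int demo_read_info(struct demo_info *info)
{
	assert(info != NULL);
	if (info->sizep)
		*info->sizep = 1234;
	if (info->sourcep) {
		info->sourcep->pack_id = 7;
		info->sourcep->offset = 4096;
	}
	return 0;
}
```

A caller that does not care about provenance leaves `sourcep` as `NULL` and pays nothing for it; a caller that does care places a `struct demo_info_source` on the stack and points `sourcep` at it.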
Patrick Steinhardt	1b9f137b43	packfile: thread odb_source_packed through packed_object_info() Add an optional `struct odb_source_packed *source` parameter to `packed_object_info()` and `packed_object_info_with_index_pos()`. This parameter is unused at this point in time, but it will be used in a follow-up commit so that we can record the source of a specific object. Note that callers in "odb/source-packed.c" pass the already-available source, but all other callers pass `NULL` instead. This is fine though, as we only care about populating this info when called via the packed store. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-24 10:12:35 -07:00
Junio C Hamano	2ed34f72cf	Merge branch 'ps/odb-source-packed' into ps/odb-drop-whence * ps/odb-source-packed: odb/source-packed: drop pointer to "files" parent source midx: refactor interfaces to work on "packed" source odb/source-packed: stub out remaining functions odb/source-packed: wire up `freshen_object()` callback odb/source-packed: wire up `find_abbrev_len()` callback odb/source-packed: wire up `count_objects()` callback odb/source-packed: wire up `for_each_object()` callback odb/source-packed: wire up `read_object_stream()` callback odb/source-packed: wire up `read_object_info()` callback packfile: use higher-level interface to implement `has_object_pack()` odb/source-packed: wire up `reprepare()` callback odb/source-packed: wire up `close()` callback odb/source-packed: start converting to a proper `struct odb_source` odb/source-packed: store pointer to "files" instead of generic source packfile: move packed source into "odb/" subsystem packfile: split out packfile list logic packfile: rename `struct packfile_store` to `odb_source_packed`	2026-06-24 10:12:12 -07:00
Junio C Hamano	02bb39c5cb	Merge branch 'js/objects-larger-than-4gb-on-windows-more' * js/objects-larger-than-4gb-on-windows-more: odb: use size_t for object_info.sizep and the size APIs packfile,delta: drop the `cast_size_t_to_ulong()` wrappers pack-objects: use size_t for in-core object sizes packfile: widen unpack_entry()'s size out-parameter to size_t pack-objects(check_pack_inflate()): use size_t instead of unsigned long patch-delta: use size_t for sizes compat/msvc: use _chsize_s for ftruncate	2026-06-21 16:41:38 -07:00
Taylor Blau	7e6de2ac62	pack-objects: support `--delta-islands` with `--path-walk` Since the inception of `--path-walk`, this option has had a documented incompatibility with `--delta-islands`. When discussing those original patches on the list, a message from Stolee in [1] noted the following: this could be remedied by [...] doing a separate walk to identify islands using the normal method In a related portion of the thread, Peff explains[2]: The delta islands code already does its own tree walk to propagate the bits down (it does rely on the base walk's show_commit() to propagate through the commits). Once each object has its island bitmaps, I think however you choose to come up with delta candidates [...] you should be able to use it. It's fundamentally just answering the question of "am I allowed to delta between these two objects". That is similar to what this patch does, and it turns out the cheaper option is sufficient: perform the same island side effects from the path-walk callback rather than doing a second walk. Recall how delta-islands are computed during a normal repack: - `show_commit()` calls `propagate_island_marks()` for each commit, which merges the commit's island bitset onto its root tree object and onto each of its parent commits. - `show_object()` for a tree records the tree's depth derived from the slash-separated pathname. Subsequent `resolve_tree_islands()` uses that depth to walk trees in increasing-depth order, propagating each tree's marks to its children. - At delta-search time, `in_same_island()` enforces that a delta target's island bitmap is a subset of its base's: every island that reaches the target must also reach the base. Path-walk's enumeration callback is `add_objects_by_path()`. It already adds objects to `to_pack`, but until now did not perform the island-related side effects. Two things are needed: - For each commit batch, call `propagate_island_marks()` on commits, exactly as `show_commit()` does. 
We have to be careful about the order in which we call this function, and we must see a commit before its parents in order to have island marks to propagate. The path-walk batch preserves that order. Path-walk appends commits to its `OBJ_COMMIT` batch as they come back from the same `get_revision()` loop the regular traversal uses, and `add_objects_by_path()` iterates the batch in array order. So every commit reaches `propagate_island_marks()` in the same sequence that `show_commit()` would have seen it, and the descendant-first chain that the algorithm relies on is intact. Skip island propagation for excluded commits to match the regular traversal, whose `show_commit()` callback is only invoked for interesting commits. Boundary commits may still be present in path-walk's callback so they can serve as thin-pack bases, but they should not contribute island marks. - For each tree batch, record the tree's depth from the path. Use the `record_tree_depth()` helper from the previous commit so both callbacks behave identically, including the max-depth-wins behavior when a tree is reached via more than one path. The helper accepts both the `show_object()` path shape ("foo", "foo/bar") and the path-walk shape with a trailing slash ("foo/", "foo/bar/"), so depths recorded from either traversal mode are directly comparable. This is implicit in the implementation sketch from Peff above. `resolve_tree_islands()` sorts trees by `oe->tree_depth` in increasing-depth order before propagating marks down, so that a parent tree's marks are finalized before its children inherit them. Without recording the depth at path-walk time, every path-walk-discovered tree would land at depth 0 in `to_pack`, the sort would lose its ordering, and children could inherit marks from parents whose own contributions had not yet been merged in. 
With those two pieces in place, `resolve_tree_islands()` receives the same island inputs from path-walk as it would from the regular traversal, so the existing island checks can be reused unchanged. Drop the documented incompatibility between `--path-walk` and `--delta-islands`, and add t5320 coverage for path-walk island repacks with and without bitmap writing, as well as the same-island case where a delta remains allowed. [1]: https://lore.kernel.org/git/9aa2471b-0850-4707-9733-d3b33609f5f2@gmail.com/ [2]: https://lore.kernel.org/git/20240911063203.GA1538586@coredump.intra.peff.net/ Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-21 16:26:14 -07:00
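The subset rule that the message attributes to `in_same_island()` (every island that reaches the delta target must also reach the base) can be sketched as a bitwise test. This is an illustration under assumptions, not the delta-islands implementation: `island_subset()` is a hypothetical name, and island membership is modelled as plain arrays of 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if every bit set in "target" is also set in "base",
 * i.e. the target's island set is a subset of the base's.
 */
static int island_subset(const uint32_t *target, const uint32_t *base,
			 size_t nwords)
{
	size_t i;

	assert(target != NULL && base != NULL);
	for (i = 0; i < nwords; i++)
		if (target[i] & ~base[i])
			return 0;	/* target is in an island the base is not */
	return 1;
}
```

For example, a target in islands {0, 2} may delta against a base in islands {0, 1, 2}, but a target in island {3} may not.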
Taylor Blau	264efee401	pack-objects: extract `record_tree_depth()` helper Prepare for a subsequent change that needs to record tree depths from a second call site by factoring the delta-islands tree-depth bookkeeping out of `show_object()` and into a helper, `record_tree_depth()`. The helper looks up the object in `to_pack`, returns early when the object was not added there, computes the depth from the slash count in the supplied name, and preserves the existing max-depth-wins behavior when a tree is reached by more than one path. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-21 16:26:14 -07:00
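The depth computation described above can be sketched as follows. It is a simplified illustration rather than the actual helper: `depth_from_path()` and `record_depth_sketch()` are hypothetical names, the `to_pack` lookup is replaced by a plain `unsigned` slot, and the convention that an empty path has depth 0 is an assumption. Ignoring a trailing slash reflects the statement in the `--delta-islands` commit that both the "foo" and the "foo/" path shapes are accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Depth from the slash count; a single trailing slash is ignored. */
static unsigned depth_from_path(const char *name)
{
	size_t len, i;
	unsigned depth;

	assert(name != NULL);
	len = strlen(name);
	if (len && name[len - 1] == '/')
		len--;			/* "foo/" counts like "foo" */
	if (!len)
		return 0;		/* assumed: empty path is depth 0 */
	depth = 1;
	for (i = 0; i < len; i++)
		if (name[i] == '/')
			depth++;
	return depth;
}

/* Max-depth-wins: only ever raise the stored depth. */
static void record_depth_sketch(unsigned *stored, const char *name)
{
	unsigned depth = depth_from_path(name);

	assert(stored != NULL);
	if (depth > *stored)
		*stored = depth;
}
```

Because both path shapes yield the same depth, values recorded from either traversal mode can be compared directly when trees are later sorted by depth.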
Taylor Blau	0a37451106	pack-objects: support reachability bitmaps with `--path-walk` When 'pack-objects' is invoked with '--path-walk', it prevents us from using reachability bitmaps. This behavior dates back to `70664d2865` (pack-objects: add --path-walk option, 2025-05-16), which included a comment in the relevant portion of the command-line arguments handling that read as follows:

    /*
     * We must disable the bitmaps because we are removing
     * the --objects / --objects-edge[-aggressive] options.
     */

In `fb2c309b7d` (pack-objects: pass --objects with --path-walk, 2026-05-02), path-walk learned to pass '--objects' again, but still kept bitmap traversal disabled. That leaves two useful cases unsupported:

* A path-walk repack that writes bitmaps does not give the bitmap selector any commits, because path-walk reveals commits through `add_objects_by_path()` rather than through `show_commit()`, where `index_commit_for_bitmap()` is normally called.
* An invocation like "git pack-objects --use-bitmap-index --path-walk" never tries an existing bitmap, even when one is available and could answer the request.

Fortunately for us, neither restriction is required.

* On the writing side: teach the path-walk object callback to call `index_commit_for_bitmap()` for commits that it adds to the pack. That gives the bitmap selector the commit candidates it would have seen from the regular traversal.
* For bitmap reading, keep passing '--objects' to the internal rev_list machinery, but stop clearing `use_bitmap_index`. If an existing bitmap can answer the request, use it; otherwise fall back to path-walk's own enumeration.

As a result, we can see significantly reduced pack generation times from p5311 (with our `GIT_PERF_REPO` set to a recent clone of the fluentui repository) before this commit:

    Test                                      HEAD^             HEAD
    ----------------------------------------------------------------------------------------
    5311.40: server (1 days, --path-walk)     1.43(1.39+0.04)   0.01(0.01+0.00) -99.3%
    5311.41: size (1 days, --path-walk)       139.6K            139.7K +0.0%
    5311.42: client (1 days, --path-walk)     0.02(0.02+0.00)   0.02(0.02+0.00) +0.0%
    5311.44: server (2 days, --path-walk)     1.43(1.39+0.04)   0.01(0.00+0.00) -99.3%
    5311.45: size (2 days, --path-walk)       139.6K            139.7K +0.0%
    5311.46: client (2 days, --path-walk)     0.02(0.02+0.00)   0.02(0.02+0.00) +0.0%
    5311.48: server (4 days, --path-walk)     1.44(1.39+0.04)   0.01(0.01+0.00) -99.3%
    5311.49: size (4 days, --path-walk)       238.1K            238.1K +0.0%
    5311.50: client (4 days, --path-walk)     0.03(0.03+0.00)   0.03(0.03+0.00) +0.0%
    5311.52: server (8 days, --path-walk)     1.43(1.39+0.03)   0.01(0.00+0.00) -99.3%
    5311.53: size (8 days, --path-walk)       344.9K            344.9K +0.0%
    5311.54: client (8 days, --path-walk)     0.07(0.07+0.00)   0.07(0.08+0.00) +0.0%
    5311.56: server (16 days, --path-walk)    1.47(1.44+0.03)   0.10(0.08+0.01) -93.2%
    5311.57: size (16 days, --path-walk)      844.0K            844.0K +0.0%
    5311.58: client (16 days, --path-walk)    0.09(0.09+0.00)   0.09(0.09+0.00) +0.0%
    5311.60: server (32 days, --path-walk)    1.52(1.50+0.05)   0.14(0.15+0.02) -90.8%
    5311.61: size (32 days, --path-walk)      4.2M              4.2M +0.1%
    5311.62: client (32 days, --path-walk)    0.34(0.48+0.02)   0.34(0.45+0.05) +0.0%
    5311.64: server (64 days, --path-walk)    1.55(1.52+0.06)   0.15(0.15+0.04) -90.3%
    5311.65: size (64 days, --path-walk)      6.4M              6.4M -0.0%
    5311.66: client (64 days, --path-walk)    0.51(0.79+0.05)   0.51(0.80+0.06) +0.0%
    5311.68: server (128 days, --path-walk)   1.59(1.57+0.06)   0.16(0.21+0.01) -89.9%
    5311.69: size (128 days, --path-walk)     8.4M              8.4M -0.0%
    5311.70: client (128 days, --path-walk)   0.72(1.44+0.08)   0.71(1.47+0.09) -1.4%

We get the same size of output pack, but this commit allows us to do so in a significantly shorter amount of time. Intuitively, we're generating the same pack (hence the unchanged 'test_size' output from run to run), but varying how we get there. Before this commit, pack-objects prefers '--path-walk' to '--use-bitmap-index', so we generate the output pack by performing a normal '--path-walk' traversal. With this commit, we are operating over a repacked state (that itself was done with a '--path-walk' traversal), but are able to perform pack-reuse on that repacked state via bitmaps. When comparing the size of the repacked pack with/without '--path-walk' on the previous commit versus this one, we see that (a) the repacked size improves significantly with '--path-walk', and that (b) writing bitmaps during repacking does not regress this improvement:

    Test                                            HEAD^    HEAD
    ----------------------------------------------------------------------------------------
    5311.3: size of bitmapped pack                  558.4M   558.5M +0.0%
    5311.38: size of bitmapped pack (--path-walk)   164.4M   164.4M +0.0%

(Note that to observe an improvement here, we must repack with '-F' in order to avoid reusing non-'--path-walk' deltas, which would otherwise skew our results.) There is one wrinkle when it comes to '--boundary', which we must not pass into the bitmap walk in the presence of both '--path-walk' and '--use-bitmap-index'. Path-walk needs boundary commits when it performs its own traversal, in order to discover bases for thin packs, but the bitmap traversal does not expect this. Work around this by setting `revs->boundary` as late as possible within the '--path-walk' traversal, after any bitmap attempt has either succeeded or declined to answer the request. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-21 16:26:14 -07:00
Patrick Steinhardt	7fa8c61afe	midx: refactor interfaces to work on "packed" source Our interfaces used to interact with MIDXs all work on top of the generic `struct odb_source`. This doesn't make much sense though: a MIDX is strictly tied to the "packed" source, so passing in a generic source gives the false sense that it may also work with a different type of source. Fix this conceptual weirdness and instead require the caller to pass in a "packed" source explicitly. This also makes the next commit easier to implement, where we drop the pointer to the "files" source in the "packed" source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-17 05:00:01 -07:00
Patrick Steinhardt	7ed53cde28	odb/source-packed: wire up `for_each_object()` callback Move `packfile_store_for_each_object()` and its associated helpers from "packfile.c" into "odb/source-packed.c" and wire it up as the `for_each_object()` callback of the "packed" source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-17 05:00:00 -07:00
Junio C Hamano	6e148f82dc	Merge branch 'kk/streaming-walk-pqueue' Streaming revision walks have been optimized by using a priority queue for date-sorting commits, speeding up walks in repositories with many merges. * kk/streaming-walk-pqueue: revision: use priority queue for non-limited streaming walks revision: introduce rev_walk_mode to clarify get_revision_1() pack-objects: call release_revisions() after cruft traversal	2026-06-16 09:01:02 -07:00
Johannes Schindelin	c6a4629e32	odb: use size_t for object_info.sizep and the size APIs When `js/objects-larger-than-4gb-on-windows` widened the streaming, index-pack and unpack-objects code paths, in the interest of keeping the patches somewhat reasonably-sized, it left the public ODB API still typed in `unsigned long`. In particular `struct object_info::sizep` and the four wrappers built on top of it (`odb_read_object`, `odb_read_object_peeled`, `odb_read_object_info`, `odb_pretend_object`) still return the unpacked size through `unsigned long *`, so on Windows `cat-file -s` and the `git add` / `git status` paths for a >4 GiB blob silently cap at 4 GiB. Widen the field and the four wrappers. The previous commits already widened the `unpack_entry()` cascade and pack-objects' in-core size accessors, so most of the cascade arrives here with no further work: the temporary shims in `packed_object_info_with_index_pos()` and in `unpack_entry()`'s delta-base recovery path go away, the two `SET_SIZE(entry, cast_size_t_to_ulong(canonical_size))` calls in `check_object()` and the matching one in `drop_reused_delta()` collapse to plain `SET_SIZE`, and `oe_get_size_slow()`'s tail `cast_size_t_to_ulong()` is gone too. What remains narrow are the boundaries this series does not intend to touch: the diff, blame, textconv and fast-import machinery. Even so, this patch is unfortunately quite large. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-15 07:45:41 -07:00
Johannes Schindelin	188bac14f7	pack-objects: use size_t for in-core object sizes `pack-objects` stores per-entry object sizes in either the 31-bit `size_` member of the `struct object_entry` or, when the value does not fit, the `pack->delta_size[]` spill array. The accessors (`oe_size`, `oe_delta_size`, `oe_get_size_slow`, `oe_size_*_than`) and the setters (`oe_set_size`, `oe_set_delta_size`) used `unsigned long` for the spill type, which on Windows means the spill silently caps at 4 GiB per entry. That is what made `upload-pack` die with "object too large to read on this platform" when serving the >4 GiB blob in `t5608` tests 5 and 6 when run with `GIT_TEST_CLONE_2GB`. Widen them all to `size_t` (including `pack->delta_size`) and drop the three `cast_size_t_to_ulong()` calls in `check_object()` that guarded `in_pack_size`. The two `SET_SIZE(entry, canonical_size)` calls in the same function stay cast-free as before, since `canonical_size` is still `unsigned long` until a later commit widens `object_info::sizep`. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-15 07:45:41 -07:00
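The storage scheme described above (a compact 31-bit inline field with a wider spill array for values that do not fit) can be sketched as follows. The layout and all names (`demo_entry`, `demo_pack`, `demo_set_size()`, `demo_get_size()`) are hypothetical, including the validity bit used here to select between the two locations. `uint64_t` is used for the spill type so the example behaves identically on every platform, which is the property the commit obtains by moving from `unsigned long` to `size_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SIZE_BITS 31
#define DEMO_ENTRIES 4

/* Hypothetical compact entry: 31 bits of inline size plus a flag. */
struct demo_entry {
	unsigned size_ : DEMO_SIZE_BITS;
	unsigned size_valid : 1;
};

struct demo_pack {
	struct demo_entry entries[DEMO_ENTRIES];
	uint64_t spill[DEMO_ENTRIES];	/* wide storage for large sizes */
};

static void demo_set_size(struct demo_pack *p, size_t i, uint64_t size)
{
	assert(p != NULL && i < DEMO_ENTRIES);
	if (size < (UINT64_C(1) << DEMO_SIZE_BITS)) {
		p->entries[i].size_ = (unsigned)size;
		p->entries[i].size_valid = 1;
	} else {
		p->entries[i].size_valid = 0;	/* value lives in the spill */
		p->spill[i] = size;
	}
}

static uint64_t demo_get_size(const struct demo_pack *p, size_t i)
{
	assert(p != NULL && i < DEMO_ENTRIES);
	if (p->entries[i].size_valid)
		return p->entries[i].size_;
	return p->spill[i];
}
```

If the spill element were only 32 bits wide, a 5 GiB size stored through it would come back truncated; widening the spill type is what lets such sizes round-trip.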
Johannes Schindelin	1d43315b31	pack-objects(check_pack_inflate()): use size_t instead of unsigned long `write_reuse_object()` learned to track its packed-object size as `size_t` in `606c192380` (odb, packfile: use size_t for streaming object sizes, 2026-05-08), but the comparison sink it feeds, `check_pack_inflate()`, still takes the expected decompressed size as `unsigned long`. The call site bridges the mismatch with `cast_size_t_to_ulong()`, which on Windows turns a >4 GiB object into an immediate die(). That function only uses `expect` once: as the right-hand side of a `stream.total_out == expect` equality test against zlib's counter. zlib's own `total_out` counter is `uLong` and is therefore still 32-bit-bound on Windows. Widening `expect` to `size_t` cannot fix that, but it is a strict improvement nonetheless: instead of dying outright, an oversized object now simply makes the equality fail and lets `write_reuse_object()` fall back to `write_no_reuse_object()`, which decompresses and re-deflates the content (and which the larger pack-objects widening series targets separately). Drop the `cast_size_t_to_ulong()` shim at the call site now that the receiving parameter speaks the same type as `entry_size`. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-15 07:45:40 -07:00
Junio C Hamano	ff1784217f	Merge branch 'ak/typofixes' Typofixes. * ak/typofixes: doc: fix typos via codespell	2026-06-15 07:42:00 -07:00
Junio C Hamano	883a47ef64	Merge branch 'ob/more-repo-config-values' Many core configuration variables have been migrated from global variables into 'repo_config_values' to tie them to a specific repository instance, avoiding cross-repository state leakage. * ob/more-repo-config-values: environment: move "warn_on_object_refname_ambiguity" into `struct repo_config_values` environment: move "sparse_expect_files_outside_of_patterns" into `struct repo_config_values` environment: move "core_sparse_checkout_cone" into `struct repo_config_values` environment: move "precomposed_unicode" into `struct repo_config_values` environment: move "pack_compression_level" into `struct repo_config_values` environment: move `zlib_compression_level` into `struct repo_config_values` environment: move "check_stat" into `struct repo_config_values` environment: move "trust_ctime" into `struct repo_config_values`	2026-06-15 07:42:00 -07:00
Junio C Hamano	06f63df846	Merge branch 'ps/odb-source-loose' The loose object source has been refactored into a proper `struct odb_source`. * ps/odb-source-loose: odb/source-loose: drop pointer to the "files" source odb/source-loose: stub out remaining callbacks odb/source-loose: wire up `write_object_stream()` callback object-file: refactor writing objects to use loose source odb/source-loose: wire up `write_object()` callback loose: refactor object map to operate on `struct odb_source_loose` odb/source-loose: wire up `freshen_object()` callback odb/source-loose: drop `odb_source_loose_has_object()` odb/source-loose: wire up `count_objects()` callback odb/source-loose: wire up `find_abbrev_len()` callback odb/source-loose: wire up `for_each_object()` callback odb/source-loose: wire up `read_object_stream()` callback odb/source-loose: wire up `read_object_info()` callback odb/source-loose: wire up `close()` callback odb/source-loose: wire up `reprepare()` callback odb/source-loose: start converting to a proper `struct odb_source` odb/source-loose: store pointer to "files" instead of generic source odb/source-loose: move loose source into "odb/" subsystem	2026-06-11 04:31:18 -07:00
Andrew Kreimer	014c454799	doc: fix typos via codespell There are some typos in the documentation, comments, etc. Fix them via codespell, and then adjust the "dump" files used by the subversion tests to match the updated contents. Signed-off-by: Andrew Kreimer <algonell@gmail.com> [dscho noticed and fixed the problems in svn test] Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> [jc did final assembling of the three patches] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-08 00:21:35 +09:00
Junio C Hamano	f985a6ec65	Merge branch 'ps/odb-source-loose' into ps/odb-source-packed * ps/odb-source-loose: odb/source-loose: drop pointer to the "files" source odb/source-loose: stub out remaining callbacks odb/source-loose: wire up `write_object_stream()` callback object-file: refactor writing objects to use loose source odb/source-loose: wire up `write_object()` callback loose: refactor object map to operate on `struct odb_source_loose` odb/source-loose: wire up `freshen_object()` callback odb/source-loose: drop `odb_source_loose_has_object()` odb/source-loose: wire up `count_objects()` callback odb/source-loose: wire up `find_abbrev_len()` callback odb/source-loose: wire up `for_each_object()` callback odb/source-loose: wire up `read_object_stream()` callback odb/source-loose: wire up `read_object_info()` callback odb/source-loose: wire up `close()` callback odb/source-loose: wire up `reprepare()` callback odb/source-loose: start converting to a proper `struct odb_source` odb/source-loose: store pointer to "files" instead of generic source odb/source-loose: move loose source into "odb/" subsystem	2026-06-05 22:26:06 +09:00
Olamide Caleb Bello	8407abf02a	environment: move "warn_on_object_refname_ambiguity" into `struct repo_config_values` The `core.warnAmbiguousRefs` configuration was previously stored in a global `int` variable, making it shared across repository instances and risking cross‑repository state leakage. Store it instead in `repo_config_values`, where eagerly‑parsed repository configuration lives. This option is parsed eagerly because ambiguity warnings influence how users interpret object references in many commands; a lazy parse could cause these warnings to behave inconsistently or to appear for the wrong repository, confusing users and hindering libification. This preserves the existing behavior while tying the value to the repository from which it was read, avoiding cross‑repository state leakage and continuing the effort to reduce reliance on global configuration state. Update all references to use `repo_config_values()`. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-03 08:36:48 +09:00
Olamide Caleb Bello	8cd7402acc	environment: move "pack_compression_level" into `struct repo_config_values` The `pack_compression_level` configuration is currently stored in the global variable `pack_compression_level`, which makes it shared across repository instances within a single process. Store it instead in `repo_config_values`, where eagerly‑parsed repository configuration lives. `pack_compression_level` is parsed eagerly because it influences packfile compression, a core operation where a lazy parse could cause inconsistent behavior and hamper libification. This preserves the existing eager‑parsing behavior while tying the value to the repository from which it was read, avoiding cross‑repository state leakage and continuing the effort to reduce reliance on global configuration state. Update all references to use `repo_config_values()`. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-03 08:36:48 +09:00
Junio C Hamano	ffaa2eddd0	Merge branch 'ds/path-walk-filters' The "git pack-objects --path-walk" traversal has been integrated with several object filters, including blobless and sparse filters. * ds/path-walk-filters: path-walk: support `combine` filter path-walk: support `object:type` filter path-walk: support `tree:0` filter t6601: tag otherwise-unreachable trees pack-objects: support sparse:oid filter with path-walk path-walk: add pl_sparse_trees to control tree pruning path-walk: support blob size limit filter backfill: die on incompatible filter options path-walk: support blobless filter path-walk: always emit directly-requested objects t/perf: add pack-objects filter and path-walk benchmark pack-objects: pass --objects with --path-walk t5620: make test work with path-walk var	2026-06-02 16:15:29 +09:00
Patrick Steinhardt	86f7ab5a1f	odb/source-loose: drop `odb_source_loose_has_object()` The function `odb_source_loose_has_object()` checks whether a specific object exists as a loose object on disk by using lstat(3p). This interface is somewhat redundant, as we typically check for object existence in a generic way via `odb_source_read_object_info()`. In fact, these two calls are redundant in case the latter is called in a specific way: when called without an object info request and without the `OBJECT_INFO_QUICK` flag, then we will end up doing the same call to lstat(3p) in `read_object_info_from_path()`. Drop the function and adapt callers to instead use the generic interface so that its calling conventions align with that of other sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Junio C Hamano	59cccb3b0c	Merge branch 'ds/path-walk-filters' into tb/pack-path-walk-bitmap-delta-islands * ds/path-walk-filters: path-walk: support `combine` filter path-walk: support `object:type` filter path-walk: support `tree:0` filter t6601: tag otherwise-unreachable trees pack-objects: support sparse:oid filter with path-walk path-walk: add pl_sparse_trees to control tree pruning path-walk: support blob size limit filter backfill: die on incompatible filter options path-walk: support blobless filter path-walk: always emit directly-requested objects t/perf: add pack-objects filter and path-walk benchmark pack-objects: pass --objects with --path-walk t5620: make test work with path-walk var	2026-05-30 10:10:56 +09:00
Kristofer Karlsson	9f4e170dfc	pack-objects: call release_revisions() after cruft traversal enumerate_and_traverse_cruft_objects() initializes a rev_info on the stack but never calls release_revisions() afterwards. This is not visible on master but becomes a leak once the revision walking machinery uses dynamically allocated structures. Add the missing release_revisions() call. Signed-off-by: Kristofer Karlsson <krka@spotify.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-28 06:08:19 +09:00
Derrick Stolee	2dc858e69e	pack-objects: support sparse:oid filter with path-walk The --filter=sparse:<oid> option to 'git pack-objects' allows focusing an object set to a sparse-checkout definition. This reduces the set of matching blobs while retaining all reachable trees. No server currently supports fetching with this filter because it is expensive to compute and reachability bitmaps do not help without a significant effort to extend the bitmap feature to store bitmaps for each supported sparse- checkout definition. Without focusing on serving fetches and clones with these filters, there are still benefits that could be realized by making this faster. With the sparse index, it's more realistic now than ever to be able to operate a local clone that was bootstrapped by a packfile created with a sparse filter, because the missing trees are not needed to move a sparse-checkout from one commit to another or to view the history of any path in scope. Such clones could perhaps be bootstrapped by partial bundles. Previously, constructing these sparse packs has been incredibly computationally inefficient. The revision walk that explores which objects are in scope spends a lot of time checking each object to see if it matches the sparse-checkout patterns, causing quadratic behavior (number of objects times number of sparse-checkout patterns). This improves somewhat when using cone-mode sparse-checkout patterns that can use hashtables and prefix matches to determine containment. However, the check per object is still too expensive for most cases. This is where the path-walk feature comes in. We can proceed as normal by placing objects in bins by path and _then_ check a group of objects all at once. Since sparse:<oid> only restricts blobs, the path-walk must include all reachable trees while using the cone-mode patterns to skip blobs at paths outside the sparse scope. 
This establishes a baseline for a potential future "treesparse:<oid>" filter that would also restrict trees, but introducing such a new filter is deferred to a later change. The implementation here is focused around loading the sparse-checkout patterns from the provided object ID and checking that the patterns are indeed cone-mode patterns. We can then load the correct pattern list into the path walk context and use the logic that already exists from `bff4555767` (backfill: add --sparse option, 2025-02-03), though that feature loads sparse-checkout patterns from the worktree's local settings and also restricts tree objects. We use a combination of errors and warnings to signal problems during this load. The difference is that errors are likely fatal for the non-path-walk version while the warnings are probably just implementation details for the path-walk version and the 'git pack-objects' command can fall back to the revision walk version. Now that the SEEN flag is deferred until after pattern checks (from the previous commit), handle the case where a tree with a shared OID appears at both an out-of-cone and in-cone path. When trees are not being pruned (pl_sparse_trees == 0), the path-walk re-walks the tree at the in-cone path so that in-cone blobs within it are discovered. The new tests in t5317 and t6601 demonstrate this behavior and would fail without these changes. The performance test p5315 shows the impact of this change when using sparse filters:

Test                                             HEAD~1    HEAD
----------------------------------------------------------------------
5315.10: repack (sparse:oid)                     77.98     77.47    -0.7%
5315.11: repack size (sparse:oid)                187.5M    187.4M   -0.0%
5315.12: repack (sparse:oid, --path-walk)        77.91     31.41    -59.7%
5315.13: repack size (sparse:oid, --path-walk)   187.5M    161.1M   -14.1%

These performance tests were run on the Git repository. 
The --path-walk feature shows meaningful space savings (14% smaller for sparse packs) and dramatic time savings (60% faster) by leveraging the path-walk's ability to skip blobs outside the sparse scope. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-24 18:41:06 +09:00
Derrick Stolee	6d87f0e8a3	path-walk: support blobless filter The 'git pack-objects' command can opt-in to using the path-walk API for scanning the objects. Currently, this option is dynamically disabled if combined with '--filter=<X>', even when using a simple filter such as 'blob:none' to signal a blobless packfile. This is a common scenario for repos at scale, so is worth integrating. Also, users can opt-in to the '--path-walk' option by default through the pack.usePathWalk=true config option. When using that in a blobless partial clone, the following warning can appear even though the user did not specify either option directly: warning: cannot use --filter with --path-walk Teach the path-walk API to handle the 'blob:none' object filter natively. When revs->filter.choice is LOFC_BLOB_NONE, the path-walk sets info->blobs to 0 (skipping all blob objects) and clears the filter from revs so that prepare_revision_walk() does not reject the configuration. This check is implemented in the static prepare_filters() method, which will simultaneously check if the input filters are compatible and will make the appropriate mutations to the path_walk_info and filters if the path_walk_info is non-NULL. This allows us to use this logic both in the API method path_walk_filter_compatible() for use in builtin/pack-objects.c and as a prep step in walk_objects_by_path(). Update the test helper (test-path-walk) to accept --filter=<spec> as a test-tool option (before '--'), applying it to revs after setup_revisions() to avoid the --objects requirement check. We can also revert recent GIT_TEST_PACK_PATH_WALK overrides in t5620. Also switch test-path-walk from REV_INFO_INIT with manual repo assignment to repo_init_revisions(), which properly initializes the filter_spec strbuf needed for filter parsing. Add tests for blob:none with --all and with a single branch. 
The performance test p5315 shows the impact of this change when using blobless filters:

Test                                           HEAD~1    HEAD
---------------------------------------------------------------------
5315.6: repack (blob:none)                     13.53     13.87    +2.5%
5315.7: repack size (blob:none)                137.7M    137.8M   +0.1%
5315.8: repack (blob:none, --path-walk)        13.51     23.43    +73.4%
5315.9: repack size (blob:none, --path-walk)   137.7M    115.2M   -16.3%

These performance tests were run on the Git repository. The --path-walk feature shows meaningful space savings (16% smaller for blobless packs) at the cost of increased computation time due to the two compression passes. This data demonstrates that the feature is engaged and provides real compression benefits when --no-reuse-delta forces fresh deltas. Co-Authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-24 18:41:06 +09:00
Derrick Stolee	35567889ef	pack-objects: pass --objects with --path-walk When 'git pack-objects' has the --path-walk option enabled, it uses a different set of revision walk parameters than normal. For one, --objects was previously assumed by the path-walk API and could be omitted. We also needed --boundary to allow discovering UNINTERESTING objects to use as delta bases. We will be updating the path-walk API soon to work with some filter options. However, the revision machinery will trigger a fatal error: fatal: object filtering requires --objects The fix is easy: add the --objects option as an argument. This has no effect on the path-walk API but does simplify the revision option parsing for the objects filter. We can remove the comment about "removing" the options because they were never removed and instead not added. We still need to disable using bitmaps. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-24 18:41:06 +09:00
Johannes Schindelin	606c192380	odb, packfile: use size_t for streaming object sizes The odb_read_stream structure uses unsigned long for the size field, which is 32-bit on Windows even in 64-bit builds. When streaming objects larger than 4GB, the size would be truncated to zero or an incorrect value, resulting in empty files being written to disk. Change the size field in odb_read_stream to size_t and introduce unpack_object_header_sz() to return sizes via size_t pointer. Since object_info.sizep remains unsigned long for API compatibility, use temporary variables where the types differ, with comments noting the truncation limitation for code paths that still use unsigned long. Widening the producers to size_t in this way introduces a handful of silent size_t -> unsigned long narrowings on Windows, all in builtin/pack-objects.c, where the consumers are still typed unsigned long. Make those narrowings explicit with cast_size_t_to_ulong() so they assert loudly the moment an object actually exceeds ULONG_MAX bytes: - oe_get_size_slow() returns unsigned long but holds a size_t locally; cast at the return. - write_reuse_object() passes a size_t into check_pack_inflate(), whose expect parameter is unsigned long; cast at the call. - check_object() routes a size_t through SET_SIZE() and SET_DELTA_SIZE(), both of which take unsigned long via oe_set_size() / oe_set_delta_size(); cast at the three call sites in the OBJ_OFS_DELTA / OBJ_REF_DELTA branches and in the non-delta default arm. The cast-only treatment is deliberately a stop-gap. Properly widening oe_set_size, oe_get_size_slow's return type, check_pack_inflate's expect parameter, object_info.sizep, patch_delta, and the OE_SIZE_BITS bit-fields cascades into a series that is too large to be reviewable, so the proper widening is deferred to a follow-up topic. 
Until then, cast_size_t_to_ulong() at least makes the truncation explicit at the source: it documents the boundary, and on a 64-bit non-Windows platform it is a no-op. This was originally authored by LordKiRon <https://github.com/LordKiRon>, who preferred not to reveal their real name and therefore agreed that I take over authorship. Helped-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-09 11:25:31 +09:00
Junio C Hamano	03311dca7f	Merge branch 'tb/stdin-packs-excluded-but-open' pack-objects's --stdin-packs=follow mode learns to handle excluded-but-open packs. * tb/stdin-packs-excluded-but-open: repack: mark non-MIDX packs above the split as excluded-open pack-objects: support excluded-open packs with --stdin-packs t7704: demonstrate failure with once-cruft objects above the geometric split pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` pack-objects: plug leak in `read_stdin_packs()`	2026-04-06 15:42:49 -07:00
Junio C Hamano	d75badf83b	Merge branch 'ps/odb-generic-object-name-handling' Object name handling (disambiguation and abbreviation) has been refactored to be backend-generic, moving logic into the respective object database backends. * ps/odb-generic-object-name-handling: odb: introduce generic `odb_find_abbrev_len()` object-file: move logic to compute packed abbreviation length object-name: move logic to compute loose abbreviation length object-name: simplify computing common prefixes object-name: abbreviate loose object names without `disambiguate_state` object-name: merge `update_candidates()` and `match_prefix()` object-name: backend-generic `get_short_oid()` object-name: backend-generic `repo_collect_ambiguous()` object-name: extract function to parse object ID prefixes object-name: move logic to iterate through packed prefixed objects object-name: move logic to iterate through loose prefixed objects odb: introduce `struct odb_for_each_object_options` oidtree: extend iteration to allow for arbitrary return codes oidtree: modernize the code a bit object-file: fix sparse 'plain integer as NULL pointer' error	2026-04-06 15:42:49 -07:00
Taylor Blau	3f7c0e722e	pack-objects: support excluded-open packs with --stdin-packs In `cd846bacc7` (pack-objects: introduce '--stdin-packs=follow', 2025-06-23), pack-objects learned to traverse through commits in included packs when using '--stdin-packs=follow', rescuing reachable objects from unlisted packs into the output. When we encounter a commit in an excluded pack during this rescuing phase we will traverse through its parents. But because we set `revs.no_kept_objects = 1`, commit simplification will prevent us from showing it via `get_revision()`. (In practice, `--stdin-packs=follow` walks commits down to the roots, but only opens up trees for ones that do not appear in an excluded pack.) But there are certain cases where we do need to see the parents of an object in an excluded pack. Namely, if an object is rescue-able, but only reachable from object(s) which appear in excluded packs, then commit simplification will exclude those commits from the object traversal, and we will never see a copy of that object, and thus not rescue it. This is what causes the failure in the previous commit during repacking. When performing a geometric repack, packs above the geometric split that weren't part of the previous MIDX (e.g., packs pushed directly into `$GIT_DIR/objects/pack`) may not have full object closure. When those packs are listed as excluded via the '^' marker, the reachability traversal encounters the sequence described above, and may miss objects which we expect to rescue with `--stdin-packs=follow`. Introduce a new "excluded-open" pack prefix, '!'. Like '^'-prefixed packs, objects from '!'-prefixed packs are excluded from the resulting pack. But unlike '^', commits in '!'-prefixed packs are used as starting points for the follow traversal, and the traversal does not treat them as a closure boundary. 
In order to distinguish excluded-closed from excluded-open packs during the traversal, introduce a new `pack_keep_in_core_open` bit on `struct packed_git`, along with a corresponding `KEPT_PACK_IN_CORE_OPEN` flag for the kept-pack cache. In `add_object_entry_from_pack()`, move the `want_object_in_pack()` check to after `add_pending_oid()`. This is necessary so that commits from excluded-open packs are added as traversal tips even though their objects won't appear in the output. As a consequence, the caller `for_each_object_in_pack()` will always provide a non-NULL 'p', hence we are able to drop the "if (p)" conditional. The `include_check` and `include_check_obj` callbacks on `rev_info` are used to halt the walk at closed-excluded packs, since objects behind a '^' boundary are guaranteed to have closure and need not be rescued. The following commit will make use of this new functionality within the repack layer to resolve the test failure demonstrated in the previous commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-27 13:40:40 -07:00
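The three pack kinds that 3f7c0e722e distinguishes on stdin can be sketched as a small prefix classifier. The enum and `classify_pack_line()` are hypothetical names chosen for illustration; the actual parsing in `builtin/pack-objects.c` is not shown in the commit message and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds mirroring the prefixes named in the commit message. */
enum pack_kind_sketch {
	PACK_INCLUDED_SKETCH,        /* no prefix: objects go into the output */
	PACK_EXCLUDED_CLOSED_SKETCH, /* '^': excluded, acts as a closure boundary */
	PACK_EXCLUDED_OPEN_SKETCH    /* '!': excluded, but traversal continues */
};

/*
 * Classify one input line by its leading marker and point *name at the
 * pack basename that follows the marker (or at the line itself).
 */
static enum pack_kind_sketch classify_pack_line(const char *line,
						const char **name)
{
	assert(line != NULL && name != NULL);
	switch (line[0]) {
	case '^':
		*name = line + 1;
		return PACK_EXCLUDED_CLOSED_SKETCH;
	case '!':
		*name = line + 1;
		return PACK_EXCLUDED_OPEN_SKETCH;
	default:
		*name = line;
		return PACK_INCLUDED_SKETCH;
	}
}
```

Both excluded kinds keep their objects out of the resulting pack; per the commit message, only the '!' kind additionally contributes its commits as starting points for the follow traversal.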
Taylor Blau	d31d1f2e06	pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` The '--stdin-packs' mode of pack-objects maintains two separate string_lists: one for included packs, and one for excluded packs. Each list stores the pack basename as a string and the corresponding `packed_git` pointer in its `->util` field. This works, but makes it awkward to extend the set of pack "kinds" that pack-objects can accept via stdin, since each new kind would need its own string_list and duplicated handling. A future commit will want to do just this, so prepare for that change by handling the various "kinds" of packs specified over stdin in a more generic fashion. Namely, replace the two `string_list`s with a single `strmap` keyed on the pack basename, with values pointing to a new `struct stdin_pack_info`. This struct tracks both the `packed_git` pointer and a `kind` bitfield indicating whether the pack was specified as included or excluded. Extract the logic for sorting packs by mtime and adding their objects into a separate `stdin_packs_add_pack_entries()` helper. While we could have used a `string_list`, we must handle the case where the same pack is specified more than once. With a `string_list` only, we would have to pay a quadratic cost to either (a) insert elements into their sorted positions, or (b) a repeated linear search, which is accidentally quadratic. For that reason, use a strmap instead. This patch does not include any functional changes. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-27 13:40:39 -07:00
Taylor Blau	81e2906437	pack-objects: plug leak in `read_stdin_packs()` The `read_stdin_packs()` function added originally via `339bce27f4` (builtin/pack-objects.c: add '--stdin-packs' option, 2021-02-22) declares a `rev_info` struct but neglects to call `release_revisions()` on it before returning, creating the potential for a leak. The related change in `97ec43247c` (pack-objects: declare 'rev_info' for '--stdin-packs' earlier, 2025-06-23) carried forward this oversight and did not address it. Ensure that we call `release_revisions()` appropriately to prevent a potential leak from this function. Note that in practice our `rev_info` here does not have a present leak, hence t5331 passes cleanly before this commit, even when built with SANITIZE=leak. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-27 13:40:39 -07:00
Junio C Hamano	8023abc632	Merge branch 'ps/upload-pack-buffer-more-writes' Reduce system overhead "git upload-pack" spends on relaying "git pack-objects" output to the "git fetch" running on the other end of the connection. * ps/upload-pack-buffer-more-writes: builtin/pack-objects: reduce lock contention when writing packfile data csum-file: drop `hashfd_throughput()` csum-file: introduce `hashfd_ext()` sideband: use writev(3p) to send pktlines wrapper: introduce writev(3p) wrappers compat/posix: introduce writev(3p) wrapper upload-pack: reduce lock contention when writing packfile data upload-pack: prefer flushing data over sending keepalive upload-pack: adapt keepalives based on buffering upload-pack: fix debug statement when flushing packfile data	2026-03-24 12:31:34 -07:00
Patrick Steinhardt	cfd575f0a9	odb: introduce `struct odb_for_each_object_options` The `odb_for_each_object()` function only accepts a bitset of flags. In a subsequent commit we'll want to change object iteration to also support iterating over only those objects that have a specific prefix. While we could of course add the prefix to the function signature, or alternatively introduce a new function, both of these options don't really seem to be that sensible. Instead, introduce a new `struct odb_for_each_object_options` that can be passed to a new `odb_for_each_object_ext()` function. Splice through the options structure into the respective object database sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-20 13:16:41 -07:00
Patrick Steinhardt	835e0aaf6f	builtin/pack-objects: reduce lock contention when writing packfile data When running `git pack-objects --stdout` we feed the data through `hashfd_ext()` with a progress meter and a smaller-than-usual buffer length of 8kB so that we can track throughput more granularly. But as packfiles tend to be on the larger side, this small buffer size may cause a ton of write(3p) syscalls. Originally, the buffer we used in `hashfd()` was 8kB for all use cases. This was changed though in `2ca245f8be` (csum-file.h: increase hashfile buffer size, 2021-05-18) because we noticed that the number of writes can have an impact on performance. So the buffer size was increased to 128kB, which improved performance a bit for some use cases. But the commit didn't touch the buffer size for `hashfd_throughput()`. The reasoning here was that callers expect the progress indicator to update frequently, and a larger buffer size would of course reduce the update frequency especially on slow networks. While that is of course true, there was (and still is, even though it's now a call to `hashfd_ext()`) only a single caller of this function in git-pack-objects(1). This command is responsible for writing packfiles, and those packfiles are often on the bigger side. So arguably: - The user won't care about increments of 8kB when packfiles tend to be megabytes or even gigabytes in size. - Reducing the number of syscalls would be even more valuable here than it would be for multi-pack indices, which was the benchmark done in the mentioned commit, as MIDXs are typically significantly smaller than packfiles. - Nowadays, many internet connections should be able to transfer data at a rate significantly higher than 8kB per second. Update the buffer to instead have a size of `LARGE_PACKET_DATA_MAX - 1`, which translates to ~64kB. 
This limit was chosen because `git pack-objects --stdout` is most often used when sending packfiles via git-upload-pack(1), where packfile data is chunked into pktlines when using the sideband. Furthermore, most internet connections should have a bandwidth significantly higher than 64kB/s, so we'd still be able to observe progress updates at a rate of at least once per second. This change significantly reduces the number of write(3p) syscalls from 355,000 to 44,000 when packing the Linux repository. While this results in a small performance improvement on an otherwise-unused system, this improvement is mostly negligible. More importantly though, it will reduce lock contention in the kernel on an extremely busy system where we have many processes writing data at once. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-13 08:54:15 -07:00
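The roughly eightfold drop in write(3p) calls reported in 835e0aaf6f follows from the buffer sizes alone, as this back-of-the-envelope sketch suggests. It assumes every write flushes one full buffer and that `LARGE_PACKET_DATA_MAX` is 65516 (an assumption from memory of git's `pkt-line.h`, not stated in the commit message); `writes_needed()` and the two macros are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed buffer sizes; the second presumes LARGE_PACKET_DATA_MAX == 65516. */
#define OLD_BUFFER_SIZE_SKETCH 8192u
#define NEW_BUFFER_SIZE_SKETCH 65515u

/* Number of full-or-partial buffer flushes needed to emit total_bytes. */
static uint64_t writes_needed(uint64_t total_bytes, uint64_t buffer_size)
{
	assert(buffer_size > 0);
	return (total_bytes + buffer_size - 1) / buffer_size;
}
```

Under these assumptions the ratio of the two buffer sizes is about 8.0, which is in line with the reported reduction from 355,000 to 44,000 syscalls (a factor of about 8.1).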

751 Commits