git-for-windows/git - git - Gitea: Self-hosted GitHub

mirror of https://github.com/git-for-windows/git.git synced 2026-06-14 05:33:00 -05:00

Author	SHA1	Message	Date
Johannes Schindelin	ae90a3ee5c	Merge 'objects-larger-than-4gb-on-windows-pt2' This is hidden in v2.55.0-rc0's own CI because of an omission in `5ba82911bc` (ci: enable EXPENSIVE for contributor builds, 2026-05-11) which fails to enable EXPENSIVE tests for tags. Due to `7d78d5fc1a` (ci: skip GitHub workflow runs for already-tested commits/trees, 2020-10-08), the CI of `master` is now also mistakenly green because it reuses the tag's CI run to prove that it's solid. This is an evil merge by necessity because `survey.c` needs to adapt to the changed function signatures. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-13 03:11:27 +00:00
Patrick Steinhardt	1b6e48c974	odb/source-packed: drop pointer to "files" parent source Over the last commits we have turned the packfile store into a proper object database source that can be used as a standalone backend. As such, it is no longer necessary to have it coupled to the "files" parent source. Remove the pointer to the owning "files" source so that the "packed" source can be used as a standalone entity. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-10 00:59:40 +09:00
Patrick Steinhardt	386594ecb6	odb/source-packed: wire up `freshen_object()` callback Move `packfile_store_freshen_object()` and from "packfile.c" into "odb/source-packed.c" and wire it up as the `freshen_object()` callback of the "packed" source. Note that this removes the last external caller of `find_pack_entry()` from "packfile.c", which means that we can now make this function static. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-10 00:59:39 +09:00
Patrick Steinhardt	ff8df544c4	odb/source-packed: wire up `find_abbrev_len()` callback Move `packfile_store_find_abbrev_len()` and its associated helpers from "packfile.c" into "odb/source-packed.c" and wire it up as the `find_abbrev_len()` callback of the "packed" source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-10 00:59:39 +09:00
Patrick Steinhardt	684c6b44b8	odb/source-packed: wire up `count_objects()` callback Move `packfile_store_count_objects()` from "packfile.c" into "odb/source-packed.c" and wire it up as the `count_objects()` callback of the "packed" source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-10 00:59:39 +09:00
Patrick Steinhardt	f298de400e	odb/source-packed: wire up `for_each_object()` callback Move `packfile_store_for_each_object()` and its associated helpers from "packfile.c" into "odb/source-packed.c" and wire it up as the `for_each_object()` callback of the "packed" source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-10 00:59:39 +09:00
Patrick Steinhardt	8813b9486d	odb/source-packed: wire up `read_object_stream()` callback Wire up the `read_object_stream()` callback for the packed source and call it in the "files" source via the `odb_source_read_object_stream()` interface. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-10 00:59:39 +09:00
Patrick Steinhardt	2b13fc913c	odb/source-packed: wire up `read_object_info()` callback Move the logic to read object info from a "packed" source into "odb/source-packed.c" and wire it up as the `read_object_info()` callback. Note that we also move around the supporting `find_pack_entry()`, but we still have to expose it to other callers that exist in "packfile.c". This will be fixed in subsequent commits though, where all callers in "packfile.c" will have been moved into "odb/source-packed.c", and at that point we'll be able to make `find_pack_entry()` file-local again. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-10 00:59:39 +09:00
Patrick Steinhardt	28d9f6c506	packfile: use higher-level interface to implement `has_object_pack()` In `has_object_pack()` we're checking whether a specific object exists as part of a packfile. This is done by calling the low-level function `find_pack_entry()`, but this function will eventually be moved into "odb/source-packed.c" and made file-local. Refactor the code to use `packfile_store_read_object_info()` instead. This refactoring is functionally equivalent as that function will call `find_pack_entry()` itself and then return immediately when it ain't got no object info pointer as parameter. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-10 00:59:39 +09:00
Patrick Steinhardt	09ac75ad6c	odb/source-packed: wire up `reprepare()` callback Move the logic to prepare and reprepare the "packed" source into "odb/source-packed.c" and wire it up as the `reprepare()` callback. Note that "preparing" a source is not yet generic. Eventually, it would probably make sense to turn the existing `reprepare()` callback into a `prepare()` callback with an optional flag to force re-preparing. But this step will be handled in a separate patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-10 00:59:39 +09:00
Patrick Steinhardt	d2af1f62f5	odb/source-packed: wire up `close()` callback Wire up a new `close()` callback for the packed source and call it from the "files" source via the generic `odb_source_close()` interface. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-10 00:59:39 +09:00
Patrick Steinhardt	1eb77914ab	odb/source-packed: start converting to a proper `struct odb_source` Start converting `struct odb_source_packed` into a proper pluggable `struct odb_source` by embedding the base struct and assigning it the new `ODB_SOURCE_PACKED` type. Furthermore, wire up lifecycle management of this source by implementing the `free` callback and taking ownership of the chdir notifications. Note that the packed source is not yet functional as a standalone `struct odb_source`, as it's missing all of the callback implementations. These will be wired up in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-10 00:59:39 +09:00
Patrick Steinhardt	ef25514fe6	odb/source-packed: store pointer to "files" instead of generic source The `struct odb_source_packed` holds a pointer to its owning parent source. The way that Git is currently structured, this parent is always the "files" source. In subsequent commits we're going to detangle that so that the "packed" source doesn't have any owning parent source at all, which makes it usable as a completely standalone source. Detangling this mess is somewhat intricate though, and is made even more intricate because it's not always clear which kind of source one is holding at a specific point in time -- either the parent "files" source, or the child "packed" source. Make this relationship more explicit by storing a pointer to the "files" source instead of storing a pointer to a generic `struct odb_source`. This will help make subsequent steps a bit clearer. Note that this is a temporary step, only. At the end of this series we will have dropped the parent pointer completely. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-10 00:59:39 +09:00
Patrick Steinhardt	8742ff8368	packfile: move packed source into "odb/" subsystem In subsequent patches we'll be turning `struct odb_source_packed` into a proper `struct odb_source`. As a first step towards this goal, move its struct out of "packfile.{c,h}" and into "odb/source-packed.{c,h}". This detaches the implementation of the packfile object source from the generic packfile code, following the same convention already used by the "files" and "in-memory" sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-10 00:59:39 +09:00
Patrick Steinhardt	3704b5116a	packfile: split out packfile list logic In the next commit we're about to introduce the "packed" object database source. This source will embed a packfile list, and consequently we'll have to include "packfile.h" to make the struct definition available. This will unfortunately lead to a cyclic dependency that we cannot resolve with a forward declaration. Split out the code that relates to the packfile list into a separate compilation unit so that both "packfile.h" and "odb/source-packed.h" can include it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-10 00:59:38 +09:00
Patrick Steinhardt	9bc897f31d	packfile: rename `struct packfile_store` to `odb_source_packed` Not too long ago, we have introduced the packfile store in `b7983adb51` (packfile: introduce a new `struct packfile_store`, 2025-09-23). This struct is responsible for managing all of our access to packfiles and is used as one of the two sources of objects for the "files" source. Back when I introduced this structure I didn't have the clear vision yet that it will eventually also turn into a proper object database source, and how exactly that infrastructure will look like. Now though it's becoming increasingly clear that it does make sense to treat it just the same as any of our other ODB sources. The consequence is that the naming is now a bit out-of-date: it's just another source and will be turned into a proper `struct odb_source` over the next couple of commits, but it's not named accordingly. Rename the structure to `odb_source_packed` to align it with this goal and to bring it in line with the other sources we already have. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-10 00:59:38 +09:00
Johannes Schindelin	f3aeae983a	odb: use size_t for object_info.sizep and the size APIs When `js/objects-larger-than-4gb-on-windows` widened the streaming, index-pack and unpack-objects code paths, in the interest of keeping the patches somewhat reasonably-sized, it left the public ODB API still typed in `unsigned long`. In particular `struct object_info::sizep` and the four wrappers built on top of it (`odb_read_object`, `odb_read_object_peeled`, `odb_read_object_info`, `odb_pretend_object`) still return the unpacked size through `unsigned long *`, so on Windows `cat-file -s` and the `git add` / `git status` paths for a >4 GiB blob silently cap at 4 GiB. Widen the field and the four wrappers. The previous commits already widened the `unpack_entry()` cascade and pack-objects' in-core size accessors, so most of the cascade arrives here with no further work: the temporary shims in `packed_object_info_with_index_pos()` and in `unpack_entry()`'s delta-base recovery path go away, the two `SET_SIZE(entry, cast_size_t_to_ulong(canonical_size))` calls in `check_object()` and the matching one in `drop_reused_delta()` collapse to plain `SET_SIZE`, and `oe_get_size_slow()`'s tail `cast_size_t_to_ulong()` is gone too. What remains narrow are the boundaries this series does not intend to touch: the diff, blame, textconv and fast-import machinery. Even so, this patch is unfortunately quite large. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-04 09:51:22 +02:00
Johannes Schindelin	460d733fee	packfile,delta: drop the `cast_size_t_to_ulong()` wrappers When I started the transition from `unsigned long` to `size_t`, in the interest of keeping the patches reviewable, I introduced these calls to prevent data type narrowing from silently failing to handle large object sizes. I also introduced `*_sz()` variants that would allow most of the callers to keep using that `unsigned long` that the 90s kindly asked to be returned. After the preceding commits, the only places that called the narrow wrappers either no longer exist or already use the `_sz` form internally, so the wrappers just narrow values back through `cast_size_t_to_ulong()` for no reason. Drop them and rename the `_sz` variants back to the natural names. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-04 09:39:28 +02:00
Johannes Schindelin	bdebc36f21	packfile: widen unpack_entry()'s size out-parameter to size_t The topic `js/objects-larger-than-4gb-on-windows` widened the streaming, index-pack and unpack-objects paths to `size_t` but deliberately stopped at the in-memory `unpack_entry()` cascade, which still hands back the unpacked size through `unsigned long `. On Windows that boundary truncates above 4 GiB because that data type is only 32 bits wide on that platform. Widen the code path. Except `packed_object_info_with_index_pos()`: It cannot yet pass `oi->sizep` directly because the field is still `unsigned long `; bridge it with a `size_t` temporary that narrows back, and let a later commit drop the bridge once the field is wide too. `gfi_unpack_entry()` keeps its narrow signature because fast-import tracks sizes through `unsigned long` everywhere it crosses subsystem boundaries, keeping its signature allows the scope of this commit to be somewhat reasonable, still. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-04 09:39:28 +02:00
Johannes Schindelin	1fd7646ca1	patch-delta: use size_t for sizes `patch_delta()` takes the source and delta sizes by value and writes back the reconstructed target size through an `unsigned long *`. That datatype cannot represent a value that exceeds 4 GiB on systems where `unsigned long` is 32-bit (notably 64-bit Windows builds), though, even though the delta encoding itself, the on-disk layout, and the in-memory buffers happily carry such sizes. A `size_t` companion to `get_delta_hdr_size()`, `get_delta_hdr_size_sz()`, was introduced in `17fa077596` (delta, packfile: use size_t for delta header sizes, 2026-05-08) precisely so that `patch_delta()` could be widened without changing the on-the-wire decoding helper's signature. Widen `patch_delta()`'s three size parameters to `size_t` and switch its internal use of `get_delta_hdr_size()` to the `_sz` variant. Then propagate the wider type through the callers. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-04 09:39:27 +02:00
Junio C Hamano	8b5873a1f2	Merge branch 'tb/incremental-midx-part-3.3' The repacking code has been refactored and compaction of MIDX layers have been implemented, and incremental strategy that does not require all-into-one repacking has been introduced. * tb/incremental-midx-part-3.3: repack: allow `--write-midx=incremental` without `--geometric` repack: introduce `--write-midx=incremental` repack: implement incremental MIDX repacking packfile: ensure `close_pack_revindex()` frees in-memory revindex builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK` repack-geometry: prepare for incremental MIDX repacking repack-midx: extract `repack_fill_midx_stdin_packs()` repack-midx: factor out `repack_prepare_midx_command()` midx: expose `midx_layer_contains_pack()` repack: track the ODB source via existing_packs midx: support custom `--base` for incremental MIDX writes midx: introduce `--no-write-chain-file` for incremental MIDX writes midx: use `strvec` for `keep_hashes` midx: build `keep_hashes` array in order midx: use `strset` for retained MIDX files midx-write: handle noop writes when converting incremental chains	2026-05-27 14:15:45 +09:00
Taylor Blau	b0d6e7b0d0	packfile: ensure `close_pack_revindex()` frees in-memory revindex The following commit will introduce a case where we write a MIDX bitmap over packs that do not themselves have on-disk *.rev files. This case is supported within Git, and we will simply fall back to generating the revindex in memory. But we don't ever release that memory, causing a leak that is exposed by a test introduced in the following commit. (As far as I could find, we never free()'d memory allocated as a byproduct of creating an in-memory revindex, likely because that code predates the leak-checking niceties we have in the test suite now.) Rectify this by calling `FREE_AND_NULL()` on the `p->revindex` field when calling `close_pack_revindex()`. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-20 11:31:14 +09:00
Johannes Schindelin	17fa077596	delta, packfile: use size_t for delta header sizes The delta header decoding functions return unsigned long, which truncates on Windows for objects larger than 4GB. Introduce size_t variants get_delta_hdr_size_sz() and get_size_from_delta_sz() that preserve the full 64-bit size, and use them in packed_object_info() where the size is needed for streaming decisions. This was originally authored by LordKiRon <https://github.com/LordKiRon>, who preferred not to reveal their real name and therefore agreed that I take over authorship. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-09 11:25:32 +09:00
Johannes Schindelin	606c192380	odb, packfile: use size_t for streaming object sizes The odb_read_stream structure uses unsigned long for the size field, which is 32-bit on Windows even in 64-bit builds. When streaming objects larger than 4GB, the size would be truncated to zero or an incorrect value, resulting in empty files being written to disk. Change the size field in odb_read_stream to size_t and introduce unpack_object_header_sz() to return sizes via size_t pointer. Since object_info.sizep remains unsigned long for API compatibility, use temporary variables where the types differ, with comments noting the truncation limitation for code paths that still use unsigned long. Widening the producers to size_t in this way introduces a handful of silent size_t -> unsigned long narrowings on Windows, all in builtin/pack-objects.c, where the consumers are still typed unsigned long. Make those narrowings explicit with cast_size_t_to_ulong() so they assert loudly the moment an object actually exceeds ULONG_MAX bytes: - oe_get_size_slow() returns unsigned long but holds a size_t locally; cast at the return. - write_reuse_object() passes a size_t into check_pack_inflate(), whose expect parameter is unsigned long; cast at the call. - check_object() routes a size_t through SET_SIZE() and SET_DELTA_SIZE(), both of which take unsigned long via oe_set_size() / oe_set_delta_size(); cast at the three call sites in the OBJ_OFS_DELTA / OBJ_REF_DELTA branches and in the non-delta default arm. The cast-only treatment is deliberately a stop-gap. Properly widening oe_set_size, oe_get_size_slow's return type, check_pack_inflate's expect parameter, object_info.sizep, patch_delta, and the OE_SIZE_BITS bit-fields cascades into a series that is too large to be reviewable, so the proper widening is deferred to a follow-up topic. Until then, cast_size_t_to_ulong() at least makes the truncation explicit at the source: it documents the boundary, and on a 64-bit non-Windows platform it is a no-op. This was originally authored by LordKiRon <https://github.com/LordKiRon>, who preferred not to reveal their real name and therefore agreed that I take over authorship. Helped-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-09 11:25:31 +09:00
Junio C Hamano	9797fed6ce	Merge branch 'ps/odb-cleanup' Various code clean-up around odb subsystem. * ps/odb-cleanup: odb: drop unneeded headers and forward decls odb: rename `odb_has_object()` flags odb: use enum for `odb_write_object` flags odb: rename `odb_write_object()` flags treewide: use enum for `odb_for_each_object()` flags CodingGuidelines: document our style for flags	2026-04-08 10:19:17 -07:00
Junio C Hamano	03311dca7f	Merge branch 'tb/stdin-packs-excluded-but-open' pack-objects's --stdin-packs=follow mode learns to handle excluded-but-open packs. * tb/stdin-packs-excluded-but-open: repack: mark non-MIDX packs above the split as excluded-open pack-objects: support excluded-open packs with --stdin-packs t7704: demonstrate failure with once-cruft objects above the geometric split pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` pack-objects: plug leak in `read_stdin_packs()`	2026-04-06 15:42:49 -07:00
Patrick Steinhardt	75c702624d	treewide: use enum for `odb_for_each_object()` flags We've got a couple of callsites where we pass `odb_for_each_object()` flags, but accept an `unsigned` flags field instead of the corresponding enum. Adapt these to accept the enum type instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-31 20:43:13 -07:00
Taylor Blau	3f7c0e722e	pack-objects: support excluded-open packs with --stdin-packs In `cd846bacc7` (pack-objects: introduce '--stdin-packs=follow', 2025-06-23), pack-objects learned to traverse through commits in included packs when using '--stdin-packs=follow', rescuing reachable objects from unlisted packs into the output. When we encounter a commit in an excluded pack during this rescuing phase we will traverse through its parents. But because we set `revs.no_kept_objects = 1`, commit simplification will prevent us from showing it via `get_revision()`. (In practice, `--stdin-packs=follow` walks commits down to the roots, but only opens up trees for ones that do not appear in an excluded pack.) But there are certain cases where we do need to see the parents of an object in an excluded pack. Namely, if an object is rescue-able, but only reachable from object(s) which appear in excluded packs, then commit simplification will exclude those commits from the object traversal, and we will never see a copy of that object, and thus not rescue it. This is what causes the failure in the previous commit during repacking. When performing a geometric repack, packs above the geometric split that weren't part of the previous MIDX (e.g., packs pushed directly into `$GIT_DIR/objects/pack`) may not have full object closure. When those packs are listed as excluded via the '^' marker, the reachability traversal encounters the sequence described above, and may miss objects which we expect to rescue with `--stdin-packs=follow`. Introduce a new "excluded-open" pack prefix, '!'. Like '^'-prefixed packs, objects from '!'-prefixed packs are excluded from the resulting pack. But unlike '^', commits in '!'-prefixed packs are used as starting points for the follow traversal, and the traversal does not treat them as a closure boundary. In order to distinguish excluded-closed from excluded-open packs during the traversal, introduce a new `pack_keep_in_core_open` bit on `struct packed_git`, along with a corresponding `KEPT_PACK_IN_CORE_OPEN` flag for the kept-pack cache. In `add_object_entry_from_pack()`, move the `want_object_in_pack()` check to after `add_pending_oid()`. This is necessary so that commits from excluded-open packs are added as traversal tips even though their objects won't appear in the output. As a consequence, the caller `for_each_object_in_pack()` will always provide a non-NULL 'p', hence we are able to drop the "if (p)" conditional. The `include_check` and `include_check_obj` callbacks on `rev_info` are used to halt the walk at closed-excluded packs, since objects behind a '^' boundary are guaranteed to have closure and need not be rescued. The following commit will make use of this new functionality within the repack layer to resolve the test failure demonstrated in the previous commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-27 13:40:40 -07:00
Patrick Steinhardt	6c2ede6e4a	object-file: move logic to compute packed abbreviation length Same as the preceding commit, move the logic that computes the minimum required prefix length to make a given object ID unique for the packfile store into a new function `packfile_store_find_abbrev_len()` that is part of "packfile.c". This prepares for making the logic fully generic via pluggable object databases. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-20 13:16:42 -07:00
Patrick Steinhardt	e30bff8f84	object-name: move logic to iterate through packed prefixed objects Similar to the preceding commit, move the logic to iterate through objects that have a given prefix into "packfile.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-20 13:16:42 -07:00
Patrick Steinhardt	cfd575f0a9	odb: introduce `struct odb_for_each_object_options` The `odb_for_each_object()` function only accepts a bitset of flags. In a subsequent commit we'll want to change object iteration to also support iterating over only those objects that have a specific prefix. While we could of course add the prefix to the function signature, or alternatively introduce a new function, both of these options don't really seem to be that sensible. Instead, introduce a new `struct odb_for_each_object_options` that can be passed to a new `odb_for_each_object_ext()` function. Splice through the options structure into the respective object database sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-20 13:16:41 -07:00
Junio C Hamano	c89a495ce4	Merge branch 'ps/odb-sources' The object source API is getting restructured to allow plugging new backends. * ps/odb-sources: odb/source: make `begin_transaction()` function pluggable odb/source: make `write_alternate()` function pluggable odb/source: make `read_alternates()` function pluggable odb/source: make `write_object_stream()` function pluggable odb/source: make `write_object()` function pluggable odb/source: make `freshen_object()` function pluggable odb/source: make `for_each_object()` function pluggable odb/source: make `read_object_stream()` function pluggable odb/source: make `read_object_info()` function pluggable odb/source: make `close()` function pluggable odb/source: make `reprepare()` function pluggable odb/source: make `free()` function pluggable odb/source: introduce source type for robustness odb: move reparenting logic into respective subsystems odb: embed base source in the "files" backend odb: introduce "files" source odb: split `struct odb_source` into separate header	2026-03-12 14:09:07 -07:00
Patrick Steinhardt	6801ffd37d	odb: introduce generic object counting Similar to the preceding commit, introduce counting of objects on the object database level, replacing the logic that we have in `repo_approximate_object_count()`. Note that the function knows to cache the object count. It's unclear whether this cache is really required as we shouldn't have that many cases where we count objects repeatedly. But to be on the safe side the caching mechanism is retained, with the only excepting being that we also have to use the passed flags as caching key. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-12 08:38:43 -07:00
Patrick Steinhardt	b259f2175b	odb/source: introduce generic object counting Introduce generic object counting on the object database source level with a new backend-specific callback function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-12 08:38:42 -07:00
Patrick Steinhardt	dd587cd59e	packfile: extract logic to count number of objects In a subsequent commit we're about to introduce a new `odb_source_count_objects()` function so that we can make the logic pluggable. Prepare for this change by extracting the logic that we have to count packed objects into a standalone function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-12 08:38:42 -07:00
Junio C Hamano	6cdef943d2	Merge branch 'ps/odb-sources' into ps/object-counting * ps/odb-sources: odb/source: make `begin_transaction()` function pluggable odb/source: make `write_alternate()` function pluggable odb/source: make `read_alternates()` function pluggable odb/source: make `write_object_stream()` function pluggable odb/source: make `write_object()` function pluggable odb/source: make `freshen_object()` function pluggable odb/source: make `for_each_object()` function pluggable odb/source: make `read_object_stream()` function pluggable odb/source: make `read_object_info()` function pluggable odb/source: make `close()` function pluggable odb/source: make `reprepare()` function pluggable odb/source: make `free()` function pluggable odb/source: introduce source type for robustness odb: move reparenting logic into respective subsystems odb: embed base source in the "files" backend odb: introduce "files" source odb: split `struct odb_source` into separate header	2026-03-10 10:13:40 -07:00
Patrick Steinhardt	5946a564cd	odb/source: make `read_object_info()` function pluggable Introduce a new callback function in `struct odb_source` to make the function pluggable. Note that this function is a bit less straight-forward to convert compared to the other functions. The reason here is that the logic to read an object is: 1. We try to read the object. If it exists we return it. 2. If the object does not exist we reprepare the object database source. 3. We then try reading the object info a second time in case the reprepare caused it to appear. The second read is only supposed to happen for the packfile store though, as reading loose objects is not impacted by repreparing the object database. Ideally, we'd just move this whole logic into the ODB source. But that's not easily possible because we try to avoid the reprepare unless really required, which is after we have found out that no other ODB source contains the object, either. So the logic spans across multiple ODB sources, and consequently we cannot move it into an individual source. Instead, introduce a new flag `OBJECT_INFO_SECOND_READ` that tells the backend that we already tried to look up the object once, and that this time around the ODB source should try to find any new objects that may have surfaced due to an on-disk change. With this flag, the "files" backend can trivially skip trying to re-read the object as a loose object. Furthermore, as we know that we only try the second read via the packfile store, we can skip repreparing loose objects and only reprepare the packfile store. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-05 11:45:15 -08:00
Patrick Steinhardt	d9ecf268ef	odb: embed base source in the "files" backend The "files" backend is implemented as a pointer in the `struct odb_source`. This contradicts our typical pattern for pluggable backends like we use it for example in the ref store or for object database streams, where we typically embed the generic base structure in the specialized implementation. This pattern has a couple of small benefits: - We avoid an extra allocation. - We hide implementation details in the generic structure. - We can easily downcast from a generic backend to the specialized structure and vice versa because the offsets are known at compile time. - It becomes trivial to identify locations where we depend on backend specific logic because the cast needs to be explicit. Refactor our "files" object database source to do the same and embed the `struct odb_source` in the `struct odb_source_files`. There are still a bunch of sites in our code base where we do have to access internals of the "files" backend. The intent is that those will go away over time, but this will certainly take a while. Meanwhile, provide a `odb_source_files_downcast()` function that can convert a generic source into a "files" source. As we only have a single source the downcast succeeds unconditionally for now. Eventually though the intent is to make the cast `BUG()` in case the caller requests to downcast a non-"files" backend to a "files" backend. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-05 11:45:15 -08:00
Patrick Steinhardt	cb506a8a69	odb: introduce "files" source Introduce a new "files" object database source. This source encapsulates access to both loose object files and the packfile store, similar to how the "files" backend for refs encapsulates access to loose refs and the packed-refs file. Note that for now the "files" source is still a direct member of a `struct odb_source`. This architecture will be reversed in the next commit so that the files source contains a `struct odb_source`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-05 11:45:14 -08:00
Junio C Hamano	d93be9cbca	Merge branch 'ps/fsck-stream-from-the-right-object-instance' "fsck" iterates over packfiles and its access to pack data caused the list to be permuted, which caused it to loop forever; the code to access pack data by "fsck" has been updated to avoid this. * ps/fsck-stream-from-the-right-object-instance: pack-check: fix verification of large objects packfile: expose function to read object stream for an offset object-file: adapt `stream_object_signature()` to take a stream t/helper: improve "genrandom" test helper	2026-03-05 10:04:49 -08:00
Junio C Hamano	10dd04a3fe	Merge branch 'ps/object-info-bits-cleanup' A couple of bugs in use of flag bits around odb API has been corrected, and the flag bits reordered. * ps/object-info-bits-cleanup: odb: convert `odb_has_object()` flags into an enum odb: convert object info flags into an enum odb: drop gaps in object info flag values builtin/fsck: fix flags passed to `odb_has_object()` builtin/backfill: fix flags passed to `odb_has_object()`	2026-03-02 17:06:52 -08:00
Junio C Hamano	b1af291b4a	Merge branch 'ps/object-info-bits-cleanup' into ps/odb-sources * ps/object-info-bits-cleanup: odb: convert `odb_has_object()` flags into an enum odb: convert object info flags into an enum odb: drop gaps in object info flag values builtin/fsck: fix flags passed to `odb_has_object()` builtin/backfill: fix flags passed to `odb_has_object()`	2026-02-23 13:48:48 -08:00
Patrick Steinhardt	41b42e3527	packfile: expose function to read object stream for an offset The function `packfile_store_read_object_stream()` takes as input an object ID and then constructs a `struct odb_read_stream` from it. In a subsequent commit we'll want to create an object stream for a given combination of packfile and offset though, which is not something that can currently be done. Extract a new function `packfile_read_object_stream()` that makes this functionality available. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:19:00 -08:00
Patrick Steinhardt	f6516a5241	odb: convert object info flags into an enum Convert the object info flags into an enum and adapt all functions that receive these flags as parameters to use the enum instead of an integer. This serves two purposes: - The function signatures become more self-documenting, as callers don't have to wonder which flags they expect. - The compiler can warn when a wrong flag type is passed. Note that the second benefit is somewhat limited. For example, when or-ing multiple enum flags together the result will be an integer, and the compiler will not warn about such use cases. But where it does help is when a single flag of the wrong type is passed, as the compiler would generate a warning in that case. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-12 11:05:08 -08:00
Patrick Steinhardt	3565faf28c	odb: drop unused `for_each_{loose,packed}_object()` functions We have converted all callers of `for_each_loose_object()` and `for_each_packed_object()` to use their new replacement functions instead. We can thus remove them now. Do so and inline `packfile_store_for_each_object_internal()` now that it only has a single callsite again. This makes it a bit easier to follow the callback indirection that is happening there. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:08 -08:00
Patrick Steinhardt	7b7cbaef27	odb: introduce mtime fields for object info requests There are some use cases where we need to figure out the mtime for objects. Most importantly, this is the case when we want to prune unreachable objects. But getting at that data requires users to manually derive the info either via the loose object's mtime, the packfiles' mtime or via the ".mtimes" file. Introduce a new `struct object_info::mtimep` pointer that allows callers to request an object's mtime. This new field will be used in a subsequent commit. Note that the concept of "mtime" is ambiguous: given an object, it may be stored multiple times in the object database, and each of these instances may have a different mtime. Disambiguating these mtimes is nothing that can happen on the generic ODB layer: the caller may search for the oldest object, the newest object, or even the relation of object mtimes depending on the specific source they are located in. As such, it is the responsibility of the caller to disambiguate mtimes. A consequence of this is that it's most likely incorrect to look up the mtime via `odb_read_object_info()`, as this interface does not give us enough information to disambiguate the mtime. Document this accordingly and tell users to use `odb_for_each_object()` instead. Even with this gotcha though it's sensible to have this request as part of the object info, as the mtime is a property of the object storage format. If we for example had a "black-box" storage backend, we'd still need to be able to query it for the mtime info in a generic way. We could introduce a safety mechanism that for example calls `BUG()` in case we look up the mtime outside of `odb_for_each_object()`. But that feels somewhat heavy-handed. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:08 -08:00
Patrick Steinhardt	2813c97310	treewide: enumerate promisor objects via `odb_for_each_object()` We have multiple callsites where we enumerate all promisor objects in the object database via `for_each_packed_object()`. This is done by passing the `ODB_FOR_EACH_OBJECT_PROMISOR_ONLY` flag, which causes us to skip over all non-promisor objects. These callsites can be trivially converted to `odb_for_each_object()` as we know to skip enumeration of loose objects in case the `PROMISOR_ONLY` flag was passed by the caller. Refactor the sites accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:07 -08:00
Patrick Steinhardt	736464b84f	packfile: introduce function to iterate through objects Introduce a new function `packfile_store_for_each_object()`. This function is equivalent to `odb_source_loose_for_each_object()`, except that it: - Works on a single packfile store instead of working on the object database level. Consequently, it will only yield packed objects of a single object database source. - Passes a `struct object_info` to the callback function. As such, it provides the same callback interface as we already provide for loose objects now. These functions will be used in a subsequent step to implement `odb_for_each_object()`. The `for_each_packed_object()` function continues to exist for now, but it will be removed at the end of this patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:07 -08:00
Patrick Steinhardt	3735311904	packfile: extract function to iterate through objects of a store In the next commit we're about to introduce a new function that knows to iterate through objects of a given packfile store. Same as with the equivalent function for loose objects, this new function will also be agnostic of backends by using a `struct object_info`. Prepare for this by extracting a new shared function to iterate through a single packfile store. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:07 -08:00
Patrick Steinhardt	6358da200f	odb: fix flags parameter to be unsigned The `flags` parameter accepted by various `for_each_object()` functions is a bitfield of multiple flags. Such parameters are typically unsigned in the Git codebase, but we use `enum odb_for_each_object_flags` in some places. Adapt these function signatures to use the correct type. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:06 -08:00

1 2 3 4 5 ...

433 Commits