Commit Graph

188 Commits

Author SHA1 Message Date
Johannes Schindelin
ae90a3ee5c Merge 'objects-larger-than-4gb-on-windows-pt2'
This is hidden in v2.55.0-rc0's own CI because of an omission in
5ba82911bc (ci: enable EXPENSIVE for contributor builds, 2026-05-11)
which fails to enable EXPENSIVE tests for tags.

Due to 7d78d5fc1a (ci: skip GitHub workflow runs for already-tested
commits/trees, 2020-10-08), the CI of `master` is now also mistakenly
green because it reuses the tag's CI run to prove that it's solid.

This is an evil merge by necessity because `survey.c` needs to adapt to
the changed function signatures.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-13 03:11:27 +00:00
Patrick Steinhardt
386594ecb6 odb/source-packed: wire up freshen_object() callback
Move `packfile_store_freshen_object()` and from "packfile.c" into
"odb/source-packed.c" and wire it up as the `freshen_object()` callback
of the "packed" source.

Note that this removes the last external caller of `find_pack_entry()`
from "packfile.c", which means that we can now make this function
static.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-10 00:59:39 +09:00
Patrick Steinhardt
ff8df544c4 odb/source-packed: wire up find_abbrev_len() callback
Move `packfile_store_find_abbrev_len()` and its associated helpers from
"packfile.c" into "odb/source-packed.c" and wire it up as the
`find_abbrev_len()` callback of the "packed" source.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-10 00:59:39 +09:00
Patrick Steinhardt
684c6b44b8 odb/source-packed: wire up count_objects() callback
Move `packfile_store_count_objects()` from "packfile.c" into
"odb/source-packed.c" and wire it up as the `count_objects()` callback
of the "packed" source.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-10 00:59:39 +09:00
Patrick Steinhardt
f298de400e odb/source-packed: wire up for_each_object() callback
Move `packfile_store_for_each_object()` and its associated helpers from
"packfile.c" into "odb/source-packed.c" and wire it up as the
`for_each_object()` callback of the "packed" source.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-10 00:59:39 +09:00
Patrick Steinhardt
8813b9486d odb/source-packed: wire up read_object_stream() callback
Wire up the `read_object_stream()` callback for the packed source and
call it in the "files" source via the `odb_source_read_object_stream()`
interface.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-10 00:59:39 +09:00
Patrick Steinhardt
2b13fc913c odb/source-packed: wire up read_object_info() callback
Move the logic to read object info from a "packed" source into
"odb/source-packed.c" and wire it up as the `read_object_info()`
callback.

Note that we also move around the supporting `find_pack_entry()`, but we
still have to expose it to other callers that exist in "packfile.c".
This will be fixed in subsequent commits though, where all callers in
"packfile.c" will have been moved into "odb/source-packed.c", and at
that point we'll be able to make `find_pack_entry()` file-local again.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-10 00:59:39 +09:00
Patrick Steinhardt
09ac75ad6c odb/source-packed: wire up reprepare() callback
Move the logic to prepare and reprepare the "packed" source into
"odb/source-packed.c" and wire it up as the `reprepare()` callback.

Note that "preparing" a source is not yet generic. Eventually, it would
probably make sense to turn the existing `reprepare()` callback into a
`prepare()` callback with an optional flag to force re-preparing. But
this step will be handled in a separate patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-10 00:59:39 +09:00
Patrick Steinhardt
d2af1f62f5 odb/source-packed: wire up close() callback
Wire up a new `close()` callback for the packed source and call it from
the "files" source via the generic `odb_source_close()` interface.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-10 00:59:39 +09:00
Patrick Steinhardt
1eb77914ab odb/source-packed: start converting to a proper struct odb_source
Start converting `struct odb_source_packed` into a proper pluggable
`struct odb_source` by embedding the base struct and assigning it the
new `ODB_SOURCE_PACKED` type. Furthermore, wire up lifecycle management
of this source by implementing the `free` callback and taking ownership
of the chdir notifications.

Note that the packed source is not yet functional as a standalone `struct
odb_source`, as it's missing all of the callback implementations. These
will be wired up in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-10 00:59:39 +09:00
Patrick Steinhardt
8742ff8368 packfile: move packed source into "odb/" subsystem
In subsequent patches we'll be turning `struct odb_source_packed` into a
proper `struct odb_source`. As a first step towards this goal, move its
struct out of "packfile.{c,h}" and into "odb/source-packed.{c,h}".

This detaches the implementation of the packfile object source from the
generic packfile code, following the same convention already used by the
"files" and "in-memory" sources.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-10 00:59:39 +09:00
Patrick Steinhardt
3704b5116a packfile: split out packfile list logic
In the next commit we're about to introduce the "packed" object database
source. This source will embed a packfile list, and consequently we'll
have to include "packfile.h" to make the struct definition available.
This will unfortunately lead to a cyclic dependency that we cannot
resolve with a forward declaration.

Split out the code that relates to the packfile list into a separate
compilation unit so that both "packfile.h" and "odb/source-packed.h" can
include it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-10 00:59:38 +09:00
Patrick Steinhardt
9bc897f31d packfile: rename struct packfile_store to odb_source_packed
Not too long ago, we have introduced the packfile store in b7983adb51
(packfile: introduce a new `struct packfile_store`, 2025-09-23). This
struct is responsible for managing all of our access to packfiles and is
used as one of the two sources of objects for the "files" source.

Back when I introduced this structure I didn't have the clear vision yet
that it will eventually also turn into a proper object database source,
and how exactly that infrastructure will look like. Now though it's
becoming increasingly clear that it does make sense to treat it just the
same as any of our other ODB sources.

The consequence is that the naming is now a bit out-of-date: it's just
another source and will be turned into a proper `struct odb_source` over
the next couple of commits, but it's not named accordingly.

Rename the structure to `odb_source_packed` to align it with this goal
and to bring it in line with the other sources we already have.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-10 00:59:38 +09:00
Johannes Schindelin
460d733fee packfile,delta: drop the cast_size_t_to_ulong() wrappers
When I started the transition from `unsigned long` to `size_t`, in the
interest of keeping the patches reviewable, I introduced these calls to
prevent data type narrowing from silently failing to handle large object
sizes. I also introduced `*_sz()` variants that would allow most of the
callers to keep using that `unsigned long` that the 90s kindly asked to
be returned.

After the preceding commits, the only places that called the narrow
wrappers either no longer exist or already use the `_sz` form
internally, so the wrappers just narrow values back through
`cast_size_t_to_ulong()` for no reason.

Drop them and rename the `_sz` variants back to the natural names.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-04 09:39:28 +02:00
Johannes Schindelin
bdebc36f21 packfile: widen unpack_entry()'s size out-parameter to size_t
The topic `js/objects-larger-than-4gb-on-windows` widened the streaming,
index-pack and unpack-objects paths to `size_t` but deliberately stopped
at the in-memory `unpack_entry()` cascade, which still hands back the
unpacked size through `unsigned long *`.  On Windows that boundary
truncates above 4 GiB because that data type is only 32 bits wide on
that platform.

Widen the code path. Except `packed_object_info_with_index_pos()`: It
cannot yet pass `oi->sizep` directly because the field is still
`unsigned long *`; bridge it with a `size_t` temporary that narrows
back, and let a later commit drop the bridge once the field is wide
too. `gfi_unpack_entry()` keeps its narrow signature because fast-import
tracks sizes through `unsigned long` everywhere it crosses subsystem
boundaries, keeping its signature allows the scope of this commit to be
somewhat reasonable, still.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-04 09:39:28 +02:00
Johannes Schindelin
606c192380 odb, packfile: use size_t for streaming object sizes
The odb_read_stream structure uses unsigned long for the size field,
which is 32-bit on Windows even in 64-bit builds. When streaming
objects larger than 4GB, the size would be truncated to zero or an
incorrect value, resulting in empty files being written to disk.

Change the size field in odb_read_stream to size_t and introduce
unpack_object_header_sz() to return sizes via size_t pointer. Since
object_info.sizep remains unsigned long for API compatibility, use
temporary variables where the types differ, with comments noting the
truncation limitation for code paths that still use unsigned long.

Widening the producers to size_t in this way introduces a handful of
silent size_t -> unsigned long narrowings on Windows, all in
builtin/pack-objects.c, where the consumers are still typed
unsigned long. Make those narrowings explicit with
cast_size_t_to_ulong() so they assert loudly the moment an object
actually exceeds ULONG_MAX bytes:

  - oe_get_size_slow() returns unsigned long but holds a size_t
    locally; cast at the return.
  - write_reuse_object() passes a size_t into check_pack_inflate(),
    whose expect parameter is unsigned long; cast at the call.
  - check_object() routes a size_t through SET_SIZE() and
    SET_DELTA_SIZE(), both of which take unsigned long via
    oe_set_size() / oe_set_delta_size(); cast at the three call
    sites in the OBJ_OFS_DELTA / OBJ_REF_DELTA branches and in the
    non-delta default arm.

The cast-only treatment is deliberately a stop-gap. Properly
widening oe_set_size, oe_get_size_slow's return type,
check_pack_inflate's expect parameter, object_info.sizep,
patch_delta, and the OE_SIZE_BITS bit-fields cascades into a series
that is too large to be reviewable, so the proper widening is
deferred to a follow-up topic. Until then,
cast_size_t_to_ulong() at least makes the truncation explicit at
the source: it documents the boundary, and on a 64-bit non-Windows
platform it is a no-op.

This was originally authored by LordKiRon <https://github.com/LordKiRon>,
who preferred not to reveal their real name and therefore agreed that I
take over authorship.

Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-05-09 11:25:31 +09:00
Junio C Hamano
9797fed6ce Merge branch 'ps/odb-cleanup'
Various code clean-up around odb subsystem.

* ps/odb-cleanup:
  odb: drop unneeded headers and forward decls
  odb: rename `odb_has_object()` flags
  odb: use enum for `odb_write_object` flags
  odb: rename `odb_write_object()` flags
  treewide: use enum for `odb_for_each_object()` flags
  CodingGuidelines: document our style for flags
2026-04-08 10:19:17 -07:00
Junio C Hamano
03311dca7f Merge branch 'tb/stdin-packs-excluded-but-open'
pack-objects's --stdin-packs=follow mode learns to handle
excluded-but-open packs.

* tb/stdin-packs-excluded-but-open:
  repack: mark non-MIDX packs above the split as excluded-open
  pack-objects: support excluded-open packs with --stdin-packs
  t7704: demonstrate failure with once-cruft objects above the geometric split
  pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
  pack-objects: plug leak in `read_stdin_packs()`
2026-04-06 15:42:49 -07:00
Patrick Steinhardt
75c702624d treewide: use enum for odb_for_each_object() flags
We've got a couple of callsites where we pass `odb_for_each_object()`
flags, but accept an `unsigned` flags field instead of the corresponding
enum. Adapt these to accept the enum type instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-31 20:43:13 -07:00
Taylor Blau
3f7c0e722e pack-objects: support excluded-open packs with --stdin-packs
In cd846bacc7 (pack-objects: introduce '--stdin-packs=follow',
2025-06-23), pack-objects learned to traverse through commits in
included packs when using '--stdin-packs=follow', rescuing reachable
objects from unlisted packs into the output.

When we encounter a commit in an excluded pack during this rescuing
phase we will traverse through its parents. But because we set
`revs.no_kept_objects = 1`, commit simplification will prevent us from
showing it via `get_revision()`. (In practice, `--stdin-packs=follow`
walks commits down to the roots, but only opens up trees for ones that
do not appear in an excluded pack.)

But there are certain cases where we *do* need to see the parents of an
object in an excluded pack. Namely, if an object is rescue-able, but
only reachable from object(s) which appear in excluded packs, then
commit simplification will exclude those commits from the object
traversal, and we will never see a copy of that object, and thus not
rescue it.

This is what causes the failure in the previous commit during repacking.
When performing a geometric repack, packs above the geometric split that
weren't part of the previous MIDX (e.g., packs pushed directly into
`$GIT_DIR/objects/pack`) may not have full object closure.  When those
packs are listed as excluded via the '^' marker, the reachability
traversal encounters the sequence described above, and may miss objects
which we expect to rescue with `--stdin-packs=follow`.

Introduce a new "excluded-open" pack prefix, '!'. Like '^'-prefixed
packs, objects from '!'-prefixed packs are excluded from the resulting
pack. But unlike '^', commits in '!'-prefixed packs *are* used as
starting points for the follow traversal, and the traversal does not
treat them as a closure boundary.

In order to distinguish excluded-closed from excluded-open packs during
the traversal, introduce a new `pack_keep_in_core_open` bit on
`struct packed_git`, along with a corresponding `KEPT_PACK_IN_CORE_OPEN`
flag for the kept-pack cache.

In `add_object_entry_from_pack()`, move the `want_object_in_pack()`
check to *after* `add_pending_oid()`. This is necessary so that commits
from excluded-open packs are added as traversal tips even though their
objects won't appear in the output. As a consequence, the caller
`for_each_object_in_pack()` will always provide a non-NULL 'p', hence we
are able to drop the "if (p)" conditional.

The `include_check` and `include_check_obj` callbacks on `rev_info` are
used to halt the walk at closed-excluded packs, since objects behind a
'^' boundary are guaranteed to have closure and need not be rescued.

The following commit will make use of this new functionality within the
repack layer to resolve the test failure demonstrated in the previous
commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-27 13:40:40 -07:00
Patrick Steinhardt
6c2ede6e4a object-file: move logic to compute packed abbreviation length
Same as the preceding commit, move the logic that computes the minimum
required prefix length to make a given object ID unique for the packfile
store into a new function `packfile_store_find_abbrev_len()` that is
part of "packfile.c". This prepares for making the logic fully generic
via pluggable object databases.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-20 13:16:42 -07:00
Patrick Steinhardt
cfd575f0a9 odb: introduce struct odb_for_each_object_options
The `odb_for_each_object()` function only accepts a bitset of flags. In
a subsequent commit we'll want to change object iteration to also
support iterating over only those objects that have a specific prefix.
While we could of course add the prefix to the function signature, or
alternatively introduce a new function, both of these options don't
really seem to be that sensible.

Instead, introduce a new `struct odb_for_each_object_options` that can
be passed to a new `odb_for_each_object_ext()` function. Splice through
the options structure into the respective object database sources.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-20 13:16:41 -07:00
Junio C Hamano
c89a495ce4 Merge branch 'ps/odb-sources'
The object source API is getting restructured to allow plugging new
backends.

* ps/odb-sources:
  odb/source: make `begin_transaction()` function pluggable
  odb/source: make `write_alternate()` function pluggable
  odb/source: make `read_alternates()` function pluggable
  odb/source: make `write_object_stream()` function pluggable
  odb/source: make `write_object()` function pluggable
  odb/source: make `freshen_object()` function pluggable
  odb/source: make `for_each_object()` function pluggable
  odb/source: make `read_object_stream()` function pluggable
  odb/source: make `read_object_info()` function pluggable
  odb/source: make `close()` function pluggable
  odb/source: make `reprepare()` function pluggable
  odb/source: make `free()` function pluggable
  odb/source: introduce source type for robustness
  odb: move reparenting logic into respective subsystems
  odb: embed base source in the "files" backend
  odb: introduce "files" source
  odb: split `struct odb_source` into separate header
2026-03-12 14:09:07 -07:00
Patrick Steinhardt
6801ffd37d odb: introduce generic object counting
Similar to the preceding commit, introduce counting of objects on the
object database level, replacing the logic that we have in
`repo_approximate_object_count()`.

Note that the function knows to cache the object count. It's unclear
whether this cache is really required as we shouldn't have that many
cases where we count objects repeatedly. But to be on the safe side the
caching mechanism is retained, with the only excepting being that we
also have to use the passed flags as caching key.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-12 08:38:43 -07:00
Patrick Steinhardt
b259f2175b odb/source: introduce generic object counting
Introduce generic object counting on the object database source level
with a new backend-specific callback function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-12 08:38:42 -07:00
Patrick Steinhardt
dd587cd59e packfile: extract logic to count number of objects
In a subsequent commit we're about to introduce a new
`odb_source_count_objects()` function so that we can make the logic
pluggable. Prepare for this change by extracting the logic that we have
to count packed objects into a standalone function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-12 08:38:42 -07:00
Junio C Hamano
6cdef943d2 Merge branch 'ps/odb-sources' into ps/object-counting
* ps/odb-sources:
  odb/source: make `begin_transaction()` function pluggable
  odb/source: make `write_alternate()` function pluggable
  odb/source: make `read_alternates()` function pluggable
  odb/source: make `write_object_stream()` function pluggable
  odb/source: make `write_object()` function pluggable
  odb/source: make `freshen_object()` function pluggable
  odb/source: make `for_each_object()` function pluggable
  odb/source: make `read_object_stream()` function pluggable
  odb/source: make `read_object_info()` function pluggable
  odb/source: make `close()` function pluggable
  odb/source: make `reprepare()` function pluggable
  odb/source: make `free()` function pluggable
  odb/source: introduce source type for robustness
  odb: move reparenting logic into respective subsystems
  odb: embed base source in the "files" backend
  odb: introduce "files" source
  odb: split `struct odb_source` into separate header
2026-03-10 10:13:40 -07:00
Patrick Steinhardt
d9ecf268ef odb: embed base source in the "files" backend
The "files" backend is implemented as a pointer in the `struct
odb_source`. This contradicts our typical pattern for pluggable backends
like we use it for example in the ref store or for object database
streams, where we typically embed the generic base structure in the
specialized implementation. This pattern has a couple of small benefits:

  - We avoid an extra allocation.

  - We hide implementation details in the generic structure.

  - We can easily downcast from a generic backend to the specialized
    structure and vice versa because the offsets are known at compile
    time.

  - It becomes trivial to identify locations where we depend on backend
    specific logic because the cast needs to be explicit.

Refactor our "files" object database source to do the same and embed the
`struct odb_source` in the `struct odb_source_files`.

There are still a bunch of sites in our code base where we do have to
access internals of the "files" backend. The intent is that those will
go away over time, but this will certainly take a while. Meanwhile,
provide a `odb_source_files_downcast()` function that can convert a
generic source into a "files" source.

As we only have a single source the downcast succeeds unconditionally
for now. Eventually though the intent is to make the cast `BUG()` in
case the caller requests to downcast a non-"files" backend to a "files"
backend.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05 11:45:15 -08:00
Patrick Steinhardt
cb506a8a69 odb: introduce "files" source
Introduce a new "files" object database source. This source encapsulates
access to both loose object files and the packfile store, similar to how
the "files" backend for refs encapsulates access to loose refs and the
packed-refs file.

Note that for now the "files" source is still a direct member of a
`struct odb_source`. This architecture will be reversed in the next
commit so that the files source contains a `struct odb_source`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05 11:45:14 -08:00
Junio C Hamano
d93be9cbca Merge branch 'ps/fsck-stream-from-the-right-object-instance'
"fsck" iterates over packfiles and its access to pack data caused
the list to be permuted, which caused it to loop forever; the code
to access pack data by "fsck" has been updated to avoid this.

* ps/fsck-stream-from-the-right-object-instance:
  pack-check: fix verification of large objects
  packfile: expose function to read object stream for an offset
  object-file: adapt `stream_object_signature()` to take a stream
  t/helper: improve "genrandom" test helper
2026-03-05 10:04:49 -08:00
Junio C Hamano
10dd04a3fe Merge branch 'ps/object-info-bits-cleanup'
A couple of bugs in use of flag bits around odb API has been
corrected, and the flag bits reordered.

* ps/object-info-bits-cleanup:
  odb: convert `odb_has_object()` flags into an enum
  odb: convert object info flags into an enum
  odb: drop gaps in object info flag values
  builtin/fsck: fix flags passed to `odb_has_object()`
  builtin/backfill: fix flags passed to `odb_has_object()`
2026-03-02 17:06:52 -08:00
Junio C Hamano
b1af291b4a Merge branch 'ps/object-info-bits-cleanup' into ps/odb-sources
* ps/object-info-bits-cleanup:
  odb: convert `odb_has_object()` flags into an enum
  odb: convert object info flags into an enum
  odb: drop gaps in object info flag values
  builtin/fsck: fix flags passed to `odb_has_object()`
  builtin/backfill: fix flags passed to `odb_has_object()`
2026-02-23 13:48:48 -08:00
Patrick Steinhardt
41b42e3527 packfile: expose function to read object stream for an offset
The function `packfile_store_read_object_stream()` takes as input an
object ID and then constructs a `struct odb_read_stream` from it. In a
subsequent commit we'll want to create an object stream for a given
combination of packfile and offset though, which is not something that
can currently be done.

Extract a new function `packfile_read_object_stream()` that makes this
functionality available.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-23 13:19:00 -08:00
Patrick Steinhardt
f6516a5241 odb: convert object info flags into an enum
Convert the object info flags into an enum and adapt all functions that
receive these flags as parameters to use the enum instead of an integer.
This serves two purposes:

  - The function signatures become more self-documenting, as callers
    don't have to wonder which flags they expect.

  - The compiler can warn when a wrong flag type is passed.

Note that the second benefit is somewhat limited. For example, when
or-ing multiple enum flags together the result will be an integer, and
the compiler will not warn about such use cases. But where it does help
is when a single flag of the wrong type is passed, as the compiler would
generate a warning in that case.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-12 11:05:08 -08:00
Patrick Steinhardt
3565faf28c odb: drop unused for_each_{loose,packed}_object() functions
We have converted all callers of `for_each_loose_object()` and
`for_each_packed_object()` to use their new replacement functions
instead. We can thus remove them now.

Do so and inline `packfile_store_for_each_object_internal()` now that it
only has a single callsite again. This makes it a bit easier to follow
the callback indirection that is happening there.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 08:26:08 -08:00
Patrick Steinhardt
736464b84f packfile: introduce function to iterate through objects
Introduce a new function `packfile_store_for_each_object()`. This
function is equivalent to `odb_source_loose_for_each_object()`, except
that it:

  - Works on a single packfile store instead of working on the object
    database level. Consequently, it will only yield packed objects of a
    single object database source.

  - Passes a `struct object_info` to the callback function.

As such, it provides the same callback interface as we already provide
for loose objects now. These functions will be used in a subsequent step
to implement `odb_for_each_object()`.

The `for_each_packed_object()` function continues to exist for now, but
it will be removed at the end of this patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 08:26:07 -08:00
Patrick Steinhardt
6358da200f odb: fix flags parameter to be unsigned
The `flags` parameter accepted by various `for_each_object()` functions
is a bitfield of multiple flags. Such parameters are typically unsigned
in the Git codebase, but we use `enum odb_for_each_object_flags` in
some places.

Adapt these function signatures to use the correct type.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 08:26:06 -08:00
Patrick Steinhardt
bd1855b897 odb: rename FOR_EACH_OBJECT_* flags
Rename the `FOR_EACH_OBJECT_*` flags to have an `ODB_` prefix. This
prepares us for a new upcoming `odb_for_each_object()` function and
ensures that both the function and its flags have the same prefix.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 08:26:06 -08:00
Junio C Hamano
bc5cbbe246 Merge branch 'ps/read-object-info-improvements'
The object-info API has been cleaned up.

* ps/read-object-info-improvements:
  packfile: drop repository parameter from `packed_object_info()`
  packfile: skip unpacking object header for disk size requests
  packfile: disentangle return value of `packed_object_info()`
  packfile: always populate pack-specific info when reading object info
  packfile: extend `is_delta` field to allow for "unknown" state
  packfile: always declare object info to be OI_PACKED
  object-file: always set OI_LOOSE when reading object info
2026-01-21 08:29:00 -08:00
Junio C Hamano
ec16dde5c8 Merge branch 'ps/packfile-store-in-odb-source' into ps/odb-for-each-object
* ps/packfile-store-in-odb-source:
  packfile: move MIDX into packfile store
  packfile: refactor `find_pack_entry()` to work on the packfile store
  packfile: inline `find_kept_pack_entry()`
  packfile: only prepare owning store in `packfile_store_prepare()`
  packfile: only prepare owning store in `packfile_store_get_packs()`
  packfile: move packfile store into object source
  packfile: refactor misleading code when unusing pack windows
  packfile: refactor kept-pack cache to work with packfile stores
  packfile: pass source to `prepare_pack()`
  packfile: create store via its owning source
  odb: properly close sources before freeing them
  builtin/gc: fix condition for whether to write commit graphs
2026-01-15 05:50:16 -08:00
Patrick Steinhardt
12d3b58b55 packfile: drop repository parameter from packed_object_info()
The function `packed_object_info()` takes a packfile and offset and
returns the object info for the corresponding object. Despite these two
parameters though it also takes a repository pointer. This is redundant
information though, as `struct packed_git` already has a repository
pointer that is always populated.

Drop the redundant parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:51:15 -08:00
Patrick Steinhardt
8908c303da packfile: disentangle return value of packed_object_info()
The `packed_object_info()` function returns the type of the packed
object. While we use an `enum object_type` to store the return value,
this type is not to be confused with the actual object type. It _may_
contain the object type, but it may just as well encode that the given
packed object is stored as a delta.

We have removed the only caller that relied on this returned object type
in the preceding commit, so let's simplify semantics and return either 0
on success or a negative error code otherwise.

This unblocks a small optimization where we can skip reading the object
type altogether.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:51:14 -08:00
Patrick Steinhardt
a282a8f163 packfile: move MIDX into packfile store
The multi-pack index still is tracked as a member of the object database
source, but ultimately the MIDX is always tied to one specific packfile
store.

Move the structure into `struct packfile_store` accordingly. This
ensures that the packfile store now keeps track of all data related to
packfiles.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09 06:40:08 -08:00
Patrick Steinhardt
6acefa0d2c packfile: inline find_kept_pack_entry()
The `find_kept_pack_entry()` function is only used in
`has_object_kept_pack()`, which is only a trivial wrapper itself. Inline
the latter into the former.

Furthermore, reorder the code so that we can drop the declaration of the
function in "packfile.h". This allows us to make the function file-local.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09 06:40:07 -08:00
Patrick Steinhardt
84f0e60b28 packfile: move packfile store into object source
The packfile store is a member of `struct object_database`, which means
that we have a single store per database. This doesn't really make much
sense though: each source connected to the database has its own set of
packfiles, so there is a conceptual mismatch here. This hasn't really
caused much of a problem in the past, but with the advent of pluggable
object databases this is becoming more of a problem because some of the
sources may not even use packfiles in the first place.

Move the packfile store down by one level from the object database into
the object database source. This ensures that each source now has its
own packfile store, and we can eventually start to abstract it away
entirely so that the caller doesn't even know what kind of store it
uses.

Note that we only need to adjust a relatively small number of callers,
way less than one might expect. This is because most callers are using
`repo_for_each_pack()`, which handles enumeration of all packfiles that
exist in the repository. So for now, none of these callers need to be
adapted. The remaining callers that iterate through the packfiles
directly and that need adjustment are those that are a bit more tangled
with packfiles. These will be adjusted over time.

Note that this patch only moves the packfile store, and there is still a
bunch of functions that seemingly operate on a packfile store but that
end up iterating over all sources. These will be adjusted in subsequent
commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09 06:40:07 -08:00
Patrick Steinhardt
085de91b95 packfile: refactor kept-pack cache to work with packfile stores
The kept pack cache is a cache of packfiles that are marked as kept
either via an accompanying ".kept" file or via an in-memory flag. The
cache can be retrieved via `kept_pack_cache()`, where one needs to pass
in a repository.

Ultimately though the kept-pack cache is a property of the packfile
store, and this causes problems in a subsequent commit where we want to
move down the packfile store to be a per-object-source entity.

Prepare for this and refactor the kept-pack cache to work on top of a
packfile store instead. While at it, rename both the function and flags
specific to the kept-pack cache so that they can be properly attributed
to the respective subsystems.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09 06:40:06 -08:00
Patrick Steinhardt
480336a9ce packfile: create store via its owning source
In subsequent patches we're about to move the packfile store from the
object database layer into the object database source layer. Once done,
we'll have one packfile store per source, where the source is owning the
store.

Prepare for this future and refactor `packfile_store_new()` to be
initialized via an object database source instead of via the object
database itself.

This refactoring leads to a weird in-between state where the store is
owned by the object database but created via the source. But this makes
subsequent refactorings easier because we can now start to access the
owning source of a given store.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09 06:40:06 -08:00
Junio C Hamano
dbe54273a7 Merge branch 'ps/object-read-stream'
The "git_istream" abstraction has been revamped to make it easier
to interface with pluggable object database design.

* ps/object-read-stream:
  streaming: drop redundant type and size pointers
  streaming: move into object database subsystem
  streaming: refactor interface to be object-database-centric
  streaming: move logic to read packed objects streams into backend
  streaming: move logic to read loose objects streams into backend
  streaming: make the `odb_read_stream` definition public
  streaming: get rid of `the_repository`
  streaming: rely on object sources to create object stream
  packfile: introduce function to read object info from a store
  streaming: move zlib stream into backends
  streaming: create structure for filtered object streams
  streaming: create structure for packed object streams
  streaming: create structure for loose object streams
  streaming: create structure for in-core object streams
  streaming: allocate stream inside the backend-specific logic
  streaming: explicitly pass packfile info when streaming a packed object
  streaming: propagate final object type via the stream
  streaming: drop the `open()` callback function
  streaming: rename `git_istream` into `odb_read_stream`
2025-12-16 11:08:34 +09:00
Junio C Hamano
f1799202ea Merge branch 'ps/object-read-stream' into ps/packfile-store-in-odb-source
* ps/object-read-stream:
  streaming: drop redundant type and size pointers
  streaming: move into object database subsystem
  streaming: refactor interface to be object-database-centric
  streaming: move logic to read packed objects streams into backend
  streaming: move logic to read loose objects streams into backend
  streaming: make the `odb_read_stream` definition public
  streaming: get rid of `the_repository`
  streaming: rely on object sources to create object stream
  packfile: introduce function to read object info from a store
  streaming: move zlib stream into backends
  streaming: create structure for filtered object streams
  streaming: create structure for packed object streams
  streaming: create structure for loose object streams
  streaming: create structure for in-core object streams
  streaming: allocate stream inside the backend-specific logic
  streaming: explicitly pass packfile info when streaming a packed object
  streaming: propagate final object type via the stream
  streaming: drop the `open()` callback function
  streaming: rename `git_istream` into `odb_read_stream`
2025-12-15 17:40:31 +09:00
Junio C Hamano
9d442ce2e2 Merge branch 'ps/object-source-management'
Code refactoring around object database sources.

* ps/object-source-management:
  odb: handle recreation of quarantine directories
  odb: handle changing a repository's commondir
  chdir-notify: add function to unregister listeners
  odb: handle initialization of sources in `odb_new()`
  http-push: stop setting up `the_repository` for each reference
  t/helper: stop setting up `the_repository` repeatedly
  builtin/index-pack: fix deferred fsck outside repos
  oidset: introduce `oidset_equal()`
  odb: move logic to disable ref updates into repo
  odb: refactor `odb_clear()` to `odb_free()`
  odb: adopt logic to close object databases
  setup: convert `set_git_dir()` to have file scope
  path: move `enter_repo()` into "setup.c"
2025-12-05 14:49:58 +09:00