This is hidden in v2.55.0-rc0's own CI because of an omission in
5ba82911bc (ci: enable EXPENSIVE for contributor builds, 2026-05-11)
which fails to enable EXPENSIVE tests for tags.
Due to 7d78d5fc1a (ci: skip GitHub workflow runs for already-tested
commits/trees, 2020-10-08), the CI of `master` is now also mistakenly
green because it reuses the tag's CI run to prove that it's solid.
This is an evil merge by necessity because `survey.c` needs to adapt to
the changed function signatures.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Over the last commits we have turned the packfile store into a proper
object database source that can be used as a standalone backend. As
such, it is no longer necessary to have it coupled to the "files" parent
source.
Remove the pointer to the owning "files" source so that the "packed"
source can be used as a standalone entity.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move `packfile_store_freshen_object()` and from "packfile.c" into
"odb/source-packed.c" and wire it up as the `freshen_object()` callback
of the "packed" source.
Note that this removes the last external caller of `find_pack_entry()`
from "packfile.c", which means that we can now make this function
static.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move `packfile_store_find_abbrev_len()` and its associated helpers from
"packfile.c" into "odb/source-packed.c" and wire it up as the
`find_abbrev_len()` callback of the "packed" source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move `packfile_store_count_objects()` from "packfile.c" into
"odb/source-packed.c" and wire it up as the `count_objects()` callback
of the "packed" source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move `packfile_store_for_each_object()` and its associated helpers from
"packfile.c" into "odb/source-packed.c" and wire it up as the
`for_each_object()` callback of the "packed" source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wire up the `read_object_stream()` callback for the packed source and
call it in the "files" source via the `odb_source_read_object_stream()`
interface.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the logic to read object info from a "packed" source into
"odb/source-packed.c" and wire it up as the `read_object_info()`
callback.
Note that we also move around the supporting `find_pack_entry()`, but we
still have to expose it to other callers that exist in "packfile.c".
This will be fixed in subsequent commits though, where all callers in
"packfile.c" will have been moved into "odb/source-packed.c", and at
that point we'll be able to make `find_pack_entry()` file-local again.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `has_object_pack()` we're checking whether a specific object exists
as part of a packfile. This is done by calling the low-level function
`find_pack_entry()`, but this function will eventually be moved into
"odb/source-packed.c" and made file-local.
Refactor the code to use `packfile_store_read_object_info()` instead.
This refactoring is functionally equivalent as that function will call
`find_pack_entry()` itself and then return immediately when it ain't got
no object info pointer as parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the logic to prepare and reprepare the "packed" source into
"odb/source-packed.c" and wire it up as the `reprepare()` callback.
Note that "preparing" a source is not yet generic. Eventually, it would
probably make sense to turn the existing `reprepare()` callback into a
`prepare()` callback with an optional flag to force re-preparing. But
this step will be handled in a separate patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wire up a new `close()` callback for the packed source and call it from
the "files" source via the generic `odb_source_close()` interface.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Start converting `struct odb_source_packed` into a proper pluggable
`struct odb_source` by embedding the base struct and assigning it the
new `ODB_SOURCE_PACKED` type. Furthermore, wire up lifecycle management
of this source by implementing the `free` callback and taking ownership
of the chdir notifications.
Note that the packed source is not yet functional as a standalone `struct
odb_source`, as it's missing all of the callback implementations. These
will be wired up in subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `struct odb_source_packed` holds a pointer to its owning parent
source. The way that Git is currently structured, this parent is always
the "files" source. In subsequent commits we're going to detangle that
so that the "packed" source doesn't have any owning parent source at
all, which makes it usable as a completely standalone source.
Detangling this mess is somewhat intricate though, and is made even more
intricate because it's not always clear which kind of source one is
holding at a specific point in time -- either the parent "files" source,
or the child "packed" source.
Make this relationship more explicit by storing a pointer to the "files"
source instead of storing a pointer to a generic `struct odb_source`.
This will help make subsequent steps a bit clearer.
Note that this is a temporary step, only. At the end of this series
we will have dropped the parent pointer completely.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In subsequent patches we'll be turning `struct odb_source_packed` into a
proper `struct odb_source`. As a first step towards this goal, move its
struct out of "packfile.{c,h}" and into "odb/source-packed.{c,h}".
This detaches the implementation of the packfile object source from the
generic packfile code, following the same convention already used by the
"files" and "in-memory" sources.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next commit we're about to introduce the "packed" object database
source. This source will embed a packfile list, and consequently we'll
have to include "packfile.h" to make the struct definition available.
This will unfortunately lead to a cyclic dependency that we cannot
resolve with a forward declaration.
Split out the code that relates to the packfile list into a separate
compilation unit so that both "packfile.h" and "odb/source-packed.h" can
include it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Not too long ago, we have introduced the packfile store in b7983adb51
(packfile: introduce a new `struct packfile_store`, 2025-09-23). This
struct is responsible for managing all of our access to packfiles and is
used as one of the two sources of objects for the "files" source.
Back when I introduced this structure I didn't have the clear vision yet
that it will eventually also turn into a proper object database source,
and how exactly that infrastructure will look like. Now though it's
becoming increasingly clear that it does make sense to treat it just the
same as any of our other ODB sources.
The consequence is that the naming is now a bit out-of-date: it's just
another source and will be turned into a proper `struct odb_source` over
the next couple of commits, but it's not named accordingly.
Rename the structure to `odb_source_packed` to align it with this goal
and to bring it in line with the other sources we already have.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `js/objects-larger-than-4gb-on-windows` widened the streaming,
index-pack and unpack-objects code paths, in the interest of keeping the
patches somewhat reasonably-sized, it left the public ODB API still
typed in `unsigned long`. In particular `struct object_info::sizep` and
the four wrappers built on top of it (`odb_read_object`,
`odb_read_object_peeled`, `odb_read_object_info`, `odb_pretend_object`)
still return the unpacked size through `unsigned long *`, so on Windows
`cat-file -s` and the `git add` / `git status` paths for a >4 GiB blob
silently cap at 4 GiB.
Widen the field and the four wrappers. The previous commits already
widened the `unpack_entry()` cascade and pack-objects' in-core size
accessors, so most of the cascade arrives here with no further work: the
temporary shims in `packed_object_info_with_index_pos()` and in
`unpack_entry()`'s delta-base recovery path go away, the two
`SET_SIZE(entry, cast_size_t_to_ulong(canonical_size))` calls in
`check_object()` and the matching one in `drop_reused_delta()` collapse
to plain `SET_SIZE`, and `oe_get_size_slow()`'s tail
`cast_size_t_to_ulong()` is gone too.
What remains narrow are the boundaries this series does not
intend to touch: the diff, blame, textconv and fast-import machinery.
Even so, this patch is unfortunately quite large.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When I started the transition from `unsigned long` to `size_t`, in the
interest of keeping the patches reviewable, I introduced these calls to
prevent data type narrowing from silently failing to handle large object
sizes. I also introduced `*_sz()` variants that would allow most of the
callers to keep using that `unsigned long` that the 90s kindly asked to
be returned.
After the preceding commits, the only places that called the narrow
wrappers either no longer exist or already use the `_sz` form
internally, so the wrappers just narrow values back through
`cast_size_t_to_ulong()` for no reason.
Drop them and rename the `_sz` variants back to the natural names.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The topic `js/objects-larger-than-4gb-on-windows` widened the streaming,
index-pack and unpack-objects paths to `size_t` but deliberately stopped
at the in-memory `unpack_entry()` cascade, which still hands back the
unpacked size through `unsigned long *`. On Windows that boundary
truncates above 4 GiB because that data type is only 32 bits wide on
that platform.
Widen the code path. Except `packed_object_info_with_index_pos()`: It
cannot yet pass `oi->sizep` directly because the field is still
`unsigned long *`; bridge it with a `size_t` temporary that narrows
back, and let a later commit drop the bridge once the field is wide
too. `gfi_unpack_entry()` keeps its narrow signature because fast-import
tracks sizes through `unsigned long` everywhere it crosses subsystem
boundaries, keeping its signature allows the scope of this commit to be
somewhat reasonable, still.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
`patch_delta()` takes the source and delta sizes by value and writes
back the reconstructed target size through an `unsigned long *`. That
datatype cannot represent a value that exceeds 4 GiB on systems where
`unsigned long` is 32-bit (notably 64-bit Windows builds), though, even
though the delta encoding itself, the on-disk layout, and the in-memory
buffers happily carry such sizes. A `size_t` companion to
`get_delta_hdr_size()`, `get_delta_hdr_size_sz()`, was introduced in
17fa077596 (delta, packfile: use size_t for delta header sizes,
2026-05-08) precisely so that `patch_delta()` could be widened without
changing the on-the-wire decoding helper's signature.
Widen `patch_delta()`'s three size parameters to `size_t` and switch
its internal use of `get_delta_hdr_size()` to the `_sz` variant.
Then propagate the wider type through the callers.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The repacking code has been refactored and compaction of MIDX layers
have been implemented, and incremental strategy that does not require
all-into-one repacking has been introduced.
* tb/incremental-midx-part-3.3:
repack: allow `--write-midx=incremental` without `--geometric`
repack: introduce `--write-midx=incremental`
repack: implement incremental MIDX repacking
packfile: ensure `close_pack_revindex()` frees in-memory revindex
builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK`
repack-geometry: prepare for incremental MIDX repacking
repack-midx: extract `repack_fill_midx_stdin_packs()`
repack-midx: factor out `repack_prepare_midx_command()`
midx: expose `midx_layer_contains_pack()`
repack: track the ODB source via existing_packs
midx: support custom `--base` for incremental MIDX writes
midx: introduce `--no-write-chain-file` for incremental MIDX writes
midx: use `strvec` for `keep_hashes`
midx: build `keep_hashes` array in order
midx: use `strset` for retained MIDX files
midx-write: handle noop writes when converting incremental chains
The following commit will introduce a case where we write a MIDX bitmap
over packs that do not themselves have on-disk *.rev files.
This case is supported within Git, and we will simply fall back to
generating the revindex in memory. But we don't ever release that
memory, causing a leak that is exposed by a test introduced in the
following commit.
(As far as I could find, we never free()'d memory allocated as a
byproduct of creating an in-memory revindex, likely because that code
predates the leak-checking niceties we have in the test suite now.)
Rectify this by calling `FREE_AND_NULL()` on the `p->revindex` field
when calling `close_pack_revindex()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The delta header decoding functions return unsigned long, which
truncates on Windows for objects larger than 4GB. Introduce size_t
variants get_delta_hdr_size_sz() and get_size_from_delta_sz() that
preserve the full 64-bit size, and use them in packed_object_info()
where the size is needed for streaming decisions.
This was originally authored by LordKiRon <https://github.com/LordKiRon>,
who preferred not to reveal their real name and therefore agreed that I
take over authorship.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The odb_read_stream structure uses unsigned long for the size field,
which is 32-bit on Windows even in 64-bit builds. When streaming
objects larger than 4GB, the size would be truncated to zero or an
incorrect value, resulting in empty files being written to disk.
Change the size field in odb_read_stream to size_t and introduce
unpack_object_header_sz() to return sizes via size_t pointer. Since
object_info.sizep remains unsigned long for API compatibility, use
temporary variables where the types differ, with comments noting the
truncation limitation for code paths that still use unsigned long.
Widening the producers to size_t in this way introduces a handful of
silent size_t -> unsigned long narrowings on Windows, all in
builtin/pack-objects.c, where the consumers are still typed
unsigned long. Make those narrowings explicit with
cast_size_t_to_ulong() so they assert loudly the moment an object
actually exceeds ULONG_MAX bytes:
- oe_get_size_slow() returns unsigned long but holds a size_t
locally; cast at the return.
- write_reuse_object() passes a size_t into check_pack_inflate(),
whose expect parameter is unsigned long; cast at the call.
- check_object() routes a size_t through SET_SIZE() and
SET_DELTA_SIZE(), both of which take unsigned long via
oe_set_size() / oe_set_delta_size(); cast at the three call
sites in the OBJ_OFS_DELTA / OBJ_REF_DELTA branches and in the
non-delta default arm.
The cast-only treatment is deliberately a stop-gap. Properly
widening oe_set_size, oe_get_size_slow's return type,
check_pack_inflate's expect parameter, object_info.sizep,
patch_delta, and the OE_SIZE_BITS bit-fields cascades into a series
that is too large to be reviewable, so the proper widening is
deferred to a follow-up topic. Until then,
cast_size_t_to_ulong() at least makes the truncation explicit at
the source: it documents the boundary, and on a 64-bit non-Windows
platform it is a no-op.
This was originally authored by LordKiRon <https://github.com/LordKiRon>,
who preferred not to reveal their real name and therefore agreed that I
take over authorship.
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various code clean-up around odb subsystem.
* ps/odb-cleanup:
odb: drop unneeded headers and forward decls
odb: rename `odb_has_object()` flags
odb: use enum for `odb_write_object` flags
odb: rename `odb_write_object()` flags
treewide: use enum for `odb_for_each_object()` flags
CodingGuidelines: document our style for flags
pack-objects's --stdin-packs=follow mode learns to handle
excluded-but-open packs.
* tb/stdin-packs-excluded-but-open:
repack: mark non-MIDX packs above the split as excluded-open
pack-objects: support excluded-open packs with --stdin-packs
t7704: demonstrate failure with once-cruft objects above the geometric split
pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
pack-objects: plug leak in `read_stdin_packs()`
We've got a couple of callsites where we pass `odb_for_each_object()`
flags, but accept an `unsigned` flags field instead of the corresponding
enum. Adapt these to accept the enum type instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In cd846bacc7 (pack-objects: introduce '--stdin-packs=follow',
2025-06-23), pack-objects learned to traverse through commits in
included packs when using '--stdin-packs=follow', rescuing reachable
objects from unlisted packs into the output.
When we encounter a commit in an excluded pack during this rescuing
phase we will traverse through its parents. But because we set
`revs.no_kept_objects = 1`, commit simplification will prevent us from
showing it via `get_revision()`. (In practice, `--stdin-packs=follow`
walks commits down to the roots, but only opens up trees for ones that
do not appear in an excluded pack.)
But there are certain cases where we *do* need to see the parents of an
object in an excluded pack. Namely, if an object is rescue-able, but
only reachable from object(s) which appear in excluded packs, then
commit simplification will exclude those commits from the object
traversal, and we will never see a copy of that object, and thus not
rescue it.
This is what causes the failure in the previous commit during repacking.
When performing a geometric repack, packs above the geometric split that
weren't part of the previous MIDX (e.g., packs pushed directly into
`$GIT_DIR/objects/pack`) may not have full object closure. When those
packs are listed as excluded via the '^' marker, the reachability
traversal encounters the sequence described above, and may miss objects
which we expect to rescue with `--stdin-packs=follow`.
Introduce a new "excluded-open" pack prefix, '!'. Like '^'-prefixed
packs, objects from '!'-prefixed packs are excluded from the resulting
pack. But unlike '^', commits in '!'-prefixed packs *are* used as
starting points for the follow traversal, and the traversal does not
treat them as a closure boundary.
In order to distinguish excluded-closed from excluded-open packs during
the traversal, introduce a new `pack_keep_in_core_open` bit on
`struct packed_git`, along with a corresponding `KEPT_PACK_IN_CORE_OPEN`
flag for the kept-pack cache.
In `add_object_entry_from_pack()`, move the `want_object_in_pack()`
check to *after* `add_pending_oid()`. This is necessary so that commits
from excluded-open packs are added as traversal tips even though their
objects won't appear in the output. As a consequence, the caller
`for_each_object_in_pack()` will always provide a non-NULL 'p', hence we
are able to drop the "if (p)" conditional.
The `include_check` and `include_check_obj` callbacks on `rev_info` are
used to halt the walk at closed-excluded packs, since objects behind a
'^' boundary are guaranteed to have closure and need not be rescued.
The following commit will make use of this new functionality within the
repack layer to resolve the test failure demonstrated in the previous
commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Same as the preceding commit, move the logic that computes the minimum
required prefix length to make a given object ID unique for the packfile
store into a new function `packfile_store_find_abbrev_len()` that is
part of "packfile.c". This prepares for making the logic fully generic
via pluggable object databases.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the preceding commit, move the logic to iterate through
objects that have a given prefix into "packfile.c".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `odb_for_each_object()` function only accepts a bitset of flags. In
a subsequent commit we'll want to change object iteration to also
support iterating over only those objects that have a specific prefix.
While we could of course add the prefix to the function signature, or
alternatively introduce a new function, both of these options don't
really seem to be that sensible.
Instead, introduce a new `struct odb_for_each_object_options` that can
be passed to a new `odb_for_each_object_ext()` function. Splice through
the options structure into the respective object database sources.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The object source API is getting restructured to allow plugging new
backends.
* ps/odb-sources:
odb/source: make `begin_transaction()` function pluggable
odb/source: make `write_alternate()` function pluggable
odb/source: make `read_alternates()` function pluggable
odb/source: make `write_object_stream()` function pluggable
odb/source: make `write_object()` function pluggable
odb/source: make `freshen_object()` function pluggable
odb/source: make `for_each_object()` function pluggable
odb/source: make `read_object_stream()` function pluggable
odb/source: make `read_object_info()` function pluggable
odb/source: make `close()` function pluggable
odb/source: make `reprepare()` function pluggable
odb/source: make `free()` function pluggable
odb/source: introduce source type for robustness
odb: move reparenting logic into respective subsystems
odb: embed base source in the "files" backend
odb: introduce "files" source
odb: split `struct odb_source` into separate header
Similar to the preceding commit, introduce counting of objects on the
object database level, replacing the logic that we have in
`repo_approximate_object_count()`.
Note that the function knows to cache the object count. It's unclear
whether this cache is really required as we shouldn't have that many
cases where we count objects repeatedly. But to be on the safe side the
caching mechanism is retained, with the only excepting being that we
also have to use the passed flags as caching key.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce generic object counting on the object database source level
with a new backend-specific callback function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent commit we're about to introduce a new
`odb_source_count_objects()` function so that we can make the logic
pluggable. Prepare for this change by extracting the logic that we have
to count packed objects into a standalone function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/odb-sources:
odb/source: make `begin_transaction()` function pluggable
odb/source: make `write_alternate()` function pluggable
odb/source: make `read_alternates()` function pluggable
odb/source: make `write_object_stream()` function pluggable
odb/source: make `write_object()` function pluggable
odb/source: make `freshen_object()` function pluggable
odb/source: make `for_each_object()` function pluggable
odb/source: make `read_object_stream()` function pluggable
odb/source: make `read_object_info()` function pluggable
odb/source: make `close()` function pluggable
odb/source: make `reprepare()` function pluggable
odb/source: make `free()` function pluggable
odb/source: introduce source type for robustness
odb: move reparenting logic into respective subsystems
odb: embed base source in the "files" backend
odb: introduce "files" source
odb: split `struct odb_source` into separate header
Introduce a new callback function in `struct odb_source` to make the
function pluggable.
Note that this function is a bit less straight-forward to convert
compared to the other functions. The reason here is that the logic to
read an object is:
1. We try to read the object. If it exists we return it.
2. If the object does not exist we reprepare the object database
source.
3. We then try reading the object info a second time in case the
reprepare caused it to appear.
The second read is only supposed to happen for the packfile store
though, as reading loose objects is not impacted by repreparing the
object database.
Ideally, we'd just move this whole logic into the ODB source. But that's
not easily possible because we try to avoid the reprepare unless really
required, which is after we have found out that no other ODB source
contains the object, either. So the logic spans across multiple ODB
sources, and consequently we cannot move it into an individual source.
Instead, introduce a new flag `OBJECT_INFO_SECOND_READ` that tells the
backend that we already tried to look up the object once, and that this
time around the ODB source should try to find any new objects that may
have surfaced due to an on-disk change.
With this flag, the "files" backend can trivially skip trying to re-read
the object as a loose object. Furthermore, as we know that we only try
the second read via the packfile store, we can skip repreparing loose
objects and only reprepare the packfile store.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "files" backend is implemented as a pointer in the `struct
odb_source`. This contradicts our typical pattern for pluggable backends
like we use it for example in the ref store or for object database
streams, where we typically embed the generic base structure in the
specialized implementation. This pattern has a couple of small benefits:
- We avoid an extra allocation.
- We hide implementation details in the generic structure.
- We can easily downcast from a generic backend to the specialized
structure and vice versa because the offsets are known at compile
time.
- It becomes trivial to identify locations where we depend on backend
specific logic because the cast needs to be explicit.
Refactor our "files" object database source to do the same and embed the
`struct odb_source` in the `struct odb_source_files`.
There are still a bunch of sites in our code base where we do have to
access internals of the "files" backend. The intent is that those will
go away over time, but this will certainly take a while. Meanwhile,
provide a `odb_source_files_downcast()` function that can convert a
generic source into a "files" source.
As we only have a single source the downcast succeeds unconditionally
for now. Eventually though the intent is to make the cast `BUG()` in
case the caller requests to downcast a non-"files" backend to a "files"
backend.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new "files" object database source. This source encapsulates
access to both loose object files and the packfile store, similar to how
the "files" backend for refs encapsulates access to loose refs and the
packed-refs file.
Note that for now the "files" source is still a direct member of a
`struct odb_source`. This architecture will be reversed in the next
commit so that the files source contains a `struct odb_source`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"fsck" iterates over packfiles and its access to pack data caused
the list to be permuted, which caused it to loop forever; the code
to access pack data by "fsck" has been updated to avoid this.
* ps/fsck-stream-from-the-right-object-instance:
pack-check: fix verification of large objects
packfile: expose function to read object stream for an offset
object-file: adapt `stream_object_signature()` to take a stream
t/helper: improve "genrandom" test helper
A couple of bugs in use of flag bits around odb API has been
corrected, and the flag bits reordered.
* ps/object-info-bits-cleanup:
odb: convert `odb_has_object()` flags into an enum
odb: convert object info flags into an enum
odb: drop gaps in object info flag values
builtin/fsck: fix flags passed to `odb_has_object()`
builtin/backfill: fix flags passed to `odb_has_object()`
* ps/object-info-bits-cleanup:
odb: convert `odb_has_object()` flags into an enum
odb: convert object info flags into an enum
odb: drop gaps in object info flag values
builtin/fsck: fix flags passed to `odb_has_object()`
builtin/backfill: fix flags passed to `odb_has_object()`
The function `packfile_store_read_object_stream()` takes as input an
object ID and then constructs a `struct odb_read_stream` from it. In a
subsequent commit we'll want to create an object stream for a given
combination of packfile and offset though, which is not something that
can currently be done.
Extract a new function `packfile_read_object_stream()` that makes this
functionality available.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the object info flags into an enum and adapt all functions that
receive these flags as parameters to use the enum instead of an integer.
This serves two purposes:
- The function signatures become more self-documenting, as callers
don't have to wonder which flags they expect.
- The compiler can warn when a wrong flag type is passed.
Note that the second benefit is somewhat limited. For example, when
or-ing multiple enum flags together the result will be an integer, and
the compiler will not warn about such use cases. But where it does help
is when a single flag of the wrong type is passed, as the compiler would
generate a warning in that case.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have converted all callers of `for_each_loose_object()` and
`for_each_packed_object()` to use their new replacement functions
instead. We can thus remove them now.
Do so and inline `packfile_store_for_each_object_internal()` now that it
only has a single callsite again. This makes it a bit easier to follow
the callback indirection that is happening there.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are some use cases where we need to figure out the mtime for
objects. Most importantly, this is the case when we want to prune
unreachable objects. But getting at that data requires users to manually
derive the info either via the loose object's mtime, the packfiles'
mtime or via the ".mtimes" file.
Introduce a new `struct object_info::mtimep` pointer that allows callers
to request an object's mtime. This new field will be used in a
subsequent commit.
Note that the concept of "mtime" is ambiguous: given an object, it may
be stored multiple times in the object database, and each of these
instances may have a different mtime. Disambiguating these mtimes is
nothing that can happen on the generic ODB layer: the caller may search
for the oldest object, the newest object, or even the relation of object
mtimes depending on the specific source they are located in. As such, it
is the responsibility of the caller to disambiguate mtimes.
A consequence of this is that it's most likely incorrect to look up the
mtime via `odb_read_object_info()`, as this interface does not give us
enough information to disambiguate the mtime. Document this accordingly
and tell users to use `odb_for_each_object()` instead.
Even with this gotcha though it's sensible to have this request as part
of the object info, as the mtime is a property of the object storage
format. If we for example had a "black-box" storage backend, we'd still
need to be able to query it for the mtime info in a generic way.
We could introduce a safety mechanism that for example calls `BUG()` in
case we look up the mtime outside of `odb_for_each_object()`. But that
feels somewhat heavy-handed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have multiple callsites where we enumerate all promisor objects in
the object database via `for_each_packed_object()`. This is done by
passing the `ODB_FOR_EACH_OBJECT_PROMISOR_ONLY` flag, which causes us to
skip over all non-promisor objects.
These callsites can be trivially converted to `odb_for_each_object()` as
we know to skip enumeration of loose objects in case the `PROMISOR_ONLY`
flag was passed by the caller.
Refactor the sites accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new function `packfile_store_for_each_object()`. This
function is equivalent to `odb_source_loose_for_each_object()`, except
that it:
- Works on a single packfile store instead of working on the object
database level. Consequently, it will only yield packed objects of a
single object database source.
- Passes a `struct object_info` to the callback function.
As such, it provides the same callback interface as we already provide
for loose objects now. These functions will be used in a subsequent step
to implement `odb_for_each_object()`.
The `for_each_packed_object()` function continues to exist for now, but
it will be removed at the end of this patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next commit we're about to introduce a new function that knows to
iterate through objects of a given packfile store. Same as with the
equivalent function for loose objects, this new function will also be
agnostic of backends by using a `struct object_info`.
Prepare for this by extracting a new shared function to iterate through
a single packfile store.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `flags` parameter accepted by various `for_each_object()` functions
is a bitfield of multiple flags. Such parameters are typically unsigned
in the Git codebase, but we use `enum odb_for_each_object_flags` in
some places.
Adapt these function signatures to use the correct type.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>