Commit Graph

309 Commits

Author SHA1 Message Date
Junio C Hamano
fe4ab2e698 Merge branch 'jt/index-fd-wo-repo-regression-fix-maint'
During Git 2.52 timeframe, we broke streaming computation of object
hash outside a repository, which has been corrected.

* jt/index-fd-wo-repo-regression-fix-maint:
  object-file: avoid ODB transaction when not writing objects
2026-04-08 10:20:51 -07:00
Junio C Hamano
9797fed6ce Merge branch 'ps/odb-cleanup'
Various code clean-up around odb subsystem.

* ps/odb-cleanup:
  odb: drop unneeded headers and forward decls
  odb: rename `odb_has_object()` flags
  odb: use enum for `odb_write_object` flags
  odb: rename `odb_write_object()` flags
  treewide: use enum for `odb_for_each_object()` flags
  CodingGuidelines: document our style for flags
2026-04-08 10:19:17 -07:00
Justin Tobler
7d8727ff0b object-file: avoid ODB transaction when not writing objects
In ce1661f9da (odb: add transaction interface, 2025-09-16), existing
ODB transaction logic is adapted to create a transaction interface
at the ODB layer. The intent here is for the ODB transaction
interface to eventually provide an object source agnostic means to
manage transactions.

An unintended consequence of this change though is that
`object-file.c:index_fd()` may enter the ODB transaction path even
when no object write is requested. In non-repository contexts, this
can result in a NULL dereference and segfault. One such case occurs
when running git-diff(1) outside of a repository with
"core.bigFileThreshold" forcing the streaming path in `index_fd()`:

        $ echo foo >foo
        $ echo bar >bar
        $ git -c core.bigFileThreshold=1 diff -- foo bar

In this scenario, the caller only needs to compute the object ID. Object
hashing does not require an ODB, so starting a transaction is both
unnecessary and invalid.

Fix the bug by avoiding the use of ODB transactions in `index_fd()` when
callers are only interested in computing the object hash.

Reported-by: Luca Stefani <luca.stefani.ge1@gmail.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
[jc: adjusted to fd13909e (Merge branch 'jt/odb-transaction', 2025-10-02)]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-04-07 17:32:36 -07:00
Junio C Hamano
7b6d0cd51b Merge branch 'ps/fsck-wo-the-repository'
Internals of "git fsck" have been refactored to not depend on the
global `the_repository` variable.

* ps/fsck-wo-the-repository:
  builtin/fsck: stop using `the_repository` in error reporting
  builtin/fsck: stop using `the_repository` when marking objects
  builtin/fsck: stop using `the_repository` when checking packed objects
  builtin/fsck: stop using `the_repository` with loose objects
  builtin/fsck: stop using `the_repository` when checking reflogs
  builtin/fsck: stop using `the_repository` when checking refs
  builtin/fsck: stop using `the_repository` when snapshotting refs
  builtin/fsck: fix trivial dependence on `the_repository`
  fsck: drop USE_THE_REPOSITORY
  fsck: store repository in fsck options
  fsck: initialize fsck options via a function
  fetch-pack: move fsck options into function scope
2026-04-07 14:59:26 -07:00
Patrick Steinhardt
c63911b052 odb: rename odb_has_object() flags
Rename `odb_has_object()` flags to be properly prefixed with the
function name.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-31 20:43:14 -07:00
Patrick Steinhardt
b2d421ece6 odb: use enum for odb_write_object flags
We've got a couple of functions that accept `odb_write_object()` flags,
but all of them accept the flags as an `unsigned` integer. In fact, we
don't even have an `enum` for the flags field.

Introduce this `enum` and adapt functions accordingly according to our
coding style.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-31 20:43:13 -07:00
Patrick Steinhardt
ff2e9d85d6 odb: rename odb_write_object() flags
Rename `odb_write_object()` flags to be properly prefixed with the
function name.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-31 20:43:13 -07:00
Patrick Steinhardt
3749853908 fsck: store repository in fsck options
The fsck subsystem relies on `the_repository` quite a bit. While we
could of course explicitly pass a repository down the callchain, we
already have a `struct fsck_options` that we pass to almost all
functions.

Extend the options to also store the repository to make it readily
available.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-23 08:33:10 -07:00
Patrick Steinhardt
f223609026 fsck: initialize fsck options via a function
We initialize the `struct fsck_options` via a set of macros, often in
global scope. In the next commit though we're about to introduce a new
repository field to the options that must be initialized, and naturally
we don't have a repo other than `the_repository` available in this
scope.

Refactor the code to instead intrdouce a new `fsck_options_init()`
function that initializes the options for us and move initialization
into function scope.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-23 08:33:10 -07:00
Patrick Steinhardt
ab3ab1038d object-name: move logic to compute loose abbreviation length
The function `repo_find_unique_abbrev_r()` takes as input an object ID
as well as a minimum object ID length and returns the minimum required
prefix to make the object ID unique.

The logic that computes the abbreviation length for loose objects is
deeply tied to the loose object storage format. As such, it would fail
in case a different object storage format was used.

Prepare for making this logic generic to the backend by moving the logic
into a new `odb_source_loose_find_abbrev_len()` function that is part of
"object-file.c".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-20 13:16:42 -07:00
Patrick Steinhardt
284b7862be object-name: move logic to iterate through loose prefixed objects
The logic to iterate through loose objects that have a certain prefix is
currently hosted in "object-name.c". This logic reaches into specifics
of the loose object source, so it breaks once a different backend is
used for the object storage.

Move the logic to iterate through loose objects with a prefix into
"object-file.c". This is done by extending the for-each-object options
to support an optional prefix that is then honored by the loose source.
Naturally, we'll also have this support in the packfile store. This is
done in the next commit.

Furthermore, there are no users of the loose cache outside of
"object-file.c" anymore. As such, convert `odb_source_loose_cache()` to
have file scope.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-20 13:16:42 -07:00
Patrick Steinhardt
cfd575f0a9 odb: introduce struct odb_for_each_object_options
The `odb_for_each_object()` function only accepts a bitset of flags. In
a subsequent commit we'll want to change object iteration to also
support iterating over only those objects that have a specific prefix.
While we could of course add the prefix to the function signature, or
alternatively introduce a new function, both of these options don't
really seem to be that sensible.

Instead, introduce a new `struct odb_for_each_object_options` that can
be passed to a new `odb_for_each_object_ext()` function. Splice through
the options structure into the respective object database sources.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-20 13:16:41 -07:00
Junio C Hamano
7f75767554 Merge branch 'ps/object-counting' into ps/odb-generic-object-name-handling
* ps/object-counting:
  object-file: fix sparse 'plain integer as NULL pointer' error
  odb: introduce generic object counting
  odb/source: introduce generic object counting
  object-file: generalize counting objects
  object-file: extract logic to approximate object count
  packfile: extract logic to count number of objects
  odb: stop including "odb/source.h"
2026-03-20 13:16:09 -07:00
Ramsay Jones
736cef847c object-file: fix sparse 'plain integer as NULL pointer' error
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-19 18:35:49 -07:00
Junio C Hamano
2eec0f5115 Merge branch 'jk/unleak-mmap'
Plug a few leaks where mmap'ed memory regions are not unmapped.

* jk/unleak-mmap:
  meson: turn on NO_MMAP when building with LSan
  Makefile: turn on NO_MMAP when building with LSan
  object-file: fix mmap() leak in odb_source_loose_read_object_stream()
  pack-revindex: avoid double-loading .rev files
  check_connected(): fix leak of pack-index mmap
  check_connected(): delay opening new_pack
2026-03-16 10:48:15 -07:00
Junio C Hamano
c89a495ce4 Merge branch 'ps/odb-sources'
The object source API is getting restructured to allow plugging new
backends.

* ps/odb-sources:
  odb/source: make `begin_transaction()` function pluggable
  odb/source: make `write_alternate()` function pluggable
  odb/source: make `read_alternates()` function pluggable
  odb/source: make `write_object_stream()` function pluggable
  odb/source: make `write_object()` function pluggable
  odb/source: make `freshen_object()` function pluggable
  odb/source: make `for_each_object()` function pluggable
  odb/source: make `read_object_stream()` function pluggable
  odb/source: make `read_object_info()` function pluggable
  odb/source: make `close()` function pluggable
  odb/source: make `reprepare()` function pluggable
  odb/source: make `free()` function pluggable
  odb/source: introduce source type for robustness
  odb: move reparenting logic into respective subsystems
  odb: embed base source in the "files" backend
  odb: introduce "files" source
  odb: split `struct odb_source` into separate header
2026-03-12 14:09:07 -07:00
Patrick Steinhardt
2b24db1110 object-file: generalize counting objects
Generalize the function introduced in the preceding commit to not only
be able to approximate the number of loose objects, but to also provide
an accurate count. The behaviour can be toggled via a new flag.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-12 08:38:42 -07:00
Patrick Steinhardt
222fddeaa4 object-file: extract logic to approximate object count
In "builtin/gc.c" we have some logic that checks whether we need to
repack objects. This is done by counting the number of objects that we
have and checking whether it exceeds a certain threshold. We don't
really need an accurate object count though, which is why we only
open a single object directory shard and then extrapolate from there.

Extract this logic into a new function that is owned by the loose object
database source. This is done to prepare for a subsequent change, where
we'll introduce object counting on the object database source level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-12 08:38:42 -07:00
Junio C Hamano
6cdef943d2 Merge branch 'ps/odb-sources' into ps/object-counting
* ps/odb-sources:
  odb/source: make `begin_transaction()` function pluggable
  odb/source: make `write_alternate()` function pluggable
  odb/source: make `read_alternates()` function pluggable
  odb/source: make `write_object_stream()` function pluggable
  odb/source: make `write_object()` function pluggable
  odb/source: make `freshen_object()` function pluggable
  odb/source: make `for_each_object()` function pluggable
  odb/source: make `read_object_stream()` function pluggable
  odb/source: make `read_object_info()` function pluggable
  odb/source: make `close()` function pluggable
  odb/source: make `reprepare()` function pluggable
  odb/source: make `free()` function pluggable
  odb/source: introduce source type for robustness
  odb: move reparenting logic into respective subsystems
  odb: embed base source in the "files" backend
  odb: introduce "files" source
  odb: split `struct odb_source` into separate header
2026-03-10 10:13:40 -07:00
Jeff King
b68e875bec object-file: fix mmap() leak in odb_source_loose_read_object_stream()
We mmap() a loose object file, storing the result in the local variable
"mapped", which is eventually assigned into our stream struct as
"st.mapped". If we hit an error, we jump to an error label which does:

  munmap(st.mapped, st.mapsize);

to clean up. But this is wrong; we don't assign st.mapped until the end
of the function, after all of the "goto error" jumps. So this munmap()
is never cleaning up anything (st.mapped is always NULL, because we
initialize the struct with calloc).

Instead, we should feed the local variable to munmap().

This leak is due to 595296e124 (streaming: allocate stream inside the
backend-specific logic, 2025-11-23), which introduced the local
variable. Before that, we assigned the mmap result directly into
st.mapped. It was probably switched there so that we do not have to
allocate/free the struct when the map operation fails (e.g., because we
don't have the loose object). Before that commit, the struct was passed
in from the caller, so there was no allocation at all.

You can see the leak in the test suite by building with:

  make SANITIZE=leak NO_MMAP=1 CC=clang

and running t1060. We need NO_MMAP so that the mmap() is backed by an
actual malloc(), which allows LSan to detect it. And the leak seems not
to be detected when compiling with gcc, probably due to some internal
compiler decisions about how the stack memory is written.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-06 21:11:32 -08:00
Patrick Steinhardt
5946a564cd odb/source: make read_object_info() function pluggable
Introduce a new callback function in `struct odb_source` to make the
function pluggable.

Note that this function is a bit less straight-forward to convert
compared to the other functions. The reason here is that the logic to
read an object is:

  1. We try to read the object. If it exists we return it.

  2. If the object does not exist we reprepare the object database
     source.

  3. We then try reading the object info a second time in case the
     reprepare caused it to appear.

The second read is only supposed to happen for the packfile store
though, as reading loose objects is not impacted by repreparing the
object database.

Ideally, we'd just move this whole logic into the ODB source. But that's
not easily possible because we try to avoid the reprepare unless really
required, which is after we have found out that no other ODB source
contains the object, either. So the logic spans across multiple ODB
sources, and consequently we cannot move it into an individual source.

Instead, introduce a new flag `OBJECT_INFO_SECOND_READ` that tells the
backend that we already tried to look up the object once, and that this
time around the ODB source should try to find any new objects that may
have surfaced due to an on-disk change.

With this flag, the "files" backend can trivially skip trying to re-read
the object as a loose object. Furthermore, as we know that we only try
the second read via the packfile store, we can skip repreparing loose
objects and only reprepare the packfile store.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05 11:45:15 -08:00
Patrick Steinhardt
d9ecf268ef odb: embed base source in the "files" backend
The "files" backend is implemented as a pointer in the `struct
odb_source`. This contradicts our typical pattern for pluggable backends
like we use it for example in the ref store or for object database
streams, where we typically embed the generic base structure in the
specialized implementation. This pattern has a couple of small benefits:

  - We avoid an extra allocation.

  - We hide implementation details in the generic structure.

  - We can easily downcast from a generic backend to the specialized
    structure and vice versa because the offsets are known at compile
    time.

  - It becomes trivial to identify locations where we depend on backend
    specific logic because the cast needs to be explicit.

Refactor our "files" object database source to do the same and embed the
`struct odb_source` in the `struct odb_source_files`.

There are still a bunch of sites in our code base where we do have to
access internals of the "files" backend. The intent is that those will
go away over time, but this will certainly take a while. Meanwhile,
provide a `odb_source_files_downcast()` function that can convert a
generic source into a "files" source.

As we only have a single source the downcast succeeds unconditionally
for now. Eventually though the intent is to make the cast `BUG()` in
case the caller requests to downcast a non-"files" backend to a "files"
backend.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05 11:45:15 -08:00
Patrick Steinhardt
cb506a8a69 odb: introduce "files" source
Introduce a new "files" object database source. This source encapsulates
access to both loose object files and the packfile store, similar to how
the "files" backend for refs encapsulates access to loose refs and the
packed-refs file.

Note that for now the "files" source is still a direct member of a
`struct odb_source`. This architecture will be reversed in the next
commit so that the files source contains a `struct odb_source`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05 11:45:14 -08:00
Junio C Hamano
d93be9cbca Merge branch 'ps/fsck-stream-from-the-right-object-instance'
"fsck" iterates over packfiles and its access to pack data caused
the list to be permuted, which caused it to loop forever; the code
to access pack data by "fsck" has been updated to avoid this.

* ps/fsck-stream-from-the-right-object-instance:
  pack-check: fix verification of large objects
  packfile: expose function to read object stream for an offset
  object-file: adapt `stream_object_signature()` to take a stream
  t/helper: improve "genrandom" test helper
2026-03-05 10:04:49 -08:00
Junio C Hamano
9db3644607 Merge branch 'jt/object-file-use-container-of'
Code clean-up.

* jt/object-file-use-container-of:
  object-file.c: avoid container_of() of a NULL container
  object-file: use `container_of()` to convert from base types
2026-03-02 17:06:53 -08:00
Junio C Hamano
10dd04a3fe Merge branch 'ps/object-info-bits-cleanup'
A couple of bugs in use of flag bits around odb API has been
corrected, and the flag bits reordered.

* ps/object-info-bits-cleanup:
  odb: convert `odb_has_object()` flags into an enum
  odb: convert object info flags into an enum
  odb: drop gaps in object info flag values
  builtin/fsck: fix flags passed to `odb_has_object()`
  builtin/backfill: fix flags passed to `odb_has_object()`
2026-03-02 17:06:52 -08:00
Junio C Hamano
9eb5b3b999 Merge branch 'ps/odb-for-each-object'
Revamp object enumeration API around odb.

* ps/odb-for-each-object:
  odb: drop unused `for_each_{loose,packed}_object()` functions
  reachable: convert to use `odb_for_each_object()`
  builtin/pack-objects: use `packfile_store_for_each_object()`
  odb: introduce mtime fields for object info requests
  treewide: drop uses of `for_each_{loose,packed}_object()`
  treewide: enumerate promisor objects via `odb_for_each_object()`
  builtin/fsck: refactor to use `odb_for_each_object()`
  odb: introduce `odb_for_each_object()`
  packfile: introduce function to iterate through objects
  packfile: extract function to iterate through objects of a store
  object-file: introduce function to iterate through objects
  object-file: extract function to read object info from path
  odb: fix flags parameter to be unsigned
  odb: rename `FOR_EACH_OBJECT_*` flags
2026-03-02 17:06:50 -08:00
Junio C Hamano
b1af291b4a Merge branch 'ps/object-info-bits-cleanup' into ps/odb-sources
* ps/object-info-bits-cleanup:
  odb: convert `odb_has_object()` flags into an enum
  odb: convert object info flags into an enum
  odb: drop gaps in object info flag values
  builtin/fsck: fix flags passed to `odb_has_object()`
  builtin/backfill: fix flags passed to `odb_has_object()`
2026-02-23 13:48:48 -08:00
Junio C Hamano
703c97519d Merge branch 'ps/odb-for-each-object' into ps/odb-sources
* ps/odb-for-each-object:
  odb: drop unused `for_each_{loose,packed}_object()` functions
  reachable: convert to use `odb_for_each_object()`
  builtin/pack-objects: use `packfile_store_for_each_object()`
  odb: introduce mtime fields for object info requests
  treewide: drop uses of `for_each_{loose,packed}_object()`
  treewide: enumerate promisor objects via `odb_for_each_object()`
  builtin/fsck: refactor to use `odb_for_each_object()`
  odb: introduce `odb_for_each_object()`
  packfile: introduce function to iterate through objects
  packfile: extract function to iterate through objects of a store
  object-file: introduce function to iterate through objects
  object-file: extract function to read object info from path
  odb: fix flags parameter to be unsigned
  odb: rename `FOR_EACH_OBJECT_*` flags
2026-02-23 13:48:00 -08:00
Patrick Steinhardt
10a6762719 object-file: adapt stream_object_signature() to take a stream
The function `stream_object_signature()` is responsible for verifying
whether the given object ID matches the actual hash of the object's
contents. In contrast to `check_object_signature()` it does so in a
streaming fashion so that we don't have to load the full object into
memory.

In a subsequent commit we'll want to adapt one of its callsites to pass
a preconstructed stream. Prepare for this by accepting a stream as input
that the caller needs to assemble.

While at it, improve the error reporting in `parse_object_with_flags()`
to tell apart the two failure modes.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-23 13:19:00 -08:00
Junio C Hamano
8e06f961d7 object-file.c: avoid container_of() of a NULL container
Even though the "struct odb_transaction" member is at the beginning
of the containing "struct odb_transaction_files", i.e., at offset 0,
using container_of() to add offset 0 to a NULL pointer gets flagged
as a bad behaviour under SANITIZE=undefined.

Use container_of_or_null() to work around this issue.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-22 12:19:14 -08:00
Justin Tobler
20daad2db4 object-file: use container_of() to convert from base types
To improve code hygiene, replace direct casts from `struct
odb_transaction` and `struct odb_read_stream` to their concrete
implementations with `container_of()`.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-19 10:12:11 -08:00
Patrick Steinhardt
f6516a5241 odb: convert object info flags into an enum
Convert the object info flags into an enum and adapt all functions that
receive these flags as parameters to use the enum instead of an integer.
This serves two purposes:

  - The function signatures become more self-documenting, as callers
    don't have to wonder which flags they expect.

  - The compiler can warn when a wrong flag type is passed.

Note that the second benefit is somewhat limited. For example, when
or-ing multiple enum flags together the result will be an integer, and
the compiler will not warn about such use cases. But where it does help
is when a single flag of the wrong type is passed, as the compiler would
generate a warning in that case.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-12 11:05:08 -08:00
Justin Tobler
3f67e3d021 odb: transparently handle common transaction behavior
A new ODB transaction is created and returned via
`odb_transaction_begin()` and stored in the ODB. Only a single
transaction may be pending at a time. If the ODB already has a
transaction, the function is expected to return NULL. Similarly, when
committing a transaction via `odb_transaction_commit()` the transaction
being committed must match the pending transaction and upon commit reset
the ODB transaction to NULL.

These behaviors apply regardless of the ODB transaction implementation.
Move the corresponding logic into `odb_transaction_{begin,commit}()`
accordingly.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-02 17:14:03 -08:00
Justin Tobler
fa7d067923 odb: prepare struct odb_transaction to become generic
An ODB transaction handles how objects are stored temporarily and
eventually committed. Due to object storage being implemented
differently for a given ODB source, the ODB transactions must be
implemented in a manner specific to the source the objects are being
written to. To provide generic transactions, `struct odb_transaction` is
updated to store a commit callback that can be configured to support a
specific ODB source. For now `struct odb_transaction_files` is the
only transaction type and what is always returned when starting a
transaction.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-02 17:14:03 -08:00
Justin Tobler
8bf06d05a5 object-file: rename transaction functions
In a subsequent commit, ODB transactions are made more generic to
facilitate each ODB source providing its own transaction handling.
Rename `object_file_transaction_{begin,commit}()` to
`odb_transaction_files_{begin,commit}()` to better match the future
source specific transaction implementation.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-02 17:14:03 -08:00
Justin Tobler
585e8dfa27 odb: store ODB source in struct odb_transaction
Each `struct odb_transaction` currently stores a reference to the
`struct object_database`. Since transactions are handled per object
source, instead store a reference to the source.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-02 17:14:03 -08:00
Patrick Steinhardt
3565faf28c odb: drop unused for_each_{loose,packed}_object() functions
We have converted all callers of `for_each_loose_object()` and
`for_each_packed_object()` to use their new replacement functions
instead. We can thus remove them now.

Do so and inline `packfile_store_for_each_object_internal()` now that it
only has a single callsite again. This makes it a bit easier to follow
the callback indirection that is happening there.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 08:26:08 -08:00
Patrick Steinhardt
7b7cbaef27 odb: introduce mtime fields for object info requests
There are some use cases where we need to figure out the mtime for
objects. Most importantly, this is the case when we want to prune
unreachable objects. But getting at that data requires users to manually
derive the info either via the loose object's mtime, the packfiles'
mtime or via the ".mtimes" file.

Introduce a new `struct object_info::mtimep` pointer that allows callers
to request an object's mtime. This new field will be used in a
subsequent commit.

Note that the concept of "mtime" is ambiguous: given an object, it may
be stored multiple times in the object database, and each of these
instances may have a different mtime. Disambiguating these mtimes is
nothing that can happen on the generic ODB layer: the caller may search
for the oldest object, the newest object, or even the relation of object
mtimes depending on the specific source they are located in. As such, it
is the responsibility of the caller to disambiguate mtimes.

A consequence of this is that it's most likely incorrect to look up the
mtime via `odb_read_object_info()`, as this interface does not give us
enough information to disambiguate the mtime. Document this accordingly
and tell users to use `odb_for_each_object()` instead.

Even with this gotcha though it's sensible to have this request as part
of the object info, as the mtime is a property of the object storage
format. If we for example had a "black-box" storage backend, we'd still
need to be able to query it for the mtime info in a generic way.

We could introduce a safety mechanism that for example calls `BUG()` in
case we look up the mtime outside of `odb_for_each_object()`. But that
feels somewhat heavy-handed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 08:26:08 -08:00
Patrick Steinhardt
cde615b6f0 object-file: introduce function to iterate through objects
We have multiple divergent interfaces to iterate through objects of a
specific backend:

  - `for_each_loose_object()` yields all loose objects.

  - `for_each_packed_object()` (somewhat obviously) yields all packed
    objects.

These functions have different function signatures, which makes it hard
to create a common abstraction layer that covers both of these.

Introduce a new function `odb_source_loose_for_each_object()` to plug
this gap. This function doesn't take any data specific to loose objects,
but instead it accepts a `struct object_info` that will be populated the
exact same as if `odb_source_loose_read_object()` was called.

The benefit of this new interface is that we can continue to pass
backend-specific data, as `struct object_info` contains a union for
these exact use cases. This will allow us to unify how we iterate
through objects across both loose and packed objects in a subsequent
commit.

The `for_each_loose_object()` function continues to exist for now, but
it will be removed at the end of this patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 08:26:07 -08:00
Patrick Steinhardt
6ecab3cdf6 object-file: extract function to read object info from path
Extract a new function that allows us to read object info for a specific
loose object via a user-supplied path. This function will be used in a
subsequent commit.

Note that this also allows us to drop `stat_loose_object()`, which is
a simple wrapper around `odb_loose_path()` plus lstat(3p).

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 08:26:06 -08:00
Patrick Steinhardt
6358da200f odb: fix flags parameter to be unsigned
The `flags` parameter accepted by various `for_each_object()` functions
is a bitfield of multiple flags. Such parameters are typically unsigned
in the Git codebase, but we use `enum odb_for_each_object_flags` in
some places.

Adapt these function signatures to use the correct type.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 08:26:06 -08:00
Patrick Steinhardt
bd1855b897 odb: rename FOR_EACH_OBJECT_* flags
Rename the `FOR_EACH_OBJECT_*` flags to have an `ODB_` prefix. This
prepares us for a new upcoming `odb_for_each_object()` function and
ensures that both the function and its flags have the same prefix.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 08:26:06 -08:00
Patrick Steinhardt
123e8186a7 object-file: always set OI_LOOSE when reading object info
There are some early returns in `odb_source_loose_read_object_info()`
in cases where we don't have to open the loose object. These return
paths do not set `struct object_info::whence` to `OI_LOOSE` though, so
it becomes impossible for the caller to tell the format of such an
object.

The root cause of this really is that we have so many different return
paths in the function. As a consequence, it's harder than necessary to
make sure that all successful exit paths sot up the `whence` field as
expected.

Address this by refactoring the function to have a single exit path.
Like this, we can trivially set up the `whence` field when we exit
successfully from the function.

Note that we also:

  - Rename `status` to `ret` to match our usual coding style, but also
    to show that the old `status` variable is now always getting the
    expected value. Furthermore, the value is not initialized anymore,
    which has the consequence that most compilers will warn for exit
    paths where we forgot to set it.

  - Move the setup of scratch pointers closer to `parse_loose_header()`
    to show where it's needed.

  - Guard a couple of variables on cleanup so that they only get
    released in case they have been set up.

  - Reset `oi->delta_base_oid` towards the end of the function, together
    with all the other object info pointers.

Overall, all these changes result in a diff that is somewhat hard to
read. But the end result is significantly easier to read and reason
about, so I'd argue this one-time churn is worth it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:51:14 -08:00
Junio C Hamano
00f117fafe Merge branch 'jc/object-read-stream-fix' into ps/read-object-info-improvements
* jc/object-read-stream-fix:
  odb: do not use "blank" substitute for NULL
2025-12-18 20:55:09 +09:00
Junio C Hamano
a650ad996d odb: do not use "blank" substitute for NULL
When various *object_info() functions are given an extended object
info structure as NULL by a caller that does not want any details,
the code uses a file-scope static blank_oi and passes it down to
the helper functions they use, to avoid handling NULL specifically.

The ps/object-read-stream topic graduated to 'master' recently
however had a bug that assumed that two identically named file-scope
static variables in two functions are the same, which of course is
not the case.  This made "git commit" take 0.38 seconds to 1508
seconds in some case, as reported by Aaron Plattner here:

  https://lore.kernel.org/git/f4ba7e89-4717-4b36-921f-56537131fd69@nvidia.com/

We _could_ move the blank_oi variable to the global scope in common
section to fix this regression, but explicitly handling the NULL is
a much safer fix.  It would also reduce the chance of errors that
somebody accidentally writes into blank_oi, making its contents
dirty, which potentially will make subsequent calls into the
function misbehave.  By explicitly handling NULL input, we no longer
have to worry about it.

Reported-by: Aaron Plattner <aplattner@nvidia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-12-18 20:48:01 +09:00
Junio C Hamano
dbe54273a7 Merge branch 'ps/object-read-stream'
The "git_istream" abstraction has been revamped to make it easier
to interface with pluggable object database design.

* ps/object-read-stream:
  streaming: drop redundant type and size pointers
  streaming: move into object database subsystem
  streaming: refactor interface to be object-database-centric
  streaming: move logic to read packed objects streams into backend
  streaming: move logic to read loose objects streams into backend
  streaming: make the `odb_read_stream` definition public
  streaming: get rid of `the_repository`
  streaming: rely on object sources to create object stream
  packfile: introduce function to read object info from a store
  streaming: move zlib stream into backends
  streaming: create structure for filtered object streams
  streaming: create structure for packed object streams
  streaming: create structure for loose object streams
  streaming: create structure for in-core object streams
  streaming: allocate stream inside the backend-specific logic
  streaming: explicitly pass packfile info when streaming a packed object
  streaming: propagate final object type via the stream
  streaming: drop the `open()` callback function
  streaming: rename `git_istream` into `odb_read_stream`
2025-12-16 11:08:34 +09:00
Junio C Hamano
a545103244 Merge branch 'ps/object-source-loose'
A part of code paths that deals with loose objects has been cleaned
up.

* ps/object-source-loose:
  object-file: refactor writing objects via a stream
  object-file: rename `write_object_file()`
  object-file: refactor freshening of objects
  object-file: rename `has_loose_object()`
  object-file: read objects via the loose object source
  object-file: move loose object map into loose source
  object-file: hide internals when we need to reprepare loose sources
  object-file: move loose object cache into loose source
  object-file: introduce `struct odb_source_loose`
  object-file: move `fetch_if_missing`
  odb: adjust naming to free object sources
  odb: introduce `odb_source_new()`
  odb: fix subtle logic to check whether an alternate is usable
2025-11-24 15:46:41 -08:00
Junio C Hamano
d91d79f26d Merge branch 'bc/submodule-force-same-hash'
Adding a repository that uses a different hash function is a no-no,
but "git submodule add" did nt prevent it, which has been corrected.

* bc/submodule-force-same-hash:
  read-cache: drop submodule check from add_to_cache()
  object-file: disallow adding submodules of different hash algo
2025-11-24 15:46:40 -08:00
Patrick Steinhardt
7b94028652 streaming: drop redundant type and size pointers
In the preceding commits we have turned `struct odb_read_stream` into a
publicly visible structure. Furthermore, this structure now contains the
type and size of the object that we are about to stream. Consequently,
the out-pointers that we used before to propagate the type and size of
the streamed object are now somewhat redundant with the data contained
in the structure itself.

Drop these out-pointers and adapt callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-23 12:56:46 -08:00