Many core configuration variables have been migrated from global
variables into 'repo_config_values' to tie them to a specific
repository instance, avoiding cross-repository state leakage.
* ob/more-repo-config-values:
environment: move "warn_on_object_refname_ambiguity" into `struct repo_config_values`
environment: move "sparse_expect_files_outside_of_patterns" into `struct repo_config_values`
environment: move "core_sparse_checkout_cone" into `struct repo_config_values`
environment: move "precomposed_unicode" into `struct repo_config_values`
environment: move "pack_compression_level" into `struct repo_config_values`
environment: move `zlib_compression_level` into `struct repo_config_values`
environment: move "check_stat" into `struct repo_config_values`
environment: move "trust_ctime" into `struct repo_config_values`
The `pack_compression_level` configuration is currently stored in the
global variable `pack_compression_level`, which makes it shared across
repository instances within a single process.
Store it instead in `repo_config_values`, where eagerly‑parsed
repository configuration lives. `pack_compression_level` is parsed
eagerly because it influences packfile compression, a core operation
where a lazy parse could cause inconsistent behavior and hamper
libification. This preserves the existing eager‑parsing behavior while
tying the value to the repository from which it was read, avoiding
cross‑repository state leakage and continuing the effort to reduce
reliance on global configuration state.
Update all references to use `repo_config_values()`.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `zlib_compression_level` configuration is currently stored in the
global variable `zlib_compression_level`, which makes it shared across
repository instances within a single process.
Store it instead in `repo_config_values`, where eagerly‑parsed
repository configuration lives. `zlib_compression_level` is parsed
eagerly because it determines compression behaviour for objects and
packs – core operations where a lazy parse could lead to unpredictable
results and hinder libification. This preserves the existing
eager‑parsing behavior while tying the value to the repository it
was read from, avoiding cross‑repository state leakage and continuing
the effort to reduce reliance on global configuration state.
Update all references to use `repo_config_values()`.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "object-file" subsystem still hosts the majority of logic used to
write loose objects. Eventually, we'll want to move this logic into
"odb/source-loose.c", but this isn't yet easily possible because a lot
of the writing logic is still being shared with `force_object_loose()`.
We will eventually detangle this logic so that we can indeed move all of
it into the "loose" source. Meanwhile though, refactor the code so that
it operates on a `struct odb_source_loose` directly to already make the
dependency explicit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move `odb_source_loose_write_object()` from "object-file.c" into
"odb/source-loose.c" and wire it up as the `write_object()` callback of
the loose source.
As in preceding commits, this requires us to expose a couple of generic
functions from "object-file.c" as they are used in both subsystems now.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the loose object map functions in "loose.c" accept a generic
`struct odb_source *`, they always expect this to be the "files"
backend. Furthermore, the subsystem doesn't even care about the "files"
backend, but only uses it as a stepping stone to get to the "loose"
backend.
This assumption is implicit and thus not immediately obvious. Refactor
the interfaces to instead operate on a `struct odb_source_loose`
instead, which eliminates the implicit dependency and unnecessary detour
via the "files" source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move `odb_source_loose_freshen_object()` from "object-file.c" into
"odb/source-loose.c" and wire it up as the `freshen_object()` callback
of the loose source.
As part of the move, `check_and_freshen_source()` is inlined into the
callback function, as it has no other callers anymore.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `odb_source_loose_has_object()` checks whether a specific
object exists as a loose object on disk by using lstat(3p). This
interface is somewhat redundant, as we typically check for object
existence in a generic way via `odb_source_read_object_info()`.
In fact, these two calls are redundant in case the latter is called in a
specific way: when called without an object info request and without the
`OBJECT_INFO_QUICK` flag, then we will end up doing the same call to
lstat(3p) in `read_object_info_from_path()`.
Drop the function and adapt callers to instead use the generic
interface so that its calling conventions align with that of other
sources.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move `odb_source_loose_count_objects()` and its associated helpers from
"object-file.c" into "odb/source-loose.c" and wire it up as the
`count_objects()` callback of the loose source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move `odb_source_loose_find_abbrev_len()` and its associated helpers
from "object-file.c" into "odb/source-loose.c" and wire it up as the
`find_abbrev_len` callback of the loose source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move `odb_source_loose_for_each_object()` and its associated helpers
from "object-file.c" into "odb/source-loose.c" and wire it up as the
`for_each_object()` callback of the loose source.
Again, as in the preceding commit, we are forced to expose a couple of
functions from "object-file.c" that are now used by both subsystems.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move `odb_source_loose_read_object_stream()` and its associated helpers
from "object-file.c" into "odb/source-loose.c" and wire it up as the
`read_object_stream()` callback of the loose source.
As part of the move we are also forced to expose a couple of functions
from "object-file.h" that parse object headers in a somewhat-generic
way, as those functions are now used by both subsystems.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move `odb_source_loose_read_object_info()` from "object-file.c" into
"odb/source-loose.c" and wire it up as the `read_object_info()` callback
of the loose source. Callers that previously invoked it directly now go
through the generic `odb_source_read_object_info()` interface instead.
The function `read_object_info_from_path()` cannot be moved along with
it because it is still called by `for_each_object_wrapper_cb()`. It is
therefore kept in place, but adjusted to take a loose source to clarify
that it's always operating on this structure.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move `odb_source_loose_reprepare()` from "object-file.c" into
"odb/source-loose.c" and wire it up as the `reprepare()` callback of the
loose source.
While at it, make `odb_source_loose_clear_cache()` static, as it is no
longer needed outside of its file.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Start converting `struct odb_source_loose` into a proper pluggable
`struct odb_source` by embedding the base struct and assigning it the
new `ODB_SOURCE_LOOSE` type. Furthermore, wire up lifecycle management
of this source by implementing the `free` callback and taking ownership
of the chdir notifications.
Note that the loose source is not yet functional as a standalone `struct
odb_source`, as it's missing all of the callback implementations. These
will be wired up in subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `struct odb_source_loose` holds a pointer to its owning parent
source. The way that Git is currently structured, this parent is always
the "files" source. In subsequent commits we're going to detangle that
so that the "loose" source doesn't have any owning parent source at all
so that it can be used as a completely standalone source.
Detangling this mess is somewhat intricate though, and is made even more
intricate because it's not always clear which kind of source one is
holding at a specific point in time -- either the parent "files" source,
or the child "loose" source.
Make this relationship more explicit by storing a pointer to the "files"
source instead of storing a pointer to a generic `struct odb_source`.
This will help make subsequent steps a bit clearer.
Note that this is a temporary step, only. At the end of this series
we will have dropped the parent pointer completely.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In subsequent patches we'll be turning `struct odb_source_loose` into a
proper `struct odb_source`. As a first step towards this goal, move its
struct out of "object-file.c" and into "odb/source-loose.c".
This detaches the implementation of the loose object source from the
generic object file code, following the same convention already used by
the "files" and "in-memory" sources.
No functional changes are intended.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The oidtree data structure is currently only used to store object IDs,
without any associated data. So consequently, it can only really be used
to track which object IDs exist, and we can use the tree structure to
efficiently operate on OID prefixes.
But there are valid use cases where we want to both:
- Store object IDs in a sorted order.
- Associated arbitrary data with them.
Refactor the oidtree interface so that it allows us to store arbitrary
payloads within the respective nodes. This will be used in the next
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
How an ODB transaction handles writing objects is expected to vary
between implementations. Introduce a new `write_object_stream()`
callback in `struct odb_transaction` to make this function pluggable.
Rename `index_blob_packfile_transaction()` to
`odb_transaction_files_write_object_stream()` and wire it up for use
with `struct odb_transaction_files` accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `index_blob_packfile_transaction()` function streams blob data
directly from an fd. This makes it difficult to reuse as part of a
generic transactional object writing interface.
Refactor the packfile write path to operate on a `struct
odb_write_stream`, allowing callers to supply data from arbitrary
sources.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In certain scenarios, Git handles writing blobs that exceed
"core.bigFileThreshold" differently by streaming the object directly
into a packfile. When there is an active ODB transaction, these blobs
are streamed to the same packfile instead of using a separate packfile
for each. If "pack.packSizeLimit" is configured and streaming another
object causes the packfile to exceed the configured limit, the packfile
is truncated back to the previous object and the object write is
restarted in a new packfile.
This works fine, but requires the fd being read from to save a
checkpoint so it becomes possible to rewind the input source via seeking
back to a known offset at the beginning. In a subsequent commit, blob
streaming is converted to use `struct odb_write_stream` as a more
generic input source instead of an fd which doesn't provide a mechanism
for rewinding.
For this use case though, rewinding the fd is not strictly necessary
because the inflated size of the object is known and can be used to
approximate whether writing the object would cause the packfile to
exceed the configured limit prior to writing anything. These blobs
written to the packfile are never deltified thus the size difference
between what is written versus the inflated size is due to zlib
compression. While this does prevent packfiles from being filled to the
potential maximum is some cases, it should be good enough and still
prevents the packfile from exceeding any configured limit.
Use the inflated blob size to determine whether writing an object to a
packfile will exceed the configured "pack.packSizeLimit".
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `index_blob_packfile_transaction()` function handles streaming a
blob from an fd to compute its object ID and conditionally writes the
object directly to a packfile if the INDEX_WRITE_OBJECT flag is set. A
subsequent commit will make these packfile object writes part of the
transaction interface. Consequently, having the object write be
conditional on this flag is a bit awkward.
In preparation for this change, introduce a dedicated
`hash_blob_stream()` helper that only computes the OID from a `struct
odb_write_stream`. This is invoked by `index_fd()` instead when the
INDEX_WRITE_OBJECT is not set. The object write performed via
`index_blob_packfile_transaction()` is made unconditional accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `read()` callback used by `struct odb_write_stream` currently
returns a pointer to an internal buffer along with the number of bytes
read. This makes buffer ownership unclear and provides no way to report
errors.
Update the interface to instead require the caller to provide a buffer,
and have the callback return the number of bytes written to it or a
negative value on error. While at it, also move the `struct
odb_write_stream` definition to "odb/streaming.h". Call sites are
updated accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current ODB transaction interface is colocated with other ODB
interfaces in "odb.{c,h}". Subsequent commits will expand `struct
odb_transaction` to support write operations on the transaction
directly. To keep things organized and prevent "odb.{c,h}" from becoming
more unwieldy, split out `struct odb_transaction` into a separate
header.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The odb_read_stream structure uses unsigned long for the size field,
which is 32-bit on Windows even in 64-bit builds. When streaming
objects larger than 4GB, the size would be truncated to zero or an
incorrect value, resulting in empty files being written to disk.
Change the size field in odb_read_stream to size_t and introduce
unpack_object_header_sz() to return sizes via size_t pointer. Since
object_info.sizep remains unsigned long for API compatibility, use
temporary variables where the types differ, with comments noting the
truncation limitation for code paths that still use unsigned long.
Widening the producers to size_t in this way introduces a handful of
silent size_t -> unsigned long narrowings on Windows, all in
builtin/pack-objects.c, where the consumers are still typed
unsigned long. Make those narrowings explicit with
cast_size_t_to_ulong() so they assert loudly the moment an object
actually exceeds ULONG_MAX bytes:
- oe_get_size_slow() returns unsigned long but holds a size_t
locally; cast at the return.
- write_reuse_object() passes a size_t into check_pack_inflate(),
whose expect parameter is unsigned long; cast at the call.
- check_object() routes a size_t through SET_SIZE() and
SET_DELTA_SIZE(), both of which take unsigned long via
oe_set_size() / oe_set_delta_size(); cast at the three call
sites in the OBJ_OFS_DELTA / OBJ_REF_DELTA branches and in the
non-delta default arm.
The cast-only treatment is deliberately a stop-gap. Properly
widening oe_set_size, oe_get_size_slow's return type,
check_pack_inflate's expect parameter, object_info.sizep,
patch_delta, and the OE_SIZE_BITS bit-fields cascades into a series
that is too large to be reviewable, so the proper widening is
deferred to a follow-up topic. Until then,
cast_size_t_to_ulong() at least makes the truncation explicit at
the source: it documents the boundary, and on a 64-bit non-Windows
platform it is a no-op.
This was originally authored by LordKiRon <https://github.com/LordKiRon>,
who preferred not to reveal their real name and therefore agreed that I
take over authorship.
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Windows, zlib's `uLong` type is 32-bit even on 64-bit systems. When
processing data streams larger than 4GB, the `total_in` and `total_out`
fields in zlib's `z_stream` structure wrap around, which caused the
sanity checks in `zlib_post_call()` to trigger `BUG()` assertions.
The git_zstream wrapper now tracks its own 64-bit totals rather than
copying them from zlib. The sanity checks compare only the low bits,
using `maximum_unsigned_value_of_type(uLong)` to mask appropriately for
the platform's `uLong` size.
This is based on work by LordKiRon in git-for-windows#6076.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During Git 2.52 timeframe, we broke streaming computation of object
hash outside a repository, which has been corrected.
* jt/index-fd-wo-repo-regression-fix-maint:
object-file: avoid ODB transaction when not writing objects
Various code clean-up around odb subsystem.
* ps/odb-cleanup:
odb: drop unneeded headers and forward decls
odb: rename `odb_has_object()` flags
odb: use enum for `odb_write_object` flags
odb: rename `odb_write_object()` flags
treewide: use enum for `odb_for_each_object()` flags
CodingGuidelines: document our style for flags
In ce1661f9da (odb: add transaction interface, 2025-09-16), existing
ODB transaction logic is adapted to create a transaction interface
at the ODB layer. The intent here is for the ODB transaction
interface to eventually provide an object source agnostic means to
manage transactions.
An unintended consequence of this change though is that
`object-file.c:index_fd()` may enter the ODB transaction path even
when no object write is requested. In non-repository contexts, this
can result in a NULL dereference and segfault. One such case occurs
when running git-diff(1) outside of a repository with
"core.bigFileThreshold" forcing the streaming path in `index_fd()`:
$ echo foo >foo
$ echo bar >bar
$ git -c core.bigFileThreshold=1 diff -- foo bar
In this scenario, the caller only needs to compute the object ID. Object
hashing does not require an ODB, so starting a transaction is both
unnecessary and invalid.
Fix the bug by avoiding the use of ODB transactions in `index_fd()` when
callers are only interested in computing the object hash.
Reported-by: Luca Stefani <luca.stefani.ge1@gmail.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
[jc: adjusted to fd13909e (Merge branch 'jt/odb-transaction', 2025-10-02)]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Internals of "git fsck" have been refactored to not depend on the
global `the_repository` variable.
* ps/fsck-wo-the-repository:
builtin/fsck: stop using `the_repository` in error reporting
builtin/fsck: stop using `the_repository` when marking objects
builtin/fsck: stop using `the_repository` when checking packed objects
builtin/fsck: stop using `the_repository` with loose objects
builtin/fsck: stop using `the_repository` when checking reflogs
builtin/fsck: stop using `the_repository` when checking refs
builtin/fsck: stop using `the_repository` when snapshotting refs
builtin/fsck: fix trivial dependence on `the_repository`
fsck: drop USE_THE_REPOSITORY
fsck: store repository in fsck options
fsck: initialize fsck options via a function
fetch-pack: move fsck options into function scope
Rename `odb_has_object()` flags to be properly prefixed with the
function name.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We've got a couple of functions that accept `odb_write_object()` flags,
but all of them accept the flags as an `unsigned` integer. In fact, we
don't even have an `enum` for the flags field.
Introduce this `enum` and adapt functions accordingly according to our
coding style.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `odb_write_object()` flags to be properly prefixed with the
function name.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic to count objects has been cleaned up.
* ps/object-counting:
odb: introduce generic object counting
odb/source: introduce generic object counting
object-file: generalize counting objects
object-file: extract logic to approximate object count
packfile: extract logic to count number of objects
odb: stop including "odb/source.h"
The fsck subsystem relies on `the_repository` quite a bit. While we
could of course explicitly pass a repository down the callchain, we
already have a `struct fsck_options` that we pass to almost all
functions.
Extend the options to also store the repository to make it readily
available.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We initialize the `struct fsck_options` via a set of macros, often in
global scope. In the next commit though we're about to introduce a new
repository field to the options that must be initialized, and naturally
we don't have a repo other than `the_repository` available in this
scope.
Refactor the code to instead intrdouce a new `fsck_options_init()`
function that initializes the options for us and move initialization
into function scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `repo_find_unique_abbrev_r()` takes as input an object ID
as well as a minimum object ID length and returns the minimum required
prefix to make the object ID unique.
The logic that computes the abbreviation length for loose objects is
deeply tied to the loose object storage format. As such, it would fail
in case a different object storage format was used.
Prepare for making this logic generic to the backend by moving the logic
into a new `odb_source_loose_find_abbrev_len()` function that is part of
"object-file.c".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic to iterate through loose objects that have a certain prefix is
currently hosted in "object-name.c". This logic reaches into specifics
of the loose object source, so it breaks once a different backend is
used for the object storage.
Move the logic to iterate through loose objects with a prefix into
"object-file.c". This is done by extending the for-each-object options
to support an optional prefix that is then honored by the loose source.
Naturally, we'll also have this support in the packfile store. This is
done in the next commit.
Furthermore, there are no users of the loose cache outside of
"object-file.c" anymore. As such, convert `odb_source_loose_cache()` to
have file scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `odb_for_each_object()` function only accepts a bitset of flags. In
a subsequent commit we'll want to change object iteration to also
support iterating over only those objects that have a specific prefix.
While we could of course add the prefix to the function signature, or
alternatively introduce a new function, both of these options don't
really seem to be that sensible.
Instead, introduce a new `struct odb_for_each_object_options` that can
be passed to a new `odb_for_each_object_ext()` function. Splice through
the options structure into the respective object database sources.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Plug a few leaks where mmap'ed memory regions are not unmapped.
* jk/unleak-mmap:
meson: turn on NO_MMAP when building with LSan
Makefile: turn on NO_MMAP when building with LSan
object-file: fix mmap() leak in odb_source_loose_read_object_stream()
pack-revindex: avoid double-loading .rev files
check_connected(): fix leak of pack-index mmap
check_connected(): delay opening new_pack
The object source API is getting restructured to allow plugging new
backends.
* ps/odb-sources:
odb/source: make `begin_transaction()` function pluggable
odb/source: make `write_alternate()` function pluggable
odb/source: make `read_alternates()` function pluggable
odb/source: make `write_object_stream()` function pluggable
odb/source: make `write_object()` function pluggable
odb/source: make `freshen_object()` function pluggable
odb/source: make `for_each_object()` function pluggable
odb/source: make `read_object_stream()` function pluggable
odb/source: make `read_object_info()` function pluggable
odb/source: make `close()` function pluggable
odb/source: make `reprepare()` function pluggable
odb/source: make `free()` function pluggable
odb/source: introduce source type for robustness
odb: move reparenting logic into respective subsystems
odb: embed base source in the "files" backend
odb: introduce "files" source
odb: split `struct odb_source` into separate header
Generalize the function introduced in the preceding commit to not only
be able to approximate the number of loose objects, but to also provide
an accurate count. The behaviour can be toggled via a new flag.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "builtin/gc.c" we have some logic that checks whether we need to
repack objects. This is done by counting the number of objects that we
have and checking whether it exceeds a certain threshold. We don't
really need an accurate object count though, which is why we only
open a single object directory shard and then extrapolate from there.
Extract this logic into a new function that is owned by the loose object
database source. This is done to prepare for a subsequent change, where
we'll introduce object counting on the object database source level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/odb-sources:
odb/source: make `begin_transaction()` function pluggable
odb/source: make `write_alternate()` function pluggable
odb/source: make `read_alternates()` function pluggable
odb/source: make `write_object_stream()` function pluggable
odb/source: make `write_object()` function pluggable
odb/source: make `freshen_object()` function pluggable
odb/source: make `for_each_object()` function pluggable
odb/source: make `read_object_stream()` function pluggable
odb/source: make `read_object_info()` function pluggable
odb/source: make `close()` function pluggable
odb/source: make `reprepare()` function pluggable
odb/source: make `free()` function pluggable
odb/source: introduce source type for robustness
odb: move reparenting logic into respective subsystems
odb: embed base source in the "files" backend
odb: introduce "files" source
odb: split `struct odb_source` into separate header
We mmap() a loose object file, storing the result in the local variable
"mapped", which is eventually assigned into our stream struct as
"st.mapped". If we hit an error, we jump to an error label which does:
munmap(st.mapped, st.mapsize);
to clean up. But this is wrong; we don't assign st.mapped until the end
of the function, after all of the "goto error" jumps. So this munmap()
is never cleaning up anything (st.mapped is always NULL, because we
initialize the struct with calloc).
Instead, we should feed the local variable to munmap().
This leak is due to 595296e124 (streaming: allocate stream inside the
backend-specific logic, 2025-11-23), which introduced the local
variable. Before that, we assigned the mmap result directly into
st.mapped. It was probably switched there so that we do not have to
allocate/free the struct when the map operation fails (e.g., because we
don't have the loose object). Before that commit, the struct was passed
in from the caller, so there was no allocation at all.
You can see the leak in the test suite by building with:
make SANITIZE=leak NO_MMAP=1 CC=clang
and running t1060. We need NO_MMAP so that the mmap() is backed by an
actual malloc(), which allows LSan to detect it. And the leak seems not
to be detected when compiling with gcc, probably due to some internal
compiler decisions about how the stack memory is written.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new callback function in `struct odb_source` to make the
function pluggable.
Note that this function is a bit less straight-forward to convert
compared to the other functions. The reason here is that the logic to
read an object is:
1. We try to read the object. If it exists we return it.
2. If the object does not exist we reprepare the object database
source.
3. We then try reading the object info a second time in case the
reprepare caused it to appear.
The second read is only supposed to happen for the packfile store
though, as reading loose objects is not impacted by repreparing the
object database.
Ideally, we'd just move this whole logic into the ODB source. But that's
not easily possible because we try to avoid the reprepare unless really
required, which is after we have found out that no other ODB source
contains the object, either. So the logic spans across multiple ODB
sources, and consequently we cannot move it into an individual source.
Instead, introduce a new flag `OBJECT_INFO_SECOND_READ` that tells the
backend that we already tried to look up the object once, and that this
time around the ODB source should try to find any new objects that may
have surfaced due to an on-disk change.
With this flag, the "files" backend can trivially skip trying to re-read
the object as a loose object. Furthermore, as we know that we only try
the second read via the packfile store, we can skip repreparing loose
objects and only reprepare the packfile store.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>