git-for-windows/git - git - Gitea: Self-hosted GitHub

mirror of https://github.com/git-for-windows/git.git synced 2026-06-11 08:30:32 -05:00

Author	SHA1	Message	Date
Junio C Hamano	3d0b057aee	Merge branch 'ob/more-repo-config-values' into next Many core configuration variables have been migrated from global variables into 'repo_config_values' to tie them to a specific repository instance, avoiding cross-repository state leakage. * ob/more-repo-config-values: environment: move "warn_on_object_refname_ambiguity" into `struct repo_config_values` environment: move "sparse_expect_files_outside_of_patterns" into `struct repo_config_values` environment: move "core_sparse_checkout_cone" into `struct repo_config_values` environment: move "precomposed_unicode" into `struct repo_config_values` environment: move "pack_compression_level" into `struct repo_config_values` environment: move `zlib_compression_level` into `struct repo_config_values` environment: move "check_stat" into `struct repo_config_values` environment: move "trust_ctime" into `struct repo_config_values`	2026-06-09 10:11:13 +09:00
Olamide Caleb Bello	8cd7402acc	environment: move "pack_compression_level" into `struct repo_config_values` The `pack_compression_level` configuration is currently stored in the global variable `pack_compression_level`, which makes it shared across repository instances within a single process. Store it instead in `repo_config_values`, where eagerly‑parsed repository configuration lives. `pack_compression_level` is parsed eagerly because it influences packfile compression, a core operation where a lazy parse could cause inconsistent behavior and hamper libification. This preserves the existing eager‑parsing behavior while tying the value to the repository from which it was read, avoiding cross‑repository state leakage and continuing the effort to reduce reliance on global configuration state. Update all references to use `repo_config_values()`. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-03 08:36:48 +09:00
Olamide Caleb Bello	e0f86540ab	environment: move `zlib_compression_level` into `struct repo_config_values` The `zlib_compression_level` configuration is currently stored in the global variable `zlib_compression_level`, which makes it shared across repository instances within a single process. Store it instead in `repo_config_values`, where eagerly‑parsed repository configuration lives. `zlib_compression_level` is parsed eagerly because it determines compression behaviour for objects and packs – core operations where a lazy parse could lead to unpredictable results and hinder libification. This preserves the existing eager‑parsing behavior while tying the value to the repository it was read from, avoiding cross‑repository state leakage and continuing the effort to reduce reliance on global configuration state. Update all references to use `repo_config_values()`. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-03 08:36:48 +09:00
Patrick Steinhardt	b9906a645c	object-file: refactor writing objects to use loose source The "object-file" subsystem still hosts the majority of logic used to write loose objects. Eventually, we'll want to move this logic into "odb/source-loose.c", but this isn't yet easily possible because a lot of the writing logic is still being shared with `force_object_loose()`. We will eventually detangle this logic so that we can indeed move all of it into the "loose" source. Meanwhile though, refactor the code so that it operates on a `struct odb_source_loose` directly to already make the dependency explicit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	04a6e84cbd	odb/source-loose: wire up `write_object()` callback Move `odb_source_loose_write_object()` from "object-file.c" into "odb/source-loose.c" and wire it up as the `write_object()` callback of the loose source. As in preceding commits, this requires us to expose a couple of generic functions from "object-file.c" as they are used in both subsystems now. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	87588db131	loose: refactor object map to operate on `struct odb_source_loose` While the loose object map functions in "loose.c" accept a generic `struct odb_source *`, they always expect this to be the "files" backend. Furthermore, the subsystem doesn't even care about the "files" backend, but only uses it as a stepping stone to get to the "loose" backend. This assumption is implicit and thus not immediately obvious. Refactor the interfaces to operate on a `struct odb_source_loose` instead, which eliminates the implicit dependency and unnecessary detour via the "files" source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	d8b9e8bb23	odb/source-loose: wire up `freshen_object()` callback Move `odb_source_loose_freshen_object()` from "object-file.c" into "odb/source-loose.c" and wire it up as the `freshen_object()` callback of the loose source. As part of the move, `check_and_freshen_source()` is inlined into the callback function, as it has no other callers anymore. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	86f7ab5a1f	odb/source-loose: drop `odb_source_loose_has_object()` The function `odb_source_loose_has_object()` checks whether a specific object exists as a loose object on disk by using lstat(3p). This interface is somewhat redundant, as we typically check for object existence in a generic way via `odb_source_read_object_info()`. In fact, these two calls are redundant in case the latter is called in a specific way: when called without an object info request and without the `OBJECT_INFO_QUICK` flag, then we will end up doing the same call to lstat(3p) in `read_object_info_from_path()`. Drop the function and adapt callers to instead use the generic interface so that its calling conventions align with that of other sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	2ade08ac29	odb/source-loose: wire up `count_objects()` callback Move `odb_source_loose_count_objects()` and its associated helpers from "object-file.c" into "odb/source-loose.c" and wire it up as the `count_objects()` callback of the loose source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	8a6da81cc1	odb/source-loose: wire up `find_abbrev_len()` callback Move `odb_source_loose_find_abbrev_len()` and its associated helpers from "object-file.c" into "odb/source-loose.c" and wire it up as the `find_abbrev_len` callback of the loose source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	e4f1d9ba57	odb/source-loose: wire up `for_each_object()` callback Move `odb_source_loose_for_each_object()` and its associated helpers from "object-file.c" into "odb/source-loose.c" and wire it up as the `for_each_object()` callback of the loose source. Again, as in the preceding commit, we are forced to expose a couple of functions from "object-file.c" that are now used by both subsystems. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	727a935a71	odb/source-loose: wire up `read_object_stream()` callback Move `odb_source_loose_read_object_stream()` and its associated helpers from "object-file.c" into "odb/source-loose.c" and wire it up as the `read_object_stream()` callback of the loose source. As part of the move we are also forced to expose a couple of functions from "object-file.h" that parse object headers in a somewhat-generic way, as those functions are now used by both subsystems. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	584338ed92	odb/source-loose: wire up `read_object_info()` callback Move `odb_source_loose_read_object_info()` from "object-file.c" into "odb/source-loose.c" and wire it up as the `read_object_info()` callback of the loose source. Callers that previously invoked it directly now go through the generic `odb_source_read_object_info()` interface instead. The function `read_object_info_from_path()` cannot be moved along with it because it is still called by `for_each_object_wrapper_cb()`. It is therefore kept in place, but adjusted to take a loose source to clarify that it's always operating on this structure. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:17 +09:00
Patrick Steinhardt	a2b7db9bc8	odb/source-loose: wire up `reprepare()` callback Move `odb_source_loose_reprepare()` from "object-file.c" into "odb/source-loose.c" and wire it up as the `reprepare()` callback of the loose source. While at it, make `odb_source_loose_clear_cache()` static, as it is no longer needed outside of its file. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:17 +09:00
Patrick Steinhardt	ead691927b	odb/source-loose: start converting to a proper `struct odb_source` Start converting `struct odb_source_loose` into a proper pluggable `struct odb_source` by embedding the base struct and assigning it the new `ODB_SOURCE_LOOSE` type. Furthermore, wire up lifecycle management of this source by implementing the `free` callback and taking ownership of the chdir notifications. Note that the loose source is not yet functional as a standalone `struct odb_source`, as it's missing all of the callback implementations. These will be wired up in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:17 +09:00
Patrick Steinhardt	1d451ba6fe	odb/source-loose: store pointer to "files" instead of generic source The `struct odb_source_loose` holds a pointer to its owning parent source. The way that Git is currently structured, this parent is always the "files" source. In subsequent commits we're going to detangle that so that the "loose" source doesn't have any owning parent source at all so that it can be used as a completely standalone source. Detangling this mess is somewhat intricate though, and is made even more intricate because it's not always clear which kind of source one is holding at a specific point in time -- either the parent "files" source, or the child "loose" source. Make this relationship more explicit by storing a pointer to the "files" source instead of storing a pointer to a generic `struct odb_source`. This will help make subsequent steps a bit clearer. Note that this is a temporary step, only. At the end of this series we will have dropped the parent pointer completely. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:17 +09:00
Patrick Steinhardt	514f039c90	odb/source-loose: move loose source into "odb/" subsystem In subsequent patches we'll be turning `struct odb_source_loose` into a proper `struct odb_source`. As a first step towards this goal, move its struct out of "object-file.c" and into "odb/source-loose.c". This detaches the implementation of the loose object source from the generic object file code, following the same convention already used by the "files" and "in-memory" sources. No functional changes are intended. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:17 +09:00
Junio C Hamano	9ebc19b760	Merge branch 'ps/odb-in-memory' into ps/odb-source-loose * ps/odb-in-memory: (24 commits) t/unit-tests: add tests for the in-memory object source odb: generic in-memory source odb/source-inmemory: stub out remaining functions odb/source-inmemory: implement `freshen_object()` callback odb/source-inmemory: implement `count_objects()` callback odb/source-inmemory: implement `find_abbrev_len()` callback odb/source-inmemory: implement `for_each_object()` callback odb/source-inmemory: convert to use oidtree oidtree: add ability to store data cbtree: allow using arbitrary wrapper structures for nodes odb/source-inmemory: implement `write_object_stream()` callback odb/source-inmemory: implement `write_object()` callback odb/source-inmemory: implement `read_object_stream()` callback odb/source-inmemory: implement `read_object_info()` callback odb: fix unnecessary call to `find_cached_object()` odb/source-inmemory: implement `free()` callback odb: introduce "in-memory" source odb/transaction: make `write_object_stream()` pluggable object-file: generalize packfile writes to use odb_write_stream object-file: avoid fd seekback by checking object size upfront ...	2026-05-21 22:34:55 +09:00
Patrick Steinhardt	449650decf	oidtree: add ability to store data The oidtree data structure is currently only used to store object IDs, without any associated data. So consequently, it can only really be used to track which object IDs exist, and we can use the tree structure to efficiently operate on OID prefixes. But there are valid use cases where we want to both: - Store object IDs in a sorted order. - Associate arbitrary data with them. Refactor the oidtree interface so that it allows us to store arbitrary payloads within the respective nodes. This will be used in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-15 04:50:45 +09:00
Junio C Hamano	2f124686e8	Merge branch 'jt/odb-transaction-write' into ps/odb-in-memory * jt/odb-transaction-write: odb/transaction: make `write_object_stream()` pluggable object-file: generalize packfile writes to use odb_write_stream object-file: avoid fd seekback by checking object size upfront object-file: remove flags from transaction packfile writes odb: update `struct odb_write_stream` read() callback odb/transaction: use pluggable `begin_transaction()` odb: split `struct odb_transaction` into separate header	2026-05-15 04:50:31 +09:00
Justin Tobler	08b6afb2a2	odb/transaction: make `write_object_stream()` pluggable How an ODB transaction handles writing objects is expected to vary between implementations. Introduce a new `write_object_stream()` callback in `struct odb_transaction` to make this function pluggable. Rename `index_blob_packfile_transaction()` to `odb_transaction_files_write_object_stream()` and wire it up for use with `struct odb_transaction_files` accordingly. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-15 04:44:40 +09:00
Justin Tobler	45a75d6187	object-file: generalize packfile writes to use odb_write_stream The `index_blob_packfile_transaction()` function streams blob data directly from an fd. This makes it difficult to reuse as part of a generic transactional object writing interface. Refactor the packfile write path to operate on a `struct odb_write_stream`, allowing callers to supply data from arbitrary sources. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-15 04:44:40 +09:00
Justin Tobler	d4c92e2ac9	object-file: avoid fd seekback by checking object size upfront In certain scenarios, Git handles writing blobs that exceed "core.bigFileThreshold" differently by streaming the object directly into a packfile. When there is an active ODB transaction, these blobs are streamed to the same packfile instead of using a separate packfile for each. If "pack.packSizeLimit" is configured and streaming another object causes the packfile to exceed the configured limit, the packfile is truncated back to the previous object and the object write is restarted in a new packfile. This works fine, but requires the fd being read from to save a checkpoint so it becomes possible to rewind the input source via seeking back to a known offset at the beginning. In a subsequent commit, blob streaming is converted to use `struct odb_write_stream` as a more generic input source instead of an fd which doesn't provide a mechanism for rewinding. For this use case though, rewinding the fd is not strictly necessary because the inflated size of the object is known and can be used to approximate whether writing the object would cause the packfile to exceed the configured limit prior to writing anything. These blobs written to the packfile are never deltified, thus the size difference between what is written versus the inflated size is due to zlib compression. While this does prevent packfiles from being filled to the potential maximum in some cases, it should be good enough and still prevents the packfile from exceeding any configured limit. Use the inflated blob size to determine whether writing an object to a packfile will exceed the configured "pack.packSizeLimit". Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-15 04:44:40 +09:00
Justin Tobler	8a1f5ecf28	object-file: remove flags from transaction packfile writes The `index_blob_packfile_transaction()` function handles streaming a blob from an fd to compute its object ID and conditionally writes the object directly to a packfile if the INDEX_WRITE_OBJECT flag is set. A subsequent commit will make these packfile object writes part of the transaction interface. Consequently, having the object write be conditional on this flag is a bit awkward. In preparation for this change, introduce a dedicated `hash_blob_stream()` helper that only computes the OID from a `struct odb_write_stream`. This is invoked by `index_fd()` instead when the INDEX_WRITE_OBJECT is not set. The object write performed via `index_blob_packfile_transaction()` is made unconditional accordingly. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-15 04:44:40 +09:00
Justin Tobler	970f63519e	odb: update `struct odb_write_stream` read() callback The `read()` callback used by `struct odb_write_stream` currently returns a pointer to an internal buffer along with the number of bytes read. This makes buffer ownership unclear and provides no way to report errors. Update the interface to instead require the caller to provide a buffer, and have the callback return the number of bytes written to it or a negative value on error. While at it, also move the `struct odb_write_stream` definition to "odb/streaming.h". Call sites are updated accordingly. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-15 04:44:40 +09:00
Justin Tobler	5f6744d3eb	odb: split `struct odb_transaction` into separate header The current ODB transaction interface is colocated with other ODB interfaces in "odb.{c,h}". Subsequent commits will expand `struct odb_transaction` to support write operations on the transaction directly. To keep things organized and prevent "odb.{c,h}" from becoming more unwieldy, split out `struct odb_transaction` into a separate header. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-15 04:44:39 +09:00
Johannes Schindelin	606c192380	odb, packfile: use size_t for streaming object sizes The odb_read_stream structure uses unsigned long for the size field, which is 32-bit on Windows even in 64-bit builds. When streaming objects larger than 4GB, the size would be truncated to zero or an incorrect value, resulting in empty files being written to disk. Change the size field in odb_read_stream to size_t and introduce unpack_object_header_sz() to return sizes via size_t pointer. Since object_info.sizep remains unsigned long for API compatibility, use temporary variables where the types differ, with comments noting the truncation limitation for code paths that still use unsigned long. Widening the producers to size_t in this way introduces a handful of silent size_t -> unsigned long narrowings on Windows, all in builtin/pack-objects.c, where the consumers are still typed unsigned long. Make those narrowings explicit with cast_size_t_to_ulong() so they assert loudly the moment an object actually exceeds ULONG_MAX bytes: - oe_get_size_slow() returns unsigned long but holds a size_t locally; cast at the return. - write_reuse_object() passes a size_t into check_pack_inflate(), whose expect parameter is unsigned long; cast at the call. - check_object() routes a size_t through SET_SIZE() and SET_DELTA_SIZE(), both of which take unsigned long via oe_set_size() / oe_set_delta_size(); cast at the three call sites in the OBJ_OFS_DELTA / OBJ_REF_DELTA branches and in the non-delta default arm. The cast-only treatment is deliberately a stop-gap. Properly widening oe_set_size, oe_get_size_slow's return type, check_pack_inflate's expect parameter, object_info.sizep, patch_delta, and the OE_SIZE_BITS bit-fields cascades into a series that is too large to be reviewable, so the proper widening is deferred to a follow-up topic. Until then, cast_size_t_to_ulong() at least makes the truncation explicit at the source: it documents the boundary, and on a 64-bit non-Windows platform it is a no-op. This was originally authored by LordKiRon <https://github.com/LordKiRon>, who preferred not to reveal their real name and therefore agreed that I take over authorship. Helped-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-09 11:25:31 +09:00
Johannes Schindelin	d05d666977	git-zlib: handle data streams larger than 4GB On Windows, zlib's `uLong` type is 32-bit even on 64-bit systems. When processing data streams larger than 4GB, the `total_in` and `total_out` fields in zlib's `z_stream` structure wrap around, which caused the sanity checks in `zlib_post_call()` to trigger `BUG()` assertions. The git_zstream wrapper now tracks its own 64-bit totals rather than copying them from zlib. The sanity checks compare only the low bits, using `maximum_unsigned_value_of_type(uLong)` to mask appropriately for the platform's `uLong` size. This is based on work by LordKiRon in git-for-windows#6076. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-09 11:25:31 +09:00
Junio C Hamano	fe4ab2e698	Merge branch 'jt/index-fd-wo-repo-regression-fix-maint' During Git 2.52 timeframe, we broke streaming computation of object hash outside a repository, which has been corrected. * jt/index-fd-wo-repo-regression-fix-maint: object-file: avoid ODB transaction when not writing objects	2026-04-08 10:20:51 -07:00
Junio C Hamano	9797fed6ce	Merge branch 'ps/odb-cleanup' Various code clean-up around odb subsystem. * ps/odb-cleanup: odb: drop unneeded headers and forward decls odb: rename `odb_has_object()` flags odb: use enum for `odb_write_object` flags odb: rename `odb_write_object()` flags treewide: use enum for `odb_for_each_object()` flags CodingGuidelines: document our style for flags	2026-04-08 10:19:17 -07:00
Justin Tobler	7d8727ff0b	object-file: avoid ODB transaction when not writing objects In `ce1661f9da` (odb: add transaction interface, 2025-09-16), existing ODB transaction logic is adapted to create a transaction interface at the ODB layer. The intent here is for the ODB transaction interface to eventually provide an object source agnostic means to manage transactions. An unintended consequence of this change though is that `object-file.c:index_fd()` may enter the ODB transaction path even when no object write is requested. In non-repository contexts, this can result in a NULL dereference and segfault. One such case occurs when running git-diff(1) outside of a repository with "core.bigFileThreshold" forcing the streaming path in `index_fd()`: $ echo foo >foo $ echo bar >bar $ git -c core.bigFileThreshold=1 diff -- foo bar In this scenario, the caller only needs to compute the object ID. Object hashing does not require an ODB, so starting a transaction is both unnecessary and invalid. Fix the bug by avoiding the use of ODB transactions in `index_fd()` when callers are only interested in computing the object hash. Reported-by: Luca Stefani <luca.stefani.ge1@gmail.com> Signed-off-by: Justin Tobler <jltobler@gmail.com> [jc: adjusted to `fd13909e` (Merge branch 'jt/odb-transaction', 2025-10-02)] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-04-07 17:32:36 -07:00
Junio C Hamano	7b6d0cd51b	Merge branch 'ps/fsck-wo-the-repository' Internals of "git fsck" have been refactored to not depend on the global `the_repository` variable. * ps/fsck-wo-the-repository: builtin/fsck: stop using `the_repository` in error reporting builtin/fsck: stop using `the_repository` when marking objects builtin/fsck: stop using `the_repository` when checking packed objects builtin/fsck: stop using `the_repository` with loose objects builtin/fsck: stop using `the_repository` when checking reflogs builtin/fsck: stop using `the_repository` when checking refs builtin/fsck: stop using `the_repository` when snapshotting refs builtin/fsck: fix trivial dependence on `the_repository` fsck: drop USE_THE_REPOSITORY fsck: store repository in fsck options fsck: initialize fsck options via a function fetch-pack: move fsck options into function scope	2026-04-07 14:59:26 -07:00
Patrick Steinhardt	c63911b052	odb: rename `odb_has_object()` flags Rename `odb_has_object()` flags to be properly prefixed with the function name. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-31 20:43:14 -07:00
Patrick Steinhardt	b2d421ece6	odb: use enum for `odb_write_object` flags We've got a couple of functions that accept `odb_write_object()` flags, but all of them accept the flags as an `unsigned` integer. In fact, we don't even have an `enum` for the flags field. Introduce this `enum` and adapt functions according to our coding style. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-31 20:43:13 -07:00
Patrick Steinhardt	ff2e9d85d6	odb: rename `odb_write_object()` flags Rename `odb_write_object()` flags to be properly prefixed with the function name. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-31 20:43:13 -07:00
Junio C Hamano	8e2964dc89	Merge branch 'ps/object-counting' The logic to count objects has been cleaned up. * ps/object-counting: odb: introduce generic object counting odb/source: introduce generic object counting object-file: generalize counting objects object-file: extract logic to approximate object count packfile: extract logic to count number of objects odb: stop including "odb/source.h"	2026-03-25 12:58:05 -07:00
Patrick Steinhardt	3749853908	fsck: store repository in fsck options The fsck subsystem relies on `the_repository` quite a bit. While we could of course explicitly pass a repository down the callchain, we already have a `struct fsck_options` that we pass to almost all functions. Extend the options to also store the repository to make it readily available. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-23 08:33:10 -07:00
Patrick Steinhardt	f223609026	fsck: initialize fsck options via a function We initialize the `struct fsck_options` via a set of macros, often in global scope. In the next commit though we're about to introduce a new repository field to the options that must be initialized, and naturally we don't have a repo other than `the_repository` available in this scope. Refactor the code to instead introduce a new `fsck_options_init()` function that initializes the options for us and move initialization into function scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-23 08:33:10 -07:00
Patrick Steinhardt	ab3ab1038d	object-name: move logic to compute loose abbreviation length The function `repo_find_unique_abbrev_r()` takes as input an object ID as well as a minimum object ID length and returns the minimum required prefix to make the object ID unique. The logic that computes the abbreviation length for loose objects is deeply tied to the loose object storage format. As such, it would fail in case a different object storage format was used. Prepare for making this logic generic to the backend by moving the logic into a new `odb_source_loose_find_abbrev_len()` function that is part of "object-file.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-20 13:16:42 -07:00
Patrick Steinhardt	284b7862be	object-name: move logic to iterate through loose prefixed objects The logic to iterate through loose objects that have a certain prefix is currently hosted in "object-name.c". This logic reaches into specifics of the loose object source, so it breaks once a different backend is used for the object storage. Move the logic to iterate through loose objects with a prefix into "object-file.c". This is done by extending the for-each-object options to support an optional prefix that is then honored by the loose source. Naturally, we'll also have this support in the packfile store. This is done in the next commit. Furthermore, there are no users of the loose cache outside of "object-file.c" anymore. As such, convert `odb_source_loose_cache()` to have file scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-20 13:16:42 -07:00
Patrick Steinhardt	cfd575f0a9	odb: introduce `struct odb_for_each_object_options` The `odb_for_each_object()` function only accepts a bitset of flags. In a subsequent commit we'll want to change object iteration to also support iterating over only those objects that have a specific prefix. While we could of course add the prefix to the function signature, or alternatively introduce a new function, both of these options don't really seem to be that sensible. Instead, introduce a new `struct odb_for_each_object_options` that can be passed to a new `odb_for_each_object_ext()` function. Splice through the options structure into the respective object database sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-20 13:16:41 -07:00
Junio C Hamano	7f75767554	Merge branch 'ps/object-counting' into ps/odb-generic-object-name-handling * ps/object-counting: object-file: fix sparse 'plain integer as NULL pointer' error odb: introduce generic object counting odb/source: introduce generic object counting object-file: generalize counting objects object-file: extract logic to approximate object count packfile: extract logic to count number of objects odb: stop including "odb/source.h"	2026-03-20 13:16:09 -07:00
Ramsay Jones	736cef847c	object-file: fix sparse 'plain integer as NULL pointer' error Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-19 18:35:49 -07:00
Junio C Hamano	2eec0f5115	Merge branch 'jk/unleak-mmap' Plug a few leaks where mmap'ed memory regions are not unmapped. * jk/unleak-mmap: meson: turn on NO_MMAP when building with LSan Makefile: turn on NO_MMAP when building with LSan object-file: fix mmap() leak in odb_source_loose_read_object_stream() pack-revindex: avoid double-loading .rev files check_connected(): fix leak of pack-index mmap check_connected(): delay opening new_pack	2026-03-16 10:48:15 -07:00
Junio C Hamano	c89a495ce4	Merge branch 'ps/odb-sources' The object source API is getting restructured to allow plugging new backends. * ps/odb-sources: odb/source: make `begin_transaction()` function pluggable odb/source: make `write_alternate()` function pluggable odb/source: make `read_alternates()` function pluggable odb/source: make `write_object_stream()` function pluggable odb/source: make `write_object()` function pluggable odb/source: make `freshen_object()` function pluggable odb/source: make `for_each_object()` function pluggable odb/source: make `read_object_stream()` function pluggable odb/source: make `read_object_info()` function pluggable odb/source: make `close()` function pluggable odb/source: make `reprepare()` function pluggable odb/source: make `free()` function pluggable odb/source: introduce source type for robustness odb: move reparenting logic into respective subsystems odb: embed base source in the "files" backend odb: introduce "files" source odb: split `struct odb_source` into separate header	2026-03-12 14:09:07 -07:00
Patrick Steinhardt	2b24db1110	object-file: generalize counting objects Generalize the function introduced in the preceding commit to not only be able to approximate the number of loose objects, but to also provide an accurate count. The behaviour can be toggled via a new flag. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-12 08:38:42 -07:00
Patrick Steinhardt	222fddeaa4	object-file: extract logic to approximate object count In "builtin/gc.c" we have some logic that checks whether we need to repack objects. This is done by counting the number of objects that we have and checking whether it exceeds a certain threshold. We don't really need an accurate object count though, which is why we only open a single object directory shard and then extrapolate from there. Extract this logic into a new function that is owned by the loose object database source. This is done to prepare for a subsequent change, where we'll introduce object counting on the object database source level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-12 08:38:42 -07:00
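The extrapolation described in the commit above can be sketched roughly as below. This is an assumption-laden illustration rather than the extracted function: the names `looks_like_loose_object()` and `approximate_loose_count()` are hypothetical, the caller picks which shard directory to inspect, and the factor 256 reflects the loose object layout of one directory per leading byte of the object ID.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: does "name" look like a loose object file, that is,
 * exactly tail_len lowercase hex digits (38 for SHA-1, since the first two
 * digits of the object ID form the directory name)?
 */
static int looks_like_loose_object(const char *name, size_t tail_len)
{
	return strlen(name) == tail_len &&
	       strspn(name, "0123456789abcdef") == tail_len;
}

/*
 * Count the loose objects in one shard directory and extrapolate to all
 * 256 shards. Returns 0 when the shard cannot be opened.
 */
static unsigned long approximate_loose_count(const char *shard_path,
					     size_t tail_len)
{
	DIR *dir;
	struct dirent *ent;
	unsigned long in_shard = 0;

	assert(shard_path != NULL);
	dir = opendir(shard_path);
	if (!dir)
		return 0;
	while ((ent = readdir(dir)) != NULL)
		if (looks_like_loose_object(ent->d_name, tail_len))
			in_shard++;
	closedir(dir);

	/* Object IDs are spread evenly, so one shard holds about 1/256. */
	return in_shard * 256;
}
```

The estimate is cheap because it reads a single directory, which is sufficient when the result is only compared against a threshold.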
Junio C Hamano	6cdef943d2	Merge branch 'ps/odb-sources' into ps/object-counting * ps/odb-sources: odb/source: make `begin_transaction()` function pluggable odb/source: make `write_alternate()` function pluggable odb/source: make `read_alternates()` function pluggable odb/source: make `write_object_stream()` function pluggable odb/source: make `write_object()` function pluggable odb/source: make `freshen_object()` function pluggable odb/source: make `for_each_object()` function pluggable odb/source: make `read_object_stream()` function pluggable odb/source: make `read_object_info()` function pluggable odb/source: make `close()` function pluggable odb/source: make `reprepare()` function pluggable odb/source: make `free()` function pluggable odb/source: introduce source type for robustness odb: move reparenting logic into respective subsystems odb: embed base source in the "files" backend odb: introduce "files" source odb: split `struct odb_source` into separate header	2026-03-10 10:13:40 -07:00
Jeff King	b68e875bec	object-file: fix mmap() leak in odb_source_loose_read_object_stream() We mmap() a loose object file, storing the result in the local variable "mapped", which is eventually assigned into our stream struct as "st.mapped". If we hit an error, we jump to an error label which does: munmap(st.mapped, st.mapsize); to clean up. But this is wrong; we don't assign st.mapped until the end of the function, after all of the "goto error" jumps. So this munmap() is never cleaning up anything (st.mapped is always NULL, because we initialize the struct with calloc). Instead, we should feed the local variable to munmap(). This leak is due to `595296e124` (streaming: allocate stream inside the backend-specific logic, 2025-11-23), which introduced the local variable. Before that, we assigned the mmap result directly into st.mapped. It was probably switched there so that we do not have to allocate/free the struct when the map operation fails (e.g., because we don't have the loose object). Before that commit, the struct was passed in from the caller, so there was no allocation at all. You can see the leak in the test suite by building with: make SANITIZE=leak NO_MMAP=1 CC=clang and running t1060. We need NO_MMAP so that the mmap() is backed by an actual malloc(), which allows LSan to detect it. And the leak seems not to be detected when compiling with gcc, probably due to some internal compiler decisions about how the stack memory is written. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-06 21:11:32 -08:00
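The bug pattern described in the commit above can be reproduced in miniature. The sketch below is not the code from "object-file.c": `fake_mmap()`, `fake_munmap()`, `struct stream`, `open_stream()`, and `close_stream()` are hypothetical stand-ins, with the mapping backed by malloc() in the spirit of a NO_MMAP build. It shows the corrected cleanup, where the error path releases the local variable because the struct field has not been assigned yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int live_mappings; /* regions mapped but not yet unmapped */

/* Stand-ins for mmap()/munmap(), backed by malloc()/free(). */
static void *fake_mmap(size_t size)
{
	void *p;

	assert(size > 0);
	p = malloc(size);
	if (p)
		live_mappings++;
	return p;
}

static void fake_munmap(void *p, size_t size)
{
	(void)size;
	if (p) {
		live_mappings--;
		free(p);
	}
}

struct stream {
	void *mapped;
	size_t mapsize;
};

/* Returns NULL on failure; "fail_header" simulates an error after mapping. */
static struct stream *open_stream(size_t size, int fail_header)
{
	struct stream *st = NULL;
	void *mapped = fake_mmap(size);

	if (!mapped)
		return NULL;

	st = calloc(1, sizeof(*st));
	if (!st || fail_header)
		goto error;

	/* Ownership moves into the struct only after every check passed. */
	st->mapped = mapped;
	st->mapsize = size;
	return st;

error:
	/*
	 * st->mapped is still NULL at this point, so unmapping it would
	 * release nothing. The local variable holds the region to clean up.
	 */
	fake_munmap(mapped, size);
	free(st);
	return NULL;
}

static void close_stream(struct stream *st)
{
	if (!st)
		return;
	fake_munmap(st->mapped, st->mapsize);
	free(st);
}
```

Replacing `fake_munmap(mapped, size)` in the error path with `fake_munmap(st->mapped, st->mapsize)` would leave `live_mappings` at 1 after a failed open, which is the leak the commit message describes.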
Patrick Steinhardt	5946a564cd	odb/source: make `read_object_info()` function pluggable Introduce a new callback function in `struct odb_source` to make the function pluggable. Note that this function is a bit less straight-forward to convert compared to the other functions. The reason here is that the logic to read an object is: 1. We try to read the object. If it exists we return it. 2. If the object does not exist we reprepare the object database source. 3. We then try reading the object info a second time in case the reprepare caused it to appear. The second read is only supposed to happen for the packfile store though, as reading loose objects is not impacted by repreparing the object database. Ideally, we'd just move this whole logic into the ODB source. But that's not easily possible because we try to avoid the reprepare unless really required, which is after we have found out that no other ODB source contains the object, either. So the logic spans across multiple ODB sources, and consequently we cannot move it into an individual source. Instead, introduce a new flag `OBJECT_INFO_SECOND_READ` that tells the backend that we already tried to look up the object once, and that this time around the ODB source should try to find any new objects that may have surfaced due to an on-disk change. With this flag, the "files" backend can trivially skip trying to re-read the object as a loose object. Furthermore, as we know that we only try the second read via the packfile store, we can skip repreparing loose objects and only reprepare the packfile store. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-05 11:45:15 -08:00
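The two-pass lookup described in the commit above can be sketched as follows. This is a toy model under stated assumptions, not the actual callback: `SKETCH_SECOND_READ` is a hypothetical stand-in for `OBJECT_INFO_SECOND_READ`, and `struct sketch_source` reduces a "files" source to a few integers that record whether a single object is available loose or packed. It shows why the retry has to be driven from outside the individual sources, and how the flag lets a source skip the loose lookup on the second attempt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for OBJECT_INFO_SECOND_READ. */
#define SKETCH_SECOND_READ (1u << 0)

/* Toy model of a "files" source tracking one object. */
struct sketch_source {
	int loose_has;     /* object exists as a loose object */
	int pack_on_disk;  /* object exists in a pack on disk */
	int pack_loaded;   /* the pack list has been (re)loaded */
	int loose_lookups; /* how often the loose store was consulted */
	int reprepares;    /* how often the pack store was reprepared */
};

/* Returns 0 if the object was found in this source, -1 otherwise. */
static int sketch_read_info(struct sketch_source *src, unsigned flags)
{
	if (flags & SKETCH_SECOND_READ) {
		/* Second attempt: refresh the pack store, skip loose objects. */
		src->reprepares++;
		src->pack_loaded = 1;
	} else {
		src->loose_lookups++;
		if (src->loose_has)
			return 0;
	}
	return (src->pack_loaded && src->pack_on_disk) ? 0 : -1;
}

/* Cross-source driver: only retry once no source had the object. */
static int sketch_odb_read_info(struct sketch_source *sources, size_t nr)
{
	size_t i;

	assert(sources || !nr);
	for (i = 0; i < nr; i++)
		if (!sketch_read_info(&sources[i], 0))
			return 0;
	for (i = 0; i < nr; i++)
		if (!sketch_read_info(&sources[i], SKETCH_SECOND_READ))
			return 0;
	return -1;
}
```

In this model a pack that appeared on disk after the pack list was loaded is found on the second pass, and no source consults its loose store more than once.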


338 Commits