git-for-windows/git - git - Gitea: Self-hosted GitHub

mirror of https://github.com/git-for-windows/git.git synced 2026-04-10 08:07:43 -05:00

Author	SHA1	Message	Date
Junio C Hamano	fe4ab2e698	Merge branch 'jt/index-fd-wo-repo-regression-fix-maint' During Git 2.52 timeframe, we broke streaming computation of object hash outside a repository, which has been corrected. * jt/index-fd-wo-repo-regression-fix-maint: object-file: avoid ODB transaction when not writing objects	2026-04-08 10:20:51 -07:00
Junio C Hamano	9797fed6ce	Merge branch 'ps/odb-cleanup' Various code clean-up around odb subsystem. * ps/odb-cleanup: odb: drop unneeded headers and forward decls odb: rename `odb_has_object()` flags odb: use enum for `odb_write_object` flags odb: rename `odb_write_object()` flags treewide: use enum for `odb_for_each_object()` flags CodingGuidelines: document our style for flags	2026-04-08 10:19:17 -07:00
Justin Tobler	7d8727ff0b	object-file: avoid ODB transaction when not writing objects In `ce1661f9da` (odb: add transaction interface, 2025-09-16), existing ODB transaction logic is adapted to create a transaction interface at the ODB layer. The intent here is for the ODB transaction interface to eventually provide an object source agnostic means to manage transactions. An unintended consequence of this change though is that `object-file.c:index_fd()` may enter the ODB transaction path even when no object write is requested. In non-repository contexts, this can result in a NULL dereference and segfault. One such case occurs when running git-diff(1) outside of a repository with "core.bigFileThreshold" forcing the streaming path in `index_fd()`: $ echo foo >foo $ echo bar >bar $ git -c core.bigFileThreshold=1 diff -- foo bar In this scenario, the caller only needs to compute the object ID. Object hashing does not require an ODB, so starting a transaction is both unnecessary and invalid. Fix the bug by avoiding the use of ODB transactions in `index_fd()` when callers are only interested in computing the object hash. Reported-by: Luca Stefani <luca.stefani.ge1@gmail.com> Signed-off-by: Justin Tobler <jltobler@gmail.com> [jc: adjusted to `fd13909e` (Merge branch 'jt/odb-transaction', 2025-10-02)] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-04-07 17:32:36 -07:00
Junio C Hamano	7b6d0cd51b	Merge branch 'ps/fsck-wo-the-repository' Internals of "git fsck" have been refactored to not depend on the global `the_repository` variable. * ps/fsck-wo-the-repository: builtin/fsck: stop using `the_repository` in error reporting builtin/fsck: stop using `the_repository` when marking objects builtin/fsck: stop using `the_repository` when checking packed objects builtin/fsck: stop using `the_repository` with loose objects builtin/fsck: stop using `the_repository` when checking reflogs builtin/fsck: stop using `the_repository` when checking refs builtin/fsck: stop using `the_repository` when snapshotting refs builtin/fsck: fix trivial dependence on `the_repository` fsck: drop USE_THE_REPOSITORY fsck: store repository in fsck options fsck: initialize fsck options via a function fetch-pack: move fsck options into function scope	2026-04-07 14:59:26 -07:00
Patrick Steinhardt	c63911b052	odb: rename `odb_has_object()` flags Rename `odb_has_object()` flags to be properly prefixed with the function name. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-31 20:43:14 -07:00
Patrick Steinhardt	b2d421ece6	odb: use enum for `odb_write_object` flags We've got a couple of functions that accept `odb_write_object()` flags, but all of them accept the flags as an `unsigned` integer. In fact, we don't even have an `enum` for the flags field. Introduce this `enum` and adapt functions accordingly according to our coding style. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-31 20:43:13 -07:00
Patrick Steinhardt	ff2e9d85d6	odb: rename `odb_write_object()` flags Rename `odb_write_object()` flags to be properly prefixed with the function name. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-31 20:43:13 -07:00
Patrick Steinhardt	3749853908	fsck: store repository in fsck options The fsck subsystem relies on `the_repository` quite a bit. While we could of course explicitly pass a repository down the callchain, we already have a `struct fsck_options` that we pass to almost all functions. Extend the options to also store the repository to make it readily available. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-23 08:33:10 -07:00
Patrick Steinhardt	f223609026	fsck: initialize fsck options via a function We initialize the `struct fsck_options` via a set of macros, often in global scope. In the next commit though we're about to introduce a new repository field to the options that must be initialized, and naturally we don't have a repo other than `the_repository` available in this scope. Refactor the code to instead intrdouce a new `fsck_options_init()` function that initializes the options for us and move initialization into function scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-23 08:33:10 -07:00
Patrick Steinhardt	ab3ab1038d	object-name: move logic to compute loose abbreviation length The function `repo_find_unique_abbrev_r()` takes as input an object ID as well as a minimum object ID length and returns the minimum required prefix to make the object ID unique. The logic that computes the abbreviation length for loose objects is deeply tied to the loose object storage format. As such, it would fail in case a different object storage format was used. Prepare for making this logic generic to the backend by moving the logic into a new `odb_source_loose_find_abbrev_len()` function that is part of "object-file.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-20 13:16:42 -07:00
Patrick Steinhardt	284b7862be	object-name: move logic to iterate through loose prefixed objects The logic to iterate through loose objects that have a certain prefix is currently hosted in "object-name.c". This logic reaches into specifics of the loose object source, so it breaks once a different backend is used for the object storage. Move the logic to iterate through loose objects with a prefix into "object-file.c". This is done by extending the for-each-object options to support an optional prefix that is then honored by the loose source. Naturally, we'll also have this support in the packfile store. This is done in the next commit. Furthermore, there are no users of the loose cache outside of "object-file.c" anymore. As such, convert `odb_source_loose_cache()` to have file scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-20 13:16:42 -07:00
Patrick Steinhardt	cfd575f0a9	odb: introduce `struct odb_for_each_object_options` The `odb_for_each_object()` function only accepts a bitset of flags. In a subsequent commit we'll want to change object iteration to also support iterating over only those objects that have a specific prefix. While we could of course add the prefix to the function signature, or alternatively introduce a new function, both of these options don't really seem to be that sensible. Instead, introduce a new `struct odb_for_each_object_options` that can be passed to a new `odb_for_each_object_ext()` function. Splice through the options structure into the respective object database sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-20 13:16:41 -07:00
Junio C Hamano	7f75767554	Merge branch 'ps/object-counting' into ps/odb-generic-object-name-handling * ps/object-counting: object-file: fix sparse 'plain integer as NULL pointer' error odb: introduce generic object counting odb/source: introduce generic object counting object-file: generalize counting objects object-file: extract logic to approximate object count packfile: extract logic to count number of objects odb: stop including "odb/source.h"	2026-03-20 13:16:09 -07:00
Ramsay Jones	736cef847c	object-file: fix sparse 'plain integer as NULL pointer' error Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-19 18:35:49 -07:00
Junio C Hamano	2eec0f5115	Merge branch 'jk/unleak-mmap' Plug a few leaks where mmap'ed memory regions are not unmapped. * jk/unleak-mmap: meson: turn on NO_MMAP when building with LSan Makefile: turn on NO_MMAP when building with LSan object-file: fix mmap() leak in odb_source_loose_read_object_stream() pack-revindex: avoid double-loading .rev files check_connected(): fix leak of pack-index mmap check_connected(): delay opening new_pack	2026-03-16 10:48:15 -07:00
Junio C Hamano	c89a495ce4	Merge branch 'ps/odb-sources' The object source API is getting restructured to allow plugging new backends. * ps/odb-sources: odb/source: make `begin_transaction()` function pluggable odb/source: make `write_alternate()` function pluggable odb/source: make `read_alternates()` function pluggable odb/source: make `write_object_stream()` function pluggable odb/source: make `write_object()` function pluggable odb/source: make `freshen_object()` function pluggable odb/source: make `for_each_object()` function pluggable odb/source: make `read_object_stream()` function pluggable odb/source: make `read_object_info()` function pluggable odb/source: make `close()` function pluggable odb/source: make `reprepare()` function pluggable odb/source: make `free()` function pluggable odb/source: introduce source type for robustness odb: move reparenting logic into respective subsystems odb: embed base source in the "files" backend odb: introduce "files" source odb: split `struct odb_source` into separate header	2026-03-12 14:09:07 -07:00
Patrick Steinhardt	2b24db1110	object-file: generalize counting objects Generalize the function introduced in the preceding commit to not only be able to approximate the number of loose objects, but to also provide an accurate count. The behaviour can be toggled via a new flag. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-12 08:38:42 -07:00
Patrick Steinhardt	222fddeaa4	object-file: extract logic to approximate object count In "builtin/gc.c" we have some logic that checks whether we need to repack objects. This is done by counting the number of objects that we have and checking whether it exceeds a certain threshold. We don't really need an accurate object count though, which is why we only open a single object directory shard and then extrapolate from there. Extract this logic into a new function that is owned by the loose object database source. This is done to prepare for a subsequent change, where we'll introduce object counting on the object database source level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-12 08:38:42 -07:00
Junio C Hamano	6cdef943d2	Merge branch 'ps/odb-sources' into ps/object-counting * ps/odb-sources: odb/source: make `begin_transaction()` function pluggable odb/source: make `write_alternate()` function pluggable odb/source: make `read_alternates()` function pluggable odb/source: make `write_object_stream()` function pluggable odb/source: make `write_object()` function pluggable odb/source: make `freshen_object()` function pluggable odb/source: make `for_each_object()` function pluggable odb/source: make `read_object_stream()` function pluggable odb/source: make `read_object_info()` function pluggable odb/source: make `close()` function pluggable odb/source: make `reprepare()` function pluggable odb/source: make `free()` function pluggable odb/source: introduce source type for robustness odb: move reparenting logic into respective subsystems odb: embed base source in the "files" backend odb: introduce "files" source odb: split `struct odb_source` into separate header	2026-03-10 10:13:40 -07:00
Jeff King	b68e875bec	object-file: fix mmap() leak in odb_source_loose_read_object_stream() We mmap() a loose object file, storing the result in the local variable "mapped", which is eventually assigned into our stream struct as "st.mapped". If we hit an error, we jump to an error label which does: munmap(st.mapped, st.mapsize); to clean up. But this is wrong; we don't assign st.mapped until the end of the function, after all of the "goto error" jumps. So this munmap() is never cleaning up anything (st.mapped is always NULL, because we initialize the struct with calloc). Instead, we should feed the local variable to munmap(). This leak is due to `595296e124` (streaming: allocate stream inside the backend-specific logic, 2025-11-23), which introduced the local variable. Before that, we assigned the mmap result directly into st.mapped. It was probably switched there so that we do not have to allocate/free the struct when the map operation fails (e.g., because we don't have the loose object). Before that commit, the struct was passed in from the caller, so there was no allocation at all. You can see the leak in the test suite by building with: make SANITIZE=leak NO_MMAP=1 CC=clang and running t1060. We need NO_MMAP so that the mmap() is backed by an actual malloc(), which allows LSan to detect it. And the leak seems not to be detected when compiling with gcc, probably due to some internal compiler decisions about how the stack memory is written. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-06 21:11:32 -08:00
Patrick Steinhardt	5946a564cd	odb/source: make `read_object_info()` function pluggable Introduce a new callback function in `struct odb_source` to make the function pluggable. Note that this function is a bit less straight-forward to convert compared to the other functions. The reason here is that the logic to read an object is: 1. We try to read the object. If it exists we return it. 2. If the object does not exist we reprepare the object database source. 3. We then try reading the object info a second time in case the reprepare caused it to appear. The second read is only supposed to happen for the packfile store though, as reading loose objects is not impacted by repreparing the object database. Ideally, we'd just move this whole logic into the ODB source. But that's not easily possible because we try to avoid the reprepare unless really required, which is after we have found out that no other ODB source contains the object, either. So the logic spans across multiple ODB sources, and consequently we cannot move it into an individual source. Instead, introduce a new flag `OBJECT_INFO_SECOND_READ` that tells the backend that we already tried to look up the object once, and that this time around the ODB source should try to find any new objects that may have surfaced due to an on-disk change. With this flag, the "files" backend can trivially skip trying to re-read the object as a loose object. Furthermore, as we know that we only try the second read via the packfile store, we can skip repreparing loose objects and only reprepare the packfile store. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-05 11:45:15 -08:00
Patrick Steinhardt	d9ecf268ef	odb: embed base source in the "files" backend The "files" backend is implemented as a pointer in the `struct odb_source`. This contradicts our typical pattern for pluggable backends like we use it for example in the ref store or for object database streams, where we typically embed the generic base structure in the specialized implementation. This pattern has a couple of small benefits: - We avoid an extra allocation. - We hide implementation details in the generic structure. - We can easily downcast from a generic backend to the specialized structure and vice versa because the offsets are known at compile time. - It becomes trivial to identify locations where we depend on backend specific logic because the cast needs to be explicit. Refactor our "files" object database source to do the same and embed the `struct odb_source` in the `struct odb_source_files`. There are still a bunch of sites in our code base where we do have to access internals of the "files" backend. The intent is that those will go away over time, but this will certainly take a while. Meanwhile, provide a `odb_source_files_downcast()` function that can convert a generic source into a "files" source. As we only have a single source the downcast succeeds unconditionally for now. Eventually though the intent is to make the cast `BUG()` in case the caller requests to downcast a non-"files" backend to a "files" backend. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-05 11:45:15 -08:00
Patrick Steinhardt	cb506a8a69	odb: introduce "files" source Introduce a new "files" object database source. This source encapsulates access to both loose object files and the packfile store, similar to how the "files" backend for refs encapsulates access to loose refs and the packed-refs file. Note that for now the "files" source is still a direct member of a `struct odb_source`. This architecture will be reversed in the next commit so that the files source contains a `struct odb_source`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-05 11:45:14 -08:00
Junio C Hamano	d93be9cbca	Merge branch 'ps/fsck-stream-from-the-right-object-instance' "fsck" iterates over packfiles and its access to pack data caused the list to be permuted, which caused it to loop forever; the code to access pack data by "fsck" has been updated to avoid this. * ps/fsck-stream-from-the-right-object-instance: pack-check: fix verification of large objects packfile: expose function to read object stream for an offset object-file: adapt `stream_object_signature()` to take a stream t/helper: improve "genrandom" test helper	2026-03-05 10:04:49 -08:00
Junio C Hamano	9db3644607	Merge branch 'jt/object-file-use-container-of' Code clean-up. * jt/object-file-use-container-of: object-file.c: avoid container_of() of a NULL container object-file: use `container_of()` to convert from base types	2026-03-02 17:06:53 -08:00
Junio C Hamano	10dd04a3fe	Merge branch 'ps/object-info-bits-cleanup' A couple of bugs in use of flag bits around odb API has been corrected, and the flag bits reordered. * ps/object-info-bits-cleanup: odb: convert `odb_has_object()` flags into an enum odb: convert object info flags into an enum odb: drop gaps in object info flag values builtin/fsck: fix flags passed to `odb_has_object()` builtin/backfill: fix flags passed to `odb_has_object()`	2026-03-02 17:06:52 -08:00
Junio C Hamano	9eb5b3b999	Merge branch 'ps/odb-for-each-object' Revamp object enumeration API around odb. * ps/odb-for-each-object: odb: drop unused `for_each_{loose,packed}_object()` functions reachable: convert to use `odb_for_each_object()` builtin/pack-objects: use `packfile_store_for_each_object()` odb: introduce mtime fields for object info requests treewide: drop uses of `for_each_{loose,packed}_object()` treewide: enumerate promisor objects via `odb_for_each_object()` builtin/fsck: refactor to use `odb_for_each_object()` odb: introduce `odb_for_each_object()` packfile: introduce function to iterate through objects packfile: extract function to iterate through objects of a store object-file: introduce function to iterate through objects object-file: extract function to read object info from path odb: fix flags parameter to be unsigned odb: rename `FOR_EACH_OBJECT_*` flags	2026-03-02 17:06:50 -08:00
Junio C Hamano	b1af291b4a	Merge branch 'ps/object-info-bits-cleanup' into ps/odb-sources * ps/object-info-bits-cleanup: odb: convert `odb_has_object()` flags into an enum odb: convert object info flags into an enum odb: drop gaps in object info flag values builtin/fsck: fix flags passed to `odb_has_object()` builtin/backfill: fix flags passed to `odb_has_object()`	2026-02-23 13:48:48 -08:00
Junio C Hamano	703c97519d	Merge branch 'ps/odb-for-each-object' into ps/odb-sources * ps/odb-for-each-object: odb: drop unused `for_each_{loose,packed}_object()` functions reachable: convert to use `odb_for_each_object()` builtin/pack-objects: use `packfile_store_for_each_object()` odb: introduce mtime fields for object info requests treewide: drop uses of `for_each_{loose,packed}_object()` treewide: enumerate promisor objects via `odb_for_each_object()` builtin/fsck: refactor to use `odb_for_each_object()` odb: introduce `odb_for_each_object()` packfile: introduce function to iterate through objects packfile: extract function to iterate through objects of a store object-file: introduce function to iterate through objects object-file: extract function to read object info from path odb: fix flags parameter to be unsigned odb: rename `FOR_EACH_OBJECT_*` flags	2026-02-23 13:48:00 -08:00
Patrick Steinhardt	10a6762719	object-file: adapt `stream_object_signature()` to take a stream The function `stream_object_signature()` is responsible for verifying whether the given object ID matches the actual hash of the object's contents. In contrast to `check_object_signature()` it does so in a streaming fashion so that we don't have to load the full object into memory. In a subsequent commit we'll want to adapt one of its callsites to pass a preconstructed stream. Prepare for this by accepting a stream as input that the caller needs to assemble. While at it, improve the error reporting in `parse_object_with_flags()` to tell apart the two failure modes. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:19:00 -08:00
Junio C Hamano	8e06f961d7	object-file.c: avoid container_of() of a NULL container Even though the "struct odb_transaction" member is at the beginning of the containing "struct odb_transaction_files", i.e., at offset 0, using container_of() to add offset 0 to a NULL pointer gets flagged as a bad behaviour under SANITIZE=undefined. Use container_of_or_null() to work around this issue. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-22 12:19:14 -08:00
Justin Tobler	20daad2db4	object-file: use `container_of()` to convert from base types To improve code hygiene, replace direct casts from `struct odb_transaction` and `struct odb_read_stream` to their concrete implementations with `container_of()`. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-19 10:12:11 -08:00
Patrick Steinhardt	f6516a5241	odb: convert object info flags into an enum Convert the object info flags into an enum and adapt all functions that receive these flags as parameters to use the enum instead of an integer. This serves two purposes: - The function signatures become more self-documenting, as callers don't have to wonder which flags they expect. - The compiler can warn when a wrong flag type is passed. Note that the second benefit is somewhat limited. For example, when or-ing multiple enum flags together the result will be an integer, and the compiler will not warn about such use cases. But where it does help is when a single flag of the wrong type is passed, as the compiler would generate a warning in that case. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-12 11:05:08 -08:00
Justin Tobler	3f67e3d021	odb: transparently handle common transaction behavior A new ODB transaction is created and returned via `odb_transaction_begin()` and stored in the ODB. Only a single transaction may be pending at a time. If the ODB already has a transaction, the function is expected to return NULL. Similarly, when committing a transaction via `odb_transaction_commit()` the transaction being committed must match the pending transaction and upon commit reset the ODB transaction to NULL. These behaviors apply regardless of the ODB transaction implementation. Move the corresponding logic into `odb_transaction_{begin,commit}()` accordingly. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-02 17:14:03 -08:00
Justin Tobler	fa7d067923	odb: prepare `struct odb_transaction` to become generic An ODB transaction handles how objects are stored temporarily and eventually committed. Due to object storage being implemented differently for a given ODB source, the ODB transactions must be implemented in a manner specific to the source the objects are being written to. To provide generic transactions, `struct odb_transaction` is updated to store a commit callback that can be configured to support a specific ODB source. For now `struct odb_transaction_files` is the only transaction type and what is always returned when starting a transaction. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-02 17:14:03 -08:00
Justin Tobler	8bf06d05a5	object-file: rename transaction functions In a subsequent commit, ODB transactions are made more generic to facilitate each ODB source providing its own transaction handling. Rename `object_file_transaction_{begin,commit}()` to `odb_transaction_files_{begin,commit}()` to better match the future source specific transaction implementation. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-02 17:14:03 -08:00
Justin Tobler	585e8dfa27	odb: store ODB source in `struct odb_transaction` Each `struct odb_transaction` currently stores a reference to the `struct object_database`. Since transactions are handled per object source, instead store a reference to the source. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-02 17:14:03 -08:00
Patrick Steinhardt	3565faf28c	odb: drop unused `for_each_{loose,packed}_object()` functions We have converted all callers of `for_each_loose_object()` and `for_each_packed_object()` to use their new replacement functions instead. We can thus remove them now. Do so and inline `packfile_store_for_each_object_internal()` now that it only has a single callsite again. This makes it a bit easier to follow the callback indirection that is happening there. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:08 -08:00
Patrick Steinhardt	7b7cbaef27	odb: introduce mtime fields for object info requests There are some use cases where we need to figure out the mtime for objects. Most importantly, this is the case when we want to prune unreachable objects. But getting at that data requires users to manually derive the info either via the loose object's mtime, the packfiles' mtime or via the ".mtimes" file. Introduce a new `struct object_info::mtimep` pointer that allows callers to request an object's mtime. This new field will be used in a subsequent commit. Note that the concept of "mtime" is ambiguous: given an object, it may be stored multiple times in the object database, and each of these instances may have a different mtime. Disambiguating these mtimes is nothing that can happen on the generic ODB layer: the caller may search for the oldest object, the newest object, or even the relation of object mtimes depending on the specific source they are located in. As such, it is the responsibility of the caller to disambiguate mtimes. A consequence of this is that it's most likely incorrect to look up the mtime via `odb_read_object_info()`, as this interface does not give us enough information to disambiguate the mtime. Document this accordingly and tell users to use `odb_for_each_object()` instead. Even with this gotcha though it's sensible to have this request as part of the object info, as the mtime is a property of the object storage format. If we for example had a "black-box" storage backend, we'd still need to be able to query it for the mtime info in a generic way. We could introduce a safety mechanism that for example calls `BUG()` in case we look up the mtime outside of `odb_for_each_object()`. But that feels somewhat heavy-handed. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:08 -08:00
Patrick Steinhardt	cde615b6f0	object-file: introduce function to iterate through objects We have multiple divergent interfaces to iterate through objects of a specific backend: - `for_each_loose_object()` yields all loose objects. - `for_each_packed_object()` (somewhat obviously) yields all packed objects. These functions have different function signatures, which makes it hard to create a common abstraction layer that covers both of these. Introduce a new function `odb_source_loose_for_each_object()` to plug this gap. This function doesn't take any data specific to loose objects, but instead it accepts a `struct object_info` that will be populated the exact same as if `odb_source_loose_read_object()` was called. The benefit of this new interface is that we can continue to pass backend-specific data, as `struct object_info` contains a union for these exact use cases. This will allow us to unify how we iterate through objects across both loose and packed objects in a subsequent commit. The `for_each_loose_object()` function continues to exist for now, but it will be removed at the end of this patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:07 -08:00
Patrick Steinhardt	6ecab3cdf6	object-file: extract function to read object info from path Extract a new function that allows us to read object info for a specific loose object via a user-supplied path. This function will be used in a subsequent commit. Note that this also allows us to drop `stat_loose_object()`, which is a simple wrapper around `odb_loose_path()` plus lstat(3p). Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:06 -08:00
Patrick Steinhardt	6358da200f	odb: fix flags parameter to be unsigned The `flags` parameter accepted by various `for_each_object()` functions is a bitfield of multiple flags. Such parameters are typically unsigned in the Git codebase, but we use `enum odb_for_each_object_flags` in some places. Adapt these function signatures to use the correct type. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:06 -08:00
Patrick Steinhardt	bd1855b897	odb: rename `FOR_EACH_OBJECT_` flags Rename the `FOR_EACH_OBJECT_` flags to have an `ODB_` prefix. This prepares us for a new upcoming `odb_for_each_object()` function and ensures that both the function and its flags have the same prefix. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-26 08:26:06 -08:00
Patrick Steinhardt	123e8186a7	object-file: always set OI_LOOSE when reading object info There are some early returns in `odb_source_loose_read_object_info()` in cases where we don't have to open the loose object. These return paths do not set `struct object_info::whence` to `OI_LOOSE` though, so it becomes impossible for the caller to tell the format of such an object. The root cause of this really is that we have so many different return paths in the function. As a consequence, it's harder than necessary to make sure that all successful exit paths sot up the `whence` field as expected. Address this by refactoring the function to have a single exit path. Like this, we can trivially set up the `whence` field when we exit successfully from the function. Note that we also: - Rename `status` to `ret` to match our usual coding style, but also to show that the old `status` variable is now always getting the expected value. Furthermore, the value is not initialized anymore, which has the consequence that most compilers will warn for exit paths where we forgot to set it. - Move the setup of scratch pointers closer to `parse_loose_header()` to show where it's needed. - Guard a couple of variables on cleanup so that they only get released in case they have been set up. - Reset `oi->delta_base_oid` towards the end of the function, together with all the other object info pointers. Overall, all these changes result in a diff that is somewhat hard to read. But the end result is significantly easier to read and reason about, so I'd argue this one-time churn is worth it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-12 06:51:14 -08:00
Junio C Hamano	00f117fafe	Merge branch 'jc/object-read-stream-fix' into ps/read-object-info-improvements * jc/object-read-stream-fix: odb: do not use "blank" substitute for NULL	2025-12-18 20:55:09 +09:00
Junio C Hamano	a650ad996d	odb: do not use "blank" substitute for NULL When various *object_info() functions are given an extended object info structure as NULL by a caller that does not want any details, the code uses a file-scope static blank_oi and passes it down to the helper functions they use, to avoid handling NULL specifically. The ps/object-read-stream topic graduated to 'master' recently however had a bug that assumed that two identically named file-scope static variables in two functions are the same, which of course is not the case. This made "git commit" take 0.38 seconds to 1508 seconds in some case, as reported by Aaron Plattner here: https://lore.kernel.org/git/f4ba7e89-4717-4b36-921f-56537131fd69@nvidia.com/ We _could_ move the blank_oi variable to the global scope in common section to fix this regression, but explicitly handling the NULL is a much safer fix. It would also reduce the chance of errors that somebody accidentally writes into blank_oi, making its contents dirty, which potentially will make subsequent calls into the function misbehave. By explicitly handling NULL input, we no longer have to worry about it. Reported-by: Aaron Plattner <aplattner@nvidia.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-18 20:48:01 +09:00
Junio C Hamano	dbe54273a7	Merge branch 'ps/object-read-stream' The "git_istream" abstraction has been revamped to make it easier to interface with pluggable object database design. * ps/object-read-stream: streaming: drop redundant type and size pointers streaming: move into object database subsystem streaming: refactor interface to be object-database-centric streaming: move logic to read packed objects streams into backend streaming: move logic to read loose objects streams into backend streaming: make the `odb_read_stream` definition public streaming: get rid of `the_repository` streaming: rely on object sources to create object stream packfile: introduce function to read object info from a store streaming: move zlib stream into backends streaming: create structure for filtered object streams streaming: create structure for packed object streams streaming: create structure for loose object streams streaming: create structure for in-core object streams streaming: allocate stream inside the backend-specific logic streaming: explicitly pass packfile info when streaming a packed object streaming: propagate final object type via the stream streaming: drop the `open()` callback function streaming: rename `git_istream` into `odb_read_stream`	2025-12-16 11:08:34 +09:00
Junio C Hamano	a545103244	Merge branch 'ps/object-source-loose' A part of code paths that deals with loose objects has been cleaned up. * ps/object-source-loose: object-file: refactor writing objects via a stream object-file: rename `write_object_file()` object-file: refactor freshening of objects object-file: rename `has_loose_object()` object-file: read objects via the loose object source object-file: move loose object map into loose source object-file: hide internals when we need to reprepare loose sources object-file: move loose object cache into loose source object-file: introduce `struct odb_source_loose` object-file: move `fetch_if_missing` odb: adjust naming to free object sources odb: introduce `odb_source_new()` odb: fix subtle logic to check whether an alternate is usable	2025-11-24 15:46:41 -08:00
Junio C Hamano	d91d79f26d	Merge branch 'bc/submodule-force-same-hash' Adding a repository that uses a different hash function is a no-no, but "git submodule add" did nt prevent it, which has been corrected. * bc/submodule-force-same-hash: read-cache: drop submodule check from add_to_cache() object-file: disallow adding submodules of different hash algo	2025-11-24 15:46:40 -08:00
Patrick Steinhardt	7b94028652	streaming: drop redundant type and size pointers In the preceding commits we have turned `struct odb_read_stream` into a publicly visible structure. Furthermore, this structure now contains the type and size of the object that we are about to stream. Consequently, the out-pointers that we used before to propagate the type and size of the streamed object are now somewhat redundant with the data contained in the structure itself. Drop these out-pointers and adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-23 12:56:46 -08:00

1 2 3 4 5 ...

309 Commits