git-for-windows/git - git - Gitea: Self-hosted GitHub

mirror of https://github.com/git-for-windows/git.git synced 2026-06-14 15:01:15 -05:00

Author	SHA1	Message	Date
Johannes Schindelin	ae90a3ee5c	Merge 'objects-larger-than-4gb-on-windows-pt2' This is hidden in v2.55.0-rc0's own CI because of an omission in `5ba82911bc` (ci: enable EXPENSIVE for contributor builds, 2026-05-11) which fails to enable EXPENSIVE tests for tags. Due to `7d78d5fc1a` (ci: skip GitHub workflow runs for already-tested commits/trees, 2020-10-08), the CI of `master` is now also mistakenly green because it reuses the tag's CI run to prove that it's solid. This is an evil merge by necessity because `survey.c` needs to adapt to the changed function signatures. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-13 03:11:27 +00:00
Philip Oakley	0934e6fec5	hash algorithms: use size_t for section lengths Continue walking the code path for the >4GB `hash-object --literally` test to the hash algorithm step for LLP64 systems. This patch lets the SHA1DC code use `size_t`, making it compatible with LLP64 data models (as used e.g. by Windows). The interested reader of this patch will note that we adjust the signature of the `git_SHA1DCUpdate()` function without updating _any_ call site. This certainly puzzled at least one reviewer already, so here is an explanation: This function is never called directly, but always via the macro `platform_SHA1_Update`, which is usually called via the macro `git_SHA1_Update`. However, we never call `git_SHA1_Update()` directly in `struct git_hash_algo`. Instead, we call `git_hash_sha1_update()`, which is defined thusly: static void git_hash_sha1_update(git_hash_ctx *ctx, const void *data, size_t len) { git_SHA1_Update(&ctx->sha1, data, len); } i.e. it contains an implicit downcast from `size_t` to `unsigned long` (before this here patch). With this patch, there is no downcast anymore. With this patch, finally, the t1007-hash-object.sh "files over 4GB hash literally" test case is fixed. Signed-off-by: Philip Oakley <philipoakley@iee.email> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-13 02:58:28 +00:00
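The implicit downcast described in the commit above can be illustrated with a minimal sketch. It is not Git's code: every `demo_*` name is hypothetical, and `uint32_t` stands in for the 32-bit `unsigned long` of an LLP64 platform so that the effect is reproducible on any host.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for an update function whose length parameter is
 * 32 bits wide, as `unsigned long` is on LLP64 platforms. It returns the
 * number of bytes it was asked to process.
 */
static uint64_t demo_update_narrow(uint32_t len)
{
	return len;
}

/* The same function after widening the length parameter. */
static uint64_t demo_update_wide(uint64_t len)
{
	return len;
}

/* Caller with a 64-bit length that forwards to the narrow function. */
static uint64_t demo_wrapper_narrow(uint64_t len)
{
	/* The conversion is implicit in C; the cast only makes it visible. */
	return demo_update_narrow((uint32_t)len);
}

/* Caller with a 64-bit length that forwards to the widened function. */
static uint64_t demo_wrapper_wide(uint64_t len)
{
	return demo_update_wide(len);
}

/* Usage example: a length just over 4 GiB. */
static void demo_truncation_usage(void)
{
	uint64_t len = ((uint64_t)1 << 32) + 5;

	assert(demo_wrapper_narrow(len) == 5); /* upper 32 bits are lost */
	assert(demo_wrapper_wide(len) == len); /* full length survives */
}
```

Nothing at the call site changes between the two wrappers, which is consistent with the commit's remark that the signature could be adjusted without touching any caller.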
Philip Oakley	cf20a42fd8	object-file.c: use size_t for header lengths Continue walking the code path for the >4GB `hash-object --literally` test. The `hash_object_file_literally()` function internally uses both `hash_object_file()` and `write_object_file_prepare()`. Both function signatures use `unsigned long` rather than `size_t` for the mem buffer sizes. Use `size_t` instead, for LLP64 compatibility. While at it, convert the object header buffer length in those functions to `size_t` for consistency. The value is already upcast to `uintmax_t` for print format compatibility. Note: The hash-object test still does not pass. A subsequent commit continues to walk the call tree's lower level hash functions to identify further fixes. Signed-off-by: Philip Oakley <philipoakley@iee.email> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-13 02:58:28 +00:00
Junio C Hamano	7c63b245cd	Merge branch 'ps/cat-file-remote-object-info' into seen The `remote-object-info` command has been added to `git cat-file --batch-command`, allowing clients to request object metadata (currently size) from a remote server via protocol v2 without downloading the entire object. The client dynamically filters format placeholders based on server-advertised capabilities and safely returns empty strings for inapplicable or unsupported fields. * ps/cat-file-remote-object-info: cat-file: make remote-object-info allow-list dynamic cat-file: validate remote atoms with allow_list cat-file: add remote-object-info to batch-command transport: add client support for object-info serve: advertise object-info feature fetch-pack: move fetch initialization connect: refactor packet writing fetch-pack: move function to connect.c t1006: split test utility functions into new "lib-cat-file.sh" cat-file: add declaration of variable i inside its for loop git-compat-util: add strtoul_ul() with error handling transport-helper: fix memory leak of helper on disconnect	2026-06-12 15:58:18 -07:00
Junio C Hamano	a1d02357f1	Merge branch 'ob/more-repo-config-values' into jch Many core configuration variables have been migrated from global variables into 'repo_config_values' to tie them to a specific repository instance, avoiding cross-repository state leakage. * ob/more-repo-config-values: environment: move "warn_on_object_refname_ambiguity" into `struct repo_config_values` environment: move "sparse_expect_files_outside_of_patterns" into `struct repo_config_values` environment: move "core_sparse_checkout_cone" into `struct repo_config_values` environment: move "precomposed_unicode" into `struct repo_config_values` environment: move "pack_compression_level" into `struct repo_config_values` environment: move `zlib_compression_level` into `struct repo_config_values` environment: move "check_stat" into `struct repo_config_values` environment: move "trust_ctime" into `struct repo_config_values`	2026-06-12 15:57:14 -07:00
Eric Ju	82b4ab8001	cat-file: add remote-object-info to batch-command Since the `info` command in `cat-file --batch-command` prints object info for a given object, it is natural to add another command in `cat-file --batch-command` to print object info for a given object from a remote. Add `remote-object-info` to `cat-file --batch-command`. While `info` takes object ids one at a time, this creates overhead when making requests to a server. So `remote-object-info` instead can take multiple object ids at once. The `cat-file --batch-command` command is generally implemented in the following manner: - Receive and parse input from user - Call respective function attached to command - Get object info, print object info In --buffer mode, this changes to: - Receive and parse input from user - Store respective function attached to command in a queue - After flush, loop through commands in queue - Call respective function attached to command - Get object info, print object info Notice how the getting and printing of object info is accomplished one at a time. As described above, this creates a problem for making requests to a server. Therefore, `remote-object-info` is implemented in the following manner: - Receive and parse input from user If command is `remote-object-info`: - Get object info from remote - Loop through and print each object info Else: - Call respective function attached to command - Parse input, get object info, print object info And finally for --buffer mode `remote-object-info`: - Receive and parse input from user - Store respective function attached to command in a queue - After flush, loop through commands in queue: If command is `remote-object-info`: - Get object info from remote - Loop through and print each object info Else: - Call respective function attached to command - Get object info, print object info To summarize, `remote-object-info` gets object info from the remote and then loops through the object info passed in, printing the info. 
In order for `remote-object-info` to avoid remote communication overhead in the non-buffer mode, the objects are passed in as such: remote-object-info <remote> <oid> <oid> ... <oid> rather than remote-object-info <remote> <oid> remote-object-info <remote> <oid> ... remote-object-info <remote> <oid> Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Eric Ju <eric.peijian@gmail.com> Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-08 23:30:15 +09:00
Johannes Schindelin	f3aeae983a	odb: use size_t for object_info.sizep and the size APIs When `js/objects-larger-than-4gb-on-windows` widened the streaming, index-pack and unpack-objects code paths, in the interest of keeping the patches somewhat reasonably-sized, it left the public ODB API still typed in `unsigned long`. In particular `struct object_info::sizep` and the four wrappers built on top of it (`odb_read_object`, `odb_read_object_peeled`, `odb_read_object_info`, `odb_pretend_object`) still return the unpacked size through `unsigned long *`, so on Windows `cat-file -s` and the `git add` / `git status` paths for a >4 GiB blob silently cap at 4 GiB. Widen the field and the four wrappers. The previous commits already widened the `unpack_entry()` cascade and pack-objects' in-core size accessors, so most of the cascade arrives here with no further work: the temporary shims in `packed_object_info_with_index_pos()` and in `unpack_entry()`'s delta-base recovery path go away, the two `SET_SIZE(entry, cast_size_t_to_ulong(canonical_size))` calls in `check_object()` and the matching one in `drop_reused_delta()` collapse to plain `SET_SIZE`, and `oe_get_size_slow()`'s tail `cast_size_t_to_ulong()` is gone too. What remains narrow are the boundaries this series does not intend to touch: the diff, blame, textconv and fast-import machinery. Even so, this patch is unfortunately quite large. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-04 09:51:22 +02:00
Olamide Caleb Bello	8cd7402acc	environment: move "pack_compression_level" into `struct repo_config_values` The `pack_compression_level` configuration is currently stored in the global variable `pack_compression_level`, which makes it shared across repository instances within a single process. Store it instead in `repo_config_values`, where eagerly‑parsed repository configuration lives. `pack_compression_level` is parsed eagerly because it influences packfile compression, a core operation where a lazy parse could cause inconsistent behavior and hamper libification. This preserves the existing eager‑parsing behavior while tying the value to the repository from which it was read, avoiding cross‑repository state leakage and continuing the effort to reduce reliance on global configuration state. Update all references to use `repo_config_values()`. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-03 08:36:48 +09:00
Olamide Caleb Bello	e0f86540ab	environment: move `zlib_compression_level` into `struct repo_config_values` The `zlib_compression_level` configuration is currently stored in the global variable `zlib_compression_level`, which makes it shared across repository instances within a single process. Store it instead in `repo_config_values`, where eagerly‑parsed repository configuration lives. `zlib_compression_level` is parsed eagerly because it determines compression behaviour for objects and packs – core operations where a lazy parse could lead to unpredictable results and hinder libification. This preserves the existing eager‑parsing behavior while tying the value to the repository it was read from, avoiding cross‑repository state leakage and continuing the effort to reduce reliance on global configuration state. Update all references to use `repo_config_values()`. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-03 08:36:48 +09:00
Patrick Steinhardt	b9906a645c	object-file: refactor writing objects to use loose source The "object-file" subsystem still hosts the majority of logic used to write loose objects. Eventually, we'll want to move this logic into "odb/source-loose.c", but this isn't yet easily possible because a lot of the writing logic is still being shared with `force_object_loose()`. We will eventually detangle this logic so that we can indeed move all of it into the "loose" source. Meanwhile though, refactor the code so that it operates on a `struct odb_source_loose` directly to already make the dependency explicit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	04a6e84cbd	odb/source-loose: wire up `write_object()` callback Move `odb_source_loose_write_object()` from "object-file.c" into "odb/source-loose.c" and wire it up as the `write_object()` callback of the loose source. As in preceding commits, this requires us to expose a couple of generic functions from "object-file.c" as they are used in both subsystems now. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	87588db131	loose: refactor object map to operate on `struct odb_source_loose` While the loose object map functions in "loose.c" accept a generic `struct odb_source *`, they always expect this to be the "files" backend. Furthermore, the subsystem doesn't even care about the "files" backend, but only uses it as a stepping stone to get to the "loose" backend. This assumption is implicit and thus not immediately obvious. Refactor the interfaces to operate on a `struct odb_source_loose` instead, which eliminates the implicit dependency and unnecessary detour via the "files" source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	d8b9e8bb23	odb/source-loose: wire up `freshen_object()` callback Move `odb_source_loose_freshen_object()` from "object-file.c" into "odb/source-loose.c" and wire it up as the `freshen_object()` callback of the loose source. As part of the move, `check_and_freshen_source()` is inlined into the callback function, as it has no other callers anymore. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	86f7ab5a1f	odb/source-loose: drop `odb_source_loose_has_object()` The function `odb_source_loose_has_object()` checks whether a specific object exists as a loose object on disk by using lstat(3p). This interface is somewhat redundant, as we typically check for object existence in a generic way via `odb_source_read_object_info()`. In fact, these two calls are redundant in case the latter is called in a specific way: when called without an object info request and without the `OBJECT_INFO_QUICK` flag, then we will end up doing the same call to lstat(3p) in `read_object_info_from_path()`. Drop the function and adapt callers to instead use the generic interface so that its calling conventions align with that of other sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	2ade08ac29	odb/source-loose: wire up `count_objects()` callback Move `odb_source_loose_count_objects()` and its associated helpers from "object-file.c" into "odb/source-loose.c" and wire it up as the `count_objects()` callback of the loose source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	8a6da81cc1	odb/source-loose: wire up `find_abbrev_len()` callback Move `odb_source_loose_find_abbrev_len()` and its associated helpers from "object-file.c" into "odb/source-loose.c" and wire it up as the `find_abbrev_len` callback of the loose source. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	e4f1d9ba57	odb/source-loose: wire up `for_each_object()` callback Move `odb_source_loose_for_each_object()` and its associated helpers from "object-file.c" into "odb/source-loose.c" and wire it up as the `for_each_object()` callback of the loose source. Again, as in the preceding commit, we are forced to expose a couple of functions from "object-file.c" that are now used by both subsystems. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	727a935a71	odb/source-loose: wire up `read_object_stream()` callback Move `odb_source_loose_read_object_stream()` and its associated helpers from "object-file.c" into "odb/source-loose.c" and wire it up as the `read_object_stream()` callback of the loose source. As part of the move we are also forced to expose a couple of functions from "object-file.h" that parse object headers in a somewhat-generic way, as those functions are now used by both subsystems. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:18 +09:00
Patrick Steinhardt	584338ed92	odb/source-loose: wire up `read_object_info()` callback Move `odb_source_loose_read_object_info()` from "object-file.c" into "odb/source-loose.c" and wire it up as the `read_object_info()` callback of the loose source. Callers that previously invoked it directly now go through the generic `odb_source_read_object_info()` interface instead. The function `read_object_info_from_path()` cannot be moved along with it because it is still called by `for_each_object_wrapper_cb()`. It is therefore kept in place, but adjusted to take a loose source to clarify that it's always operating on this structure. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:17 +09:00
Patrick Steinhardt	a2b7db9bc8	odb/source-loose: wire up `reprepare()` callback Move `odb_source_loose_reprepare()` from "object-file.c" into "odb/source-loose.c" and wire it up as the `reprepare()` callback of the loose source. While at it, make `odb_source_loose_clear_cache()` static, as it is no longer needed outside of its file. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:17 +09:00
Patrick Steinhardt	ead691927b	odb/source-loose: start converting to a proper `struct odb_source` Start converting `struct odb_source_loose` into a proper pluggable `struct odb_source` by embedding the base struct and assigning it the new `ODB_SOURCE_LOOSE` type. Furthermore, wire up lifecycle management of this source by implementing the `free` callback and taking ownership of the chdir notifications. Note that the loose source is not yet functional as a standalone `struct odb_source`, as it's missing all of the callback implementations. These will be wired up in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:17 +09:00
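The pattern named in the commit above (embedding a base struct, tagging it with a type, and wiring up a `free` callback) might look roughly like the following sketch. All `demo_*` and `DEMO_*` names are hypothetical and the layout is an assumption; Git's actual `struct odb_source` has more members and callbacks than shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum demo_source_type { DEMO_SOURCE_FILES, DEMO_SOURCE_LOOSE };

/* Generic base: a type tag plus a lifecycle callback. */
struct demo_source {
	enum demo_source_type type;
	void (*free)(struct demo_source *source);
};

/* Concrete source that embeds the base struct. */
struct demo_source_loose {
	struct demo_source base;
	int *free_counter; /* lets a caller observe that the callback ran */
};

/* Recover the concrete struct from a base pointer, checking the tag. */
static struct demo_source_loose *demo_source_loose_downcast(struct demo_source *source)
{
	if (source->type != DEMO_SOURCE_LOOSE)
		return NULL;
	return (struct demo_source_loose *)
		((char *)source - offsetof(struct demo_source_loose, base));
}

static void demo_source_loose_free(struct demo_source *source)
{
	struct demo_source_loose *loose = demo_source_loose_downcast(source);

	if (!loose)
		return;
	if (loose->free_counter)
		(*loose->free_counter)++;
	free(loose);
}

static struct demo_source *demo_source_loose_new(int *free_counter)
{
	struct demo_source_loose *loose = calloc(1, sizeof(*loose));

	if (!loose)
		return NULL;
	loose->base.type = DEMO_SOURCE_LOOSE;
	loose->base.free = demo_source_loose_free;
	loose->free_counter = free_counter;
	return &loose->base;
}

/* Usage example: the caller only ever sees the generic base pointer. */
static void demo_source_usage(void)
{
	int freed = 0;
	struct demo_source *source = demo_source_loose_new(&freed);

	assert(source && source->type == DEMO_SOURCE_LOOSE);
	source->free(source);
	assert(freed == 1);
}
```

A source built this way has lifecycle management but no read or write callbacks yet, which mirrors the commit's note that the remaining callbacks are wired up in later commits.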
Patrick Steinhardt	1d451ba6fe	odb/source-loose: store pointer to "files" instead of generic source The `struct odb_source_loose` holds a pointer to its owning parent source. The way that Git is currently structured, this parent is always the "files" source. In subsequent commits we're going to detangle that so that the "loose" source doesn't have any owning parent source at all so that it can be used as a completely standalone source. Detangling this mess is somewhat intricate though, and is made even more intricate because it's not always clear which kind of source one is holding at a specific point in time -- either the parent "files" source, or the child "loose" source. Make this relationship more explicit by storing a pointer to the "files" source instead of storing a pointer to a generic `struct odb_source`. This will help make subsequent steps a bit clearer. Note that this is a temporary step, only. At the end of this series we will have dropped the parent pointer completely. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:17 +09:00
Patrick Steinhardt	514f039c90	odb/source-loose: move loose source into "odb/" subsystem In subsequent patches we'll be turning `struct odb_source_loose` into a proper `struct odb_source`. As a first step towards this goal, move its struct out of "object-file.c" and into "odb/source-loose.c". This detaches the implementation of the loose object source from the generic object file code, following the same convention already used by the "files" and "in-memory" sources. No functional changes are intended. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-06-01 18:47:17 +09:00
Junio C Hamano	0839e56957	Merge branch 'ps/odb-in-memory' Add a new odb "in-memory" source that is meant to only hold tentative objects (like the virtual blob object that represents the working tree file used by "git blame"). * ps/odb-in-memory: t/unit-tests: add tests for the in-memory object source odb: generic in-memory source odb/source-inmemory: stub out remaining functions odb/source-inmemory: implement `freshen_object()` callback odb/source-inmemory: implement `count_objects()` callback odb/source-inmemory: implement `find_abbrev_len()` callback odb/source-inmemory: implement `for_each_object()` callback odb/source-inmemory: convert to use oidtree oidtree: add ability to store data cbtree: allow using arbitrary wrapper structures for nodes odb/source-inmemory: implement `write_object_stream()` callback odb/source-inmemory: implement `write_object()` callback odb/source-inmemory: implement `read_object_stream()` callback odb/source-inmemory: implement `read_object_info()` callback odb: fix unnecessary call to `find_cached_object()` odb/source-inmemory: implement `free()` callback odb: introduce "in-memory" source	2026-05-27 14:15:46 +09:00
Junio C Hamano	2f952b81ed	Merge branch 'jt/odb-transaction-write' ODB transaction interface is being reworked to explicitly handle object writes. * jt/odb-transaction-write: odb/transaction: make `write_object_stream()` pluggable object-file: generalize packfile writes to use odb_write_stream object-file: avoid fd seekback by checking object size upfront object-file: remove flags from transaction packfile writes odb: update `struct odb_write_stream` read() callback odb/transaction: use pluggable `begin_transaction()` odb: split `struct odb_transaction` into separate header	2026-05-27 14:15:45 +09:00
Junio C Hamano	9ebc19b760	Merge branch 'ps/odb-in-memory' into ps/odb-source-loose * ps/odb-in-memory: (24 commits) t/unit-tests: add tests for the in-memory object source odb: generic in-memory source odb/source-inmemory: stub out remaining functions odb/source-inmemory: implement `freshen_object()` callback odb/source-inmemory: implement `count_objects()` callback odb/source-inmemory: implement `find_abbrev_len()` callback odb/source-inmemory: implement `for_each_object()` callback odb/source-inmemory: convert to use oidtree oidtree: add ability to store data cbtree: allow using arbitrary wrapper structures for nodes odb/source-inmemory: implement `write_object_stream()` callback odb/source-inmemory: implement `write_object()` callback odb/source-inmemory: implement `read_object_stream()` callback odb/source-inmemory: implement `read_object_info()` callback odb: fix unnecessary call to `find_cached_object()` odb/source-inmemory: implement `free()` callback odb: introduce "in-memory" source odb/transaction: make `write_object_stream()` pluggable object-file: generalize packfile writes to use odb_write_stream object-file: avoid fd seekback by checking object size upfront ...	2026-05-21 22:34:55 +09:00
Patrick Steinhardt	449650decf	oidtree: add ability to store data The oidtree data structure is currently only used to store object IDs, without any associated data. So consequently, it can only really be used to track which object IDs exist, and we can use the tree structure to efficiently operate on OID prefixes. But there are valid use cases where we want to both: - Store object IDs in a sorted order. - Associate arbitrary data with them. Refactor the oidtree interface so that it allows us to store arbitrary payloads within the respective nodes. This will be used in the next commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-15 04:50:45 +09:00
Junio C Hamano	2f124686e8	Merge branch 'jt/odb-transaction-write' into ps/odb-in-memory * jt/odb-transaction-write: odb/transaction: make `write_object_stream()` pluggable object-file: generalize packfile writes to use odb_write_stream object-file: avoid fd seekback by checking object size upfront object-file: remove flags from transaction packfile writes odb: update `struct odb_write_stream` read() callback odb/transaction: use pluggable `begin_transaction()` odb: split `struct odb_transaction` into separate header	2026-05-15 04:50:31 +09:00
Justin Tobler	08b6afb2a2	odb/transaction: make `write_object_stream()` pluggable How an ODB transaction handles writing objects is expected to vary between implementations. Introduce a new `write_object_stream()` callback in `struct odb_transaction` to make this function pluggable. Rename `index_blob_packfile_transaction()` to `odb_transaction_files_write_object_stream()` and wire it up for use with `struct odb_transaction_files` accordingly. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-15 04:44:40 +09:00
Justin Tobler	45a75d6187	object-file: generalize packfile writes to use odb_write_stream The `index_blob_packfile_transaction()` function streams blob data directly from an fd. This makes it difficult to reuse as part of a generic transactional object writing interface. Refactor the packfile write path to operate on a `struct odb_write_stream`, allowing callers to supply data from arbitrary sources. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-15 04:44:40 +09:00
Justin Tobler	d4c92e2ac9	object-file: avoid fd seekback by checking object size upfront In certain scenarios, Git handles writing blobs that exceed "core.bigFileThreshold" differently by streaming the object directly into a packfile. When there is an active ODB transaction, these blobs are streamed to the same packfile instead of using a separate packfile for each. If "pack.packSizeLimit" is configured and streaming another object causes the packfile to exceed the configured limit, the packfile is truncated back to the previous object and the object write is restarted in a new packfile. This works fine, but requires the fd being read from to save a checkpoint so it becomes possible to rewind the input source via seeking back to a known offset at the beginning. In a subsequent commit, blob streaming is converted to use `struct odb_write_stream` as a more generic input source instead of an fd which doesn't provide a mechanism for rewinding. For this use case though, rewinding the fd is not strictly necessary because the inflated size of the object is known and can be used to approximate whether writing the object would cause the packfile to exceed the configured limit prior to writing anything. These blobs written to the packfile are never deltified, thus the size difference between what is written versus the inflated size is due to zlib compression. While this does prevent packfiles from being filled to the potential maximum in some cases, it should be good enough and still prevents the packfile from exceeding any configured limit. Use the inflated blob size to determine whether writing an object to a packfile will exceed the configured "pack.packSizeLimit". Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-15 04:44:40 +09:00
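The upfront check described in the commit above could be sketched as follows. This is an illustration only: the function name and parameters are hypothetical, a limit of 0 is assumed to mean "no limit", and the rule that an empty packfile always accepts the object is an assumption of this sketch, not something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decide, before writing anything, whether the next object should go into
 * a fresh packfile. The inflated size is used as a conservative estimate
 * of the bytes that will be written, since compression can only be
 * expected to shrink the data.
 */
static int demo_needs_new_pack(uint64_t pack_offset, uint64_t inflated_size,
			       uint64_t pack_size_limit)
{
	if (!pack_size_limit)
		return 0; /* assumed: 0 means no limit configured */
	if (!pack_offset)
		return 0; /* assumed: an empty pack gains nothing from a restart */
	if (pack_offset >= pack_size_limit)
		return 1;
	/* Written as a subtraction so the comparison cannot overflow. */
	return inflated_size > pack_size_limit - pack_offset;
}

/* Usage example with a limit of 1000 bytes. */
static void demo_pack_limit_usage(void)
{
	assert(demo_needs_new_pack(900, 200, 1000) == 1); /* would exceed */
	assert(demo_needs_new_pack(800, 200, 1000) == 0); /* fits exactly */
	assert(demo_needs_new_pack(100, 200, 0) == 0);    /* no limit */
}
```

Because the decision is made from the inflated size, a pack may be closed slightly earlier than strictly necessary, which is the trade-off the commit message accepts in exchange for never having to rewind the input.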
Justin Tobler	8a1f5ecf28	object-file: remove flags from transaction packfile writes The `index_blob_packfile_transaction()` function handles streaming a blob from an fd to compute its object ID and conditionally writes the object directly to a packfile if the INDEX_WRITE_OBJECT flag is set. A subsequent commit will make these packfile object writes part of the transaction interface. Consequently, having the object write be conditional on this flag is a bit awkward. In preparation for this change, introduce a dedicated `hash_blob_stream()` helper that only computes the OID from a `struct odb_write_stream`. This is invoked by `index_fd()` instead when the INDEX_WRITE_OBJECT is not set. The object write performed via `index_blob_packfile_transaction()` is made unconditional accordingly. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-15 04:44:40 +09:00
Justin Tobler	970f63519e	odb: update `struct odb_write_stream` read() callback The `read()` callback used by `struct odb_write_stream` currently returns a pointer to an internal buffer along with the number of bytes read. This makes buffer ownership unclear and provides no way to report errors. Update the interface to instead require the caller to provide a buffer, and have the callback return the number of bytes written to it or a negative value on error. While at it, also move the `struct odb_write_stream` definition to "odb/streaming.h". Call sites are updated accordingly. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-15 04:44:40 +09:00
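A read callback in the style the commit above describes (caller-provided buffer, byte count returned, negative value on error) might look like this sketch. The struct, its fields and the function name are hypothetical and do not reproduce the real `struct odb_write_stream`; `ptrdiff_t` is used here only as a portable signed return type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory input source. */
struct demo_mem_stream {
	const unsigned char *data;
	size_t len;
	size_t pos;
	int broken; /* set to simulate a failing input source */
};

/*
 * Fill the caller's buffer with up to buf_len bytes. Returns the number
 * of bytes written to buf, 0 at end of data, or a negative value on error.
 */
static ptrdiff_t demo_mem_stream_read(struct demo_mem_stream *st,
				      unsigned char *buf, size_t buf_len)
{
	size_t n;

	if (st->broken)
		return -1;
	n = st->len - st->pos;
	if (n > buf_len)
		n = buf_len;
	if (n)
		memcpy(buf, st->data + st->pos, n);
	st->pos += n;
	return (ptrdiff_t)n;
}

/* Usage example: the caller owns the buffer and loops until 0 or error. */
static void demo_read_usage(void)
{
	struct demo_mem_stream st = { (const unsigned char *)"hello", 5, 0, 0 };
	unsigned char buf[4];

	assert(demo_mem_stream_read(&st, buf, sizeof(buf)) == 4);
	assert(!memcmp(buf, "hell", 4));
	assert(demo_mem_stream_read(&st, buf, sizeof(buf)) == 1);
	assert(buf[0] == 'o');
	assert(demo_mem_stream_read(&st, buf, sizeof(buf)) == 0);

	st.broken = 1;
	assert(demo_mem_stream_read(&st, buf, sizeof(buf)) < 0);
}
```

With the buffer supplied by the caller, ownership is unambiguous, and the signed return value gives the callback a channel for reporting failure.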
Justin Tobler	5f6744d3eb	odb: split `struct odb_transaction` into separate header The current ODB transaction interface is colocated with other ODB interfaces in "odb.{c,h}". Subsequent commits will expand `struct odb_transaction` to support write operations on the transaction directly. To keep things organized and prevent "odb.{c,h}" from becoming more unwieldy, split out `struct odb_transaction` into a separate header. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-15 04:44:39 +09:00
Johannes Schindelin	606c192380	odb, packfile: use size_t for streaming object sizes The odb_read_stream structure uses unsigned long for the size field, which is 32-bit on Windows even in 64-bit builds. When streaming objects larger than 4GB, the size would be truncated to zero or an incorrect value, resulting in empty files being written to disk. Change the size field in odb_read_stream to size_t and introduce unpack_object_header_sz() to return sizes via size_t pointer. Since object_info.sizep remains unsigned long for API compatibility, use temporary variables where the types differ, with comments noting the truncation limitation for code paths that still use unsigned long. Widening the producers to size_t in this way introduces a handful of silent size_t -> unsigned long narrowings on Windows, all in builtin/pack-objects.c, where the consumers are still typed unsigned long. Make those narrowings explicit with cast_size_t_to_ulong() so they assert loudly the moment an object actually exceeds ULONG_MAX bytes: - oe_get_size_slow() returns unsigned long but holds a size_t locally; cast at the return. - write_reuse_object() passes a size_t into check_pack_inflate(), whose expect parameter is unsigned long; cast at the call. - check_object() routes a size_t through SET_SIZE() and SET_DELTA_SIZE(), both of which take unsigned long via oe_set_size() / oe_set_delta_size(); cast at the three call sites in the OBJ_OFS_DELTA / OBJ_REF_DELTA branches and in the non-delta default arm. The cast-only treatment is deliberately a stop-gap. Properly widening oe_set_size, oe_get_size_slow's return type, check_pack_inflate's expect parameter, object_info.sizep, patch_delta, and the OE_SIZE_BITS bit-fields cascades into a series that is too large to be reviewable, so the proper widening is deferred to a follow-up topic. 
Until then, cast_size_t_to_ulong() at least makes the truncation explicit at the source: it documents the boundary, and on a 64-bit non-Windows platform it is a no-op. This was originally authored by LordKiRon <https://github.com/LordKiRon>, who preferred not to reveal their real name and therefore agreed that I take over authorship. Helped-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-09 11:25:31 +09:00
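The idea of making a narrowing explicit and checked, as the commit above does with `cast_size_t_to_ulong()`, can be sketched like this. The sketch is not Git's helper: the name is hypothetical, fixed-width types stand in for `size_t` and a 32-bit `unsigned long`, and it reports failure through a return code rather than aborting so that the behavior is easy to exercise.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Narrow a 64-bit size to 32 bits only if the value fits. Returns 0 and
 * stores the result on success; returns -1 and leaves *out untouched if
 * the value would be truncated.
 */
static int demo_narrow_u64_to_u32(uint64_t value, uint32_t *out)
{
	if (value > UINT32_MAX)
		return -1;
	*out = (uint32_t)value;
	return 0;
}

/* Usage example: a small size passes, a size over 4 GiB is rejected. */
static void demo_narrow_usage(void)
{
	uint32_t out = 0;

	assert(demo_narrow_u64_to_u32(12345, &out) == 0);
	assert(out == 12345);
	assert(demo_narrow_u64_to_u32((uint64_t)UINT32_MAX + 1, &out) == -1);
	assert(out == 12345); /* unchanged after the failed conversion */
}
```

Where both types have the same width, the check can never fail, which matches the commit's observation that the cast is a no-op on 64-bit non-Windows platforms.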
Johannes Schindelin	d05d666977	git-zlib: handle data streams larger than 4GB On Windows, zlib's `uLong` type is 32-bit even on 64-bit systems. When processing data streams larger than 4GB, the `total_in` and `total_out` fields in zlib's `z_stream` structure wrap around, which caused the sanity checks in `zlib_post_call()` to trigger `BUG()` assertions. The git_zstream wrapper now tracks its own 64-bit totals rather than copying them from zlib. The sanity checks compare only the low bits, using `maximum_unsigned_value_of_type(uLong)` to mask appropriately for the platform's `uLong` size. This is based on work by LordKiRon in git-for-windows#6076. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-05-09 11:25:31 +09:00
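The idea of keeping a wide total next to a counter that wraps, and comparing only the low bits, can be sketched like this. It is an illustration under stated assumptions rather than the code in `git-zlib.c`: `ulong32`, `struct narrow_stream`, `struct wide_stream`, `totals_agree()` and `produce()` are hypothetical names, and a 32-bit type stands in for zlib's `uLong` as it is on Windows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for zlib's uLong where it is 32 bits wide. */
typedef uint32_t ulong32;

/* Mimics the part of z_stream whose counter wraps at 4GB. */
struct narrow_stream {
	ulong32 total_out;
};

/* Wrapper that tracks its own 64-bit total alongside the narrow one. */
struct wide_stream {
	struct narrow_stream z;
	uint64_t total_out;
};

/* Sanity check: compare only the bits the narrow counter can hold. */
static int totals_agree(const struct wide_stream *s)
{
	const uint64_t mask = UINT32_MAX;

	return (s->total_out & mask) == s->z.total_out;
}

/* Account for n freshly produced bytes in both counters. */
static void produce(struct wide_stream *s, uint32_t n)
{
	s->z.total_out += n;	/* unsigned arithmetic: wraps modulo 2^32 */
	s->total_out += n;	/* does not wrap for realistic sizes */
	assert(totals_agree(s));
}
```

After five calls to `produce()` with 1GiB each, the narrow counter has wrapped around to 1GiB while the wide total correctly reads 5GiB, and the masked comparison still holds.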
Junio C Hamano	fe4ab2e698	Merge branch 'jt/index-fd-wo-repo-regression-fix-maint' During Git 2.52 timeframe, we broke streaming computation of object hash outside a repository, which has been corrected. * jt/index-fd-wo-repo-regression-fix-maint: object-file: avoid ODB transaction when not writing objects	2026-04-08 10:20:51 -07:00
Junio C Hamano	9797fed6ce	Merge branch 'ps/odb-cleanup' Various code clean-up around odb subsystem. * ps/odb-cleanup: odb: drop unneeded headers and forward decls odb: rename `odb_has_object()` flags odb: use enum for `odb_write_object` flags odb: rename `odb_write_object()` flags treewide: use enum for `odb_for_each_object()` flags CodingGuidelines: document our style for flags	2026-04-08 10:19:17 -07:00
Justin Tobler	7d8727ff0b	object-file: avoid ODB transaction when not writing objects In `ce1661f9da` (odb: add transaction interface, 2025-09-16), existing ODB transaction logic is adapted to create a transaction interface at the ODB layer. The intent here is for the ODB transaction interface to eventually provide an object source agnostic means to manage transactions. An unintended consequence of this change though is that `object-file.c:index_fd()` may enter the ODB transaction path even when no object write is requested. In non-repository contexts, this can result in a NULL dereference and segfault. One such case occurs when running git-diff(1) outside of a repository with "core.bigFileThreshold" forcing the streaming path in `index_fd()`: $ echo foo >foo $ echo bar >bar $ git -c core.bigFileThreshold=1 diff -- foo bar In this scenario, the caller only needs to compute the object ID. Object hashing does not require an ODB, so starting a transaction is both unnecessary and invalid. Fix the bug by avoiding the use of ODB transactions in `index_fd()` when callers are only interested in computing the object hash. Reported-by: Luca Stefani <luca.stefani.ge1@gmail.com> Signed-off-by: Justin Tobler <jltobler@gmail.com> [jc: adjusted to `fd13909e` (Merge branch 'jt/odb-transaction', 2025-10-02)] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-04-07 17:32:36 -07:00
Junio C Hamano	7b6d0cd51b	Merge branch 'ps/fsck-wo-the-repository' Internals of "git fsck" have been refactored to not depend on the global `the_repository` variable. * ps/fsck-wo-the-repository: builtin/fsck: stop using `the_repository` in error reporting builtin/fsck: stop using `the_repository` when marking objects builtin/fsck: stop using `the_repository` when checking packed objects builtin/fsck: stop using `the_repository` with loose objects builtin/fsck: stop using `the_repository` when checking reflogs builtin/fsck: stop using `the_repository` when checking refs builtin/fsck: stop using `the_repository` when snapshotting refs builtin/fsck: fix trivial dependence on `the_repository` fsck: drop USE_THE_REPOSITORY fsck: store repository in fsck options fsck: initialize fsck options via a function fetch-pack: move fsck options into function scope	2026-04-07 14:59:26 -07:00
Patrick Steinhardt	c63911b052	odb: rename `odb_has_object()` flags Rename `odb_has_object()` flags to be properly prefixed with the function name. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-31 20:43:14 -07:00
Patrick Steinhardt	b2d421ece6	odb: use enum for `odb_write_object` flags We've got a couple of functions that accept `odb_write_object()` flags, but all of them accept the flags as an `unsigned` integer. In fact, we don't even have an `enum` for the flags field. Introduce this `enum` and adapt the functions according to our coding style. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-31 20:43:13 -07:00
Patrick Steinhardt	ff2e9d85d6	odb: rename `odb_write_object()` flags Rename `odb_write_object()` flags to be properly prefixed with the function name. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-31 20:43:13 -07:00
Junio C Hamano	8e2964dc89	Merge branch 'ps/object-counting' The logic to count objects has been cleaned up. * ps/object-counting: odb: introduce generic object counting odb/source: introduce generic object counting object-file: generalize counting objects object-file: extract logic to approximate object count packfile: extract logic to count number of objects odb: stop including "odb/source.h"	2026-03-25 12:58:05 -07:00
Patrick Steinhardt	3749853908	fsck: store repository in fsck options The fsck subsystem relies on `the_repository` quite a bit. While we could of course explicitly pass a repository down the callchain, we already have a `struct fsck_options` that we pass to almost all functions. Extend the options to also store the repository to make it readily available. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-23 08:33:10 -07:00
Patrick Steinhardt	f223609026	fsck: initialize fsck options via a function We initialize the `struct fsck_options` via a set of macros, often in global scope. In the next commit though we're about to introduce a new repository field to the options that must be initialized, and naturally we don't have a repo other than `the_repository` available in this scope. Refactor the code to instead introduce a new `fsck_options_init()` function that initializes the options for us and move initialization into function scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-23 08:33:10 -07:00
Patrick Steinhardt	ab3ab1038d	object-name: move logic to compute loose abbreviation length The function `repo_find_unique_abbrev_r()` takes as input an object ID as well as a minimum object ID length and returns the minimum required prefix to make the object ID unique. The logic that computes the abbreviation length for loose objects is deeply tied to the loose object storage format. As such, it would fail in case a different object storage format was used. Prepare for making this logic generic to the backend by moving the logic into a new `odb_source_loose_find_abbrev_len()` function that is part of "object-file.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-20 13:16:42 -07:00
Patrick Steinhardt	284b7862be	object-name: move logic to iterate through loose prefixed objects The logic to iterate through loose objects that have a certain prefix is currently hosted in "object-name.c". This logic reaches into specifics of the loose object source, so it breaks once a different backend is used for the object storage. Move the logic to iterate through loose objects with a prefix into "object-file.c". This is done by extending the for-each-object options to support an optional prefix that is then honored by the loose source. Naturally, we'll also have this support in the packfile store. This is done in the next commit. Furthermore, there are no users of the loose cache outside of "object-file.c" anymore. As such, convert `odb_source_loose_cache()` to have file scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-20 13:16:42 -07:00
Patrick Steinhardt	cfd575f0a9	odb: introduce `struct odb_for_each_object_options` The `odb_for_each_object()` function only accepts a bitset of flags. In a subsequent commit we'll want to change object iteration to also support iterating over only those objects that have a specific prefix. While we could of course add the prefix to the function signature, or alternatively introduce a new function, both of these options don't really seem to be that sensible. Instead, introduce a new `struct odb_for_each_object_options` that can be passed to a new `odb_for_each_object_ext()` function. Splice through the options structure into the respective object database sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-20 13:16:41 -07:00
Junio C Hamano	7f75767554	Merge branch 'ps/object-counting' into ps/odb-generic-object-name-handling * ps/object-counting: object-file: fix sparse 'plain integer as NULL pointer' error odb: introduce generic object counting odb/source: introduce generic object counting object-file: generalize counting objects object-file: extract logic to approximate object count packfile: extract logic to count number of objects odb: stop including "odb/source.h"	2026-03-20 13:16:09 -07:00


346 Commits