From 263c5a155b585a67e1d36f660984babcf636e9b2 Mon Sep 17 00:00:00 2001
From: Johannes Schindelin
Date: Sat, 29 Nov 2025 09:21:58 +0100
Subject: [PATCH 01/35] ci(dockerized): reduce the PID limit for private repositories

Every once in a while I need to verify that Microsoft Git's test suite
passes for changes that are not yet meant for public consumption, and
since it was (made) too difficult to keep up a working Azure Pipeline
definition, I have to use GitHub Actions in a private GitHub repository
for that purpose.

In these tests, basically all Dockerized CI jobs fail consistently. The
symptom is something like:

    error: cannot create async thread: Resource temporarily unavailable

in the middle of a test, typically in the t5xxx-t6xxx range. The first
such error is immediately followed by plenty more of these errors, and
not a single test succeeds afterwards.

At first, I thought that maybe the massive parallelism I enjoy there is
the problem, and I thought that the cgroups limits might be shared
between the many containers that run on essentially the same physical
machine. But even reducing the matrix to just a single one of those
Dockerized jobs runs into the very same problems.

The underlying reason seems to be a substantial difference in the hosted
runners that execute these Dockerized jobs: forcing the PID limit of the
container to a high number lets the jobs pass, even when running the
complete matrix of all 13 Dockerized jobs concurrently.

But that's not the only difference: The jobs seem to take a lot longer
in these containers than, say, in the containers made available to
https://github.com/git/git. When forcing a PID limit of 64k in that
private repository, the jobs completed successfully, but they also took
a lot longer, between 2x and 2.5x longer, i.e. painfully much longer.
Reducing the PID limit to 16k, the CI jobs still passed, but took an
equally long amount of time. Reducing the PID limit to 8k caused the
errors to reappear.

Here are the numbers from three example runs, the first one forcing the
PID and nproc limit to 65536, the second one to 16384, the third run is
from the public git/git repository:

Job                           | 64k     | 16k     | reference
------------------------------|---------|---------|---------
almalinux-8                   | 19m 3s  | 16m 0s  | 9m 36s
debian-11                     | 20m 31s | 20m 3s  | 8m 5s
fedora-breaking-changes-meson | 16m 29s | 19m 19s | 9m 40s
linux-asan-ubsan              | 1h 10m  | 1h 11m  | 34m 36s
linux-breaking-changes        | 25m 39s | 25m 58s | 13m 15s
linux-leaks                   | 1h 9m   | 1h 10m  | 33m 30s
linux-meson                   | 28m 9s  | 27m 4s  | 13m 45s
linux-musl-meson              | 16m 32s | 13m 39s | 8m 6s
linux-reftable-leaks          | 1h 13m  | 1h 13m  | 34m 34s
linux-reftable                | 26m 2s  | 25m 48s | 13m 31s
linux-sha256                  | 26m 12s | 26m 3s  | 12m 36s
linux-TEST-vars               | 26m 5s  | 25m 21s | 13m 25s
linux32                       | 21m 16s | 19m 57s | 10m 44s

It does not look as if the PID limit is the reason for the longer
runtime, seeing as the 64k vs 16k timings deviate no more than as is
usual with GitHub workflows. So let's go for 16k.

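To double-check which limit a container actually ended up with, the
value can be read from inside the job. The following is a minimal
sketch, assuming the runner uses cgroup v2, where the PID limit is
exposed as text in a `pids.max` file (either a number or the word
`max`); that is an assumption about the runner, not something verified
in the runs above. The helper names are hypothetical, and the 16384
threshold merely mirrors the value that passed here.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse the contents of a cgroup v2 `pids.max` file (assumed format).
 * Returns -1 for "max" (no limit), -2 for unparsable input, and the
 * numeric limit otherwise.
 */
long parse_pids_max(const char *text)
{
	char *end;
	long value;

	if (!strncmp(text, "max", 3))
		return -1;
	value = strtol(text, &end, 10);
	if (end == text || value < 0)
		return -2;
	return value;
}

/* Is the limit at least the 16k that passed in the runs above? */
int pids_limit_sufficient(long limit)
{
	return limit == -1 || limit >= 16384;
}
```

Feeding the file's contents to `parse_pids_max()` in a debugging step
would show whether the `--pids-limit` option took effect.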
Signed-off-by: Johannes Schindelin --- .github/workflows/main.yml | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index cf341d74db..85cfedf5b0 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -420,7 +420,9 @@ jobs: CI_JOB_IMAGE: ${{matrix.vector.image}} CUSTOM_PATH: /custom runs-on: ubuntu-latest - container: ${{matrix.vector.image}} + container: + image: ${{ matrix.vector.image }} + options: ${{ github.repository_visibility == 'private' && '--pids-limit 16384 --ulimit nproc=16384:16384 --ulimit nofile=32768:32768' || '' }} steps: - name: prepare libc6 for actions if: matrix.vector.jobname == 'linux32' From ec2c31598368b29a770c81fc955d305ca2be65ec Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Mon, 16 Mar 2026 10:20:23 +0100 Subject: [PATCH 02/35] mingw: skip symlink type auto-detection for network share targets On Windows, symbolic links come in two flavors: file symlinks and directory symlinks. Since Git was born on Linux where this distinction does not exist, Git for Windows has to auto-detect the type by looking at the target. When the target does not yet exist at symlink creation time, Git for Windows creates a "phantom" file symlink and later, once checkout is complete, calls `CreateFileW()` on the target to check whether it is actually a directory. If the symlink target is a UNC path (e.g. `\\attacker\share`), this auto-detection triggers an SMB connection to the remote host. Windows performs NTLM authentication by default for such connections, which means a crafted repository can exfiltrate the cloning user's NTLMv2 hash to an attacker-controlled server without any user interaction beyond `git clone -c core.symlinks=true `. There are ways to specify UNC paths that start with only a single backslash (e.g. 
`\??\UNC\host\share`); All of them do start like that, though, so let's use that as a tell-tale that we should skip the auto-detection in `process_phantom_symlink()`. The symlink is then left as a file symlink (the `mklink` default), and a warning is emitted suggesting the user set the `symlink` gitattribute to `dir` if a directory symlink is needed. When the attribute is already set, auto-detection is never invoked in the first place, so that code path is unaffected. This is the same class of vulnerability as CVE-2025-66413 (https://github.com/git-for-windows/git/security/advisories/GHSA-hv9c-4jm9-jh3x) and follows the same general mitigation pattern that MinTTY adopted for ANSI escape sequences referencing network share paths (https://github.com/mintty/mintty/security/advisories/GHSA-jf4m-m6rv-p6c5). Note that there are legitimate paths starting with a single backslash that are _not_ network paths: drive-less absolute paths are interpreted as relative to the current working directory's drive. In practice, these are highly uncommon (and brittle, just one working directory change away from breaking). In any case, the only consequence is now that the symlink type of those has to be specified via Git attributes, is all. Reported-by: Justin Lee Addresses: CVE-2026-32631 Addresses: https://github.com/git-for-windows/git/security/advisories/GHSA-9j5h-h4m7-85hx Assisted-by: Claude Opus 4.6 Signed-off-by: Johannes Schindelin --- compat/mingw.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/compat/mingw.c b/compat/mingw.c index 41e055f7de..fcbb04dc01 100644 --- a/compat/mingw.c +++ b/compat/mingw.c @@ -352,6 +352,29 @@ process_phantom_symlink(const wchar_t *wtarget, const wchar_t *wlink) wchar_t relative[MAX_PATH]; const wchar_t *rel; + /* + * Do not follow symlinks to network shares, to avoid NTLM credential + * leak from crafted repositories (e.g. \\attacker-server\share). 
+ * Since paths come in all kind of enterprising shapes and forms (in + * addition to the canonical `\\host\share` form, there's also + * `\??\UNC\host\share`, `\GLOBAL??\UNC\host\share` and also + * `\Device\Mup\host\share`, just to name a few), we simply avoid + * following every symlink target that starts with a slash. + * + * This also catches drive-less absolute paths, of course. These are + * uncommon in practice (and also fragile because they are relative to + * the current working directory's drive). The only "harm" this does + * is that it now requires users to specify via the Git attributes if + * they have such an uncommon symbolic link and need it to be a + * directory type link. + */ + if (is_wdir_sep(wtarget[0])) { + warning("created file symlink '%ls' pointing to '%ls';\n" + "set the `symlink` gitattribute to `dir` if a " + "directory symlink is required", wlink, wtarget); + return PHANTOM_SYMLINK_DONE; + } + /* check that wlink is still a file symlink */ if ((GetFileAttributesW(wlink) & (FILE_ATTRIBUTE_REPARSE_POINT | FILE_ATTRIBUTE_DIRECTORY)) From f331752679b420d01b618172e0aab82bb532b85f Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Wed, 30 Oct 2024 19:48:46 +0100 Subject: [PATCH 03/35] unix-socket: avoid leak when initialization fails When a Unix socket is initialized, the current directory's path is stored so that the cleanup code can `chdir()` back to where it was before exit. If the path that needs to be stored exceeds the default size of the `sun_path` attribute of `struct sockaddr_un` (which is defined as a 108-sized byte array on Linux), a larger buffer needs to be allocated so that it can hold the path, and it is the responsibility of the `unix_sockaddr_cleanup()` function to release that allocated memory. In Git's CI, this stack allocation is not necessary because the code is checked out to `/home/runner/work/git/git`. 
Concatenate the path `t/trash directory.t0301-credential-cache/.cache/git/credential/socket` and a terminating NUL, and you end up with 96 bytes, 12 shy of the default `sun_path` size. However, I use worktrees with slightly longer paths: `/home/me/projects/git/yes/i/nest/worktrees/to/organize/them/` is more in line with what I have. When I recently tried to locally reproduce a failure of the `linux-leaks` CI job, this t0301 test failed (where it had not failed in CI). The reason: When `credential-cache` tries to reach its daemon initially by calling `unix_sockaddr_init()`, it is expected that the daemon cannot be reached (the idea is to spin up the daemon in that case and try again). However, when this first call to `unix_sockaddr_init()` fails, the code returns early from the `unix_stream_connect()` function _without_ giving the cleanup code a chance to run, skipping the deallocation of above-mentioned path. The fix is easy: do not return early but instead go directly to the cleanup code. Signed-off-by: Johannes Schindelin --- unix-socket.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/unix-socket.c b/unix-socket.c index 8860203c3f..1fa0cf6c15 100644 --- a/unix-socket.c +++ b/unix-socket.c @@ -84,7 +84,7 @@ int unix_stream_connect(const char *path, int disallow_chdir) struct unix_sockaddr_context ctx; if (unix_sockaddr_init(&sa, path, &ctx, disallow_chdir) < 0) - return -1; + goto fail; fd = socket(AF_UNIX, SOCK_STREAM, 0); if (fd < 0) goto fail; From 72e58d24789519500a80230b6f894cb74681a363 Mon Sep 17 00:00:00 2001 From: Jeff King Date: Mon, 13 Jan 2025 01:26:01 -0500 Subject: [PATCH 04/35] grep: prevent `^$` false match at end of file In some implementations, `regexec_buf()` assumes that it is fed lines; Without `REG_NOTEOL` it thinks the end of the buffer is the end of a line. Which makes sense, but trips up this case because we are not feeding lines, but rather a whole buffer. 
So the final newline is not the start of an empty line, but the true end of the buffer. This causes an interesting bug: $ echo content >file.txt $ git grep --no-index -n '^$' file.txt file.txt:2: This bug is fixed by making the end of the buffer consistently the end of the final line. The patch was applied from https://lore.kernel.org/git/20250113062601.GD767856@coredump.intra.peff.net/ Reported-by: Olly Betts Signed-off-by: Johannes Schindelin --- grep.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/grep.c b/grep.c index 1d75d31421..733fd3a800 100644 --- a/grep.c +++ b/grep.c @@ -1647,6 +1647,8 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle bol = gs->buf; left = gs->size; + if (left && gs->buf[left-1] == '\n') + left--; while (left) { const char *eol; int hit; From 967ab3a6175fa06978591711d7e45ac7431fdf58 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 15:00:18 +0200 Subject: [PATCH 05/35] pack-bitmap: stop truncating blob sizes used by --filter=blob:limit Same theme as the preceding pack-objects series: get_size_by_pos() returns an unsigned long but reads its size out of packed_object_info() / odb_read_object_info_extended() via a size_t out-parameter, so on Windows it would silently truncate the very sizes filter_bitmap_blob_limit() then compares against the --filter=blob:limit threshold to decide which blobs to elide from the bitmap-backed traversal. Drop the cast_size_t_to_ulong() and return size_t directly. The two callers' limit comparison promotes to size_t cleanly. limit itself stays unsigned long; it is part of a filter API ripple of its own. 
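To illustrate the hazard (a minimal sketch, not code from
pack-bitmap.c): on 64-bit Windows, `unsigned long` is 32 bits wide while
`size_t` is 64 bits. The stand-ins below use `uint32_t` and `uint64_t`
so that they behave the same on every platform. The helper names are
hypothetical, and the `>=` comparison assumes that blobs at or above the
limit are the ones to be elided.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins: uint32_t models `unsigned long` on 64-bit
 * Windows, uint64_t models `size_t`.
 */

/* Unchecked narrowing: keeps only the low 32 bits of the size. */
uint32_t narrow_size(uint64_t size)
{
	return (uint32_t)size;
}

/* Old shape: the size is narrowed before it is compared to the limit. */
int exceeds_limit_narrow(uint64_t size, uint32_t limit)
{
	return narrow_size(size) >= limit;
}

/* New shape: the limit is promoted and the full size is compared. */
int exceeds_limit_wide(uint64_t size, uint32_t limit)
{
	return size >= (uint64_t)limit;
}
```

With a limit of 1 MiB, a blob of 4 GiB + 100 bytes narrows to 100 and
would be kept by the narrow variant, whereas the wide variant elides it.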
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- pack-bitmap.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index e8a82945cc..ee2bfaa4a0 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1853,8 +1853,8 @@ static void filter_bitmap_blob_none(struct bitmap_index *bitmap_git, OBJ_BLOB); } -static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git, - uint32_t pos) +static size_t get_size_by_pos(struct bitmap_index *bitmap_git, + uint32_t pos) { size_t size; struct object_info oi = OBJECT_INFO_INIT; @@ -1891,7 +1891,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git, die(_("unable to get size of %s"), oid_to_hex(&obj->oid)); } - return cast_size_t_to_ulong(size); + return size; } static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git, From 9a3aef46542c92bb1da331797066ec092d6373d3 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 19:46:47 +0200 Subject: [PATCH 06/35] tree-walk: drop link_len cast in get_tree_entry_follow_symlinks() Smallest piece of the tree topic. link_len is only used as strbuf_splice()'s size_t length and as an array index; widening it outright removes the cast_size_t_to_ulong() shim and the bridge local that fed it. odb_read_object() now writes straight into link_len. 
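The shape of this change can be sketched outside of Git as follows.
This is an illustration under assumptions, not the code in tree-walk.c:
`read_blob_stub()` is a hypothetical stand-in for an object reader with
a `size_t` out-parameter, `checked_narrow()` is a hypothetical checked
narrowing helper (it reports failure instead of dying), and `uint32_t`
models a 32-bit `unsigned long`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader: reports the object length through a size_t. */
const char *read_blob_stub(size_t *size_out)
{
	static const char contents[] = "../target";

	*size_out = sizeof(contents) - 1;
	return contents;
}

/* Hypothetical checked narrowing; returns 0 if the value does not fit. */
int checked_narrow(size_t in, uint32_t *out)
{
	if (in > UINT32_MAX)
		return 0;
	*out = (uint32_t)in;
	return 1;
}

/* Before: a size_t "bridge" local feeds a narrower variable. */
size_t link_len_with_bridge(void)
{
	uint32_t link_len = 0;
	size_t link_len_st = 0;

	read_blob_stub(&link_len_st);
	if (!checked_narrow(link_len_st, &link_len))
		return 0;
	return link_len;
}

/* After: the reader writes straight into a size_t local. */
size_t link_len_direct(void)
{
	size_t link_len = 0;

	read_blob_stub(&link_len);
	return link_len;
}
```

Both variants agree for small objects; only the direct one has no
narrowing step left that could fail or lose bits.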
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- tree-walk.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/tree-walk.c b/tree-walk.c index a67f06b9eb..a7bbe3163a 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -777,8 +777,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r, goto done; } else if (S_ISLNK(*mode)) { /* Follow a symlink */ - unsigned long link_len; - size_t link_len_st = 0; + size_t link_len; size_t len; char *contents, *contents_start; struct dir_state *parent; @@ -798,8 +797,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r, contents = odb_read_object(r->objects, &current_tree_oid, &type, - &link_len_st); - link_len = cast_size_t_to_ulong(link_len_st); + &link_len); if (!contents) goto done; From d592ee79da9d5ebeee17657fea20486c3c2acc2a Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 20:11:56 +0200 Subject: [PATCH 07/35] tree-walk: widen init_tree_desc() and init_tree_desc_gently() to size_t Prep for dropping the cast_size_t_to_ulong() shim in add_preferred_base() (pack-objects.c), and aligns the public API with the size_t shape the rest of the tree topic is moving toward. struct tree_desc.size stays unsigned int -- the on-disk tree format hard-caps each tree at 4 GiB, so the field is intentionally narrow and the assignment in init_tree_desc_internal() already truncated unsigned long inputs the same way it now truncates size_t inputs. The widening is purely about the call-side type-correctness; the internal cap is unchanged. All 30+ callers pass values that promote cleanly (unsigned long, size_t, or smaller integer types). 
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- tree-walk.c | 6 +++--- tree-walk.h | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/tree-walk.c b/tree-walk.c index a7bbe3163a..e2cea5d883 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -49,7 +49,7 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l static int init_tree_desc_internal(struct tree_desc *desc, const struct object_id *oid, - const void *buffer, unsigned long size, + const void *buffer, size_t size, struct strbuf *err, enum tree_desc_flags flags) { @@ -63,7 +63,7 @@ static int init_tree_desc_internal(struct tree_desc *desc, } void init_tree_desc(struct tree_desc *desc, const struct object_id *tree_oid, - const void *buffer, unsigned long size) + const void *buffer, size_t size) { struct strbuf err = STRBUF_INIT; if (init_tree_desc_internal(desc, tree_oid, buffer, size, &err, 0)) @@ -72,7 +72,7 @@ void init_tree_desc(struct tree_desc *desc, const struct object_id *tree_oid, } int init_tree_desc_gently(struct tree_desc *desc, const struct object_id *oid, - const void *buffer, unsigned long size, + const void *buffer, size_t size, enum tree_desc_flags flags) { struct strbuf err = STRBUF_INIT; diff --git a/tree-walk.h b/tree-walk.h index 9646c47ac5..af6e82fd3f 100644 --- a/tree-walk.h +++ b/tree-walk.h @@ -85,10 +85,10 @@ int update_tree_entry_gently(struct tree_desc *); * members of `struct tree`. 
*/ void init_tree_desc(struct tree_desc *desc, const struct object_id *tree_oid, - const void *buf, unsigned long size); + const void *buf, size_t size); int init_tree_desc_gently(struct tree_desc *desc, const struct object_id *oid, - const void *buf, unsigned long size, + const void *buf, size_t size, enum tree_desc_flags flags); /* From cc9189f291e9fefb55cecb5d1bbbab3070b8366f Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 20:23:53 +0200 Subject: [PATCH 08/35] pack-objects: drop the two tree-walk casts in the preferred-base path With init_tree_desc() widened in the prior commit, the size_t-returning odb_read_object_peeled() call in add_preferred_base() and odb_read_object() call in pbase_tree_get() can both flow straight through to init_tree_desc() and into the pbase_tree_cache. Widen pbase_tree_cache.tree_size and the two local size variables to size_t, drop the size_st bridges, and drop the two cast_size_t_to_ulong() shims. This was the last pair of cast_size_t_to_ulong() call sites in builtin/pack-objects.c, completing the >4 GiB-objects work in that file that this branch and its predecessors have been pursuing. 
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- builtin/pack-objects.c | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 27048bbb4d..ccbd4e3e06 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1916,7 +1916,7 @@ struct pbase_tree_cache { int ref; int temporary; void *tree_data; - unsigned long tree_size; + size_t tree_size; }; static struct pbase_tree_cache *(pbase_tree_cache[256]); @@ -1943,8 +1943,7 @@ static struct pbase_tree_cache *pbase_tree_get(const struct object_id *oid) { struct pbase_tree_cache *ent, *nent; void *data; - unsigned long size; - size_t size_st = 0; + size_t size; enum object_type type; int neigh; int my_ix = pbase_tree_cache_ix(oid); @@ -1972,8 +1971,7 @@ static struct pbase_tree_cache *pbase_tree_get(const struct object_id *oid) /* Did not find one. Either we got a bogus request or * we need to read and perhaps cache. */ - data = odb_read_object(the_repository->objects, oid, &type, &size_st); - size = cast_size_t_to_ulong(size_st); + data = odb_read_object(the_repository->objects, oid, &type, &size); if (!data) return NULL; if (type != OBJ_TREE) { @@ -2127,16 +2125,14 @@ static void add_preferred_base(struct object_id *oid) { struct pbase_tree *it; void *data; - unsigned long size; - size_t size_st = 0; + size_t size; struct object_id tree_oid; if (window <= num_preferred_base++) return; data = odb_read_object_peeled(the_repository->objects, oid, - OBJ_TREE, &size_st, &tree_oid); - size = cast_size_t_to_ulong(size_st); + OBJ_TREE, &size, &tree_oid); if (!data) return; From c45f1ad47e5fbada88d0fd2e7d3fd97b59c3a9a1 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 19:42:07 +0200 Subject: [PATCH 09/35] diff-delta: widen sizeof_delta_index() return to size_t Last piece of the delta API to still expose unsigned long. 
The function literally returns struct delta_index.memsize, which became size_t in the first commit of this series. The sole caller (free_unpacked() in builtin/pack-objects.c) already accepts size_t via its freed_mem local, so the widening only removes the implicit size_t -> unsigned long narrowing inside the function body. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- delta.h | 2 +- diff-delta.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/delta.h b/delta.h index eb5c6d2fdb..ab0279168c 100644 --- a/delta.h +++ b/delta.h @@ -28,7 +28,7 @@ void free_delta_index(struct delta_index *index); * * Given pointer must be what create_delta_index() returned, or NULL. */ -unsigned long sizeof_delta_index(struct delta_index *index); +size_t sizeof_delta_index(struct delta_index *index); /* * create_delta: create a delta from given index for the given buffer diff --git a/diff-delta.c b/diff-delta.c index 43c339f010..e07e6a90a1 100644 --- a/diff-delta.c +++ b/diff-delta.c @@ -302,7 +302,7 @@ void free_delta_index(struct delta_index *index) free(index); } -unsigned long sizeof_delta_index(struct delta_index *index) +size_t sizeof_delta_index(struct delta_index *index) { if (index) return index->memsize; From d835c8c49a80a52cd1b42ecfa736f1092ce1b445 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 21:27:35 +0200 Subject: [PATCH 10/35] tree: widen struct tree.size and parse_tree_buffer() to size_t Final piece of the tree topic. struct tree.size already receives its values from size_t-shaped sources (odb_read_object() in repo_parse_tree_gently() and in reflog.c::tree_is_complete()), so on Windows it was already silently truncating anything past 4 GiB. Switch the field and parse_tree_buffer()'s size parameter to size_t. All readers feed tree->size into init_tree_desc(), which was widened earlier in this topic; the existing parse_object_buffer() caller in object.c keeps its unsigned long parameter, which promotes cleanly. 
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- tree.c | 2 +- tree.h | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/tree.c b/tree.c index 53f7395e9f..d37b9bc7b1 100644 --- a/tree.c +++ b/tree.c @@ -172,7 +172,7 @@ struct tree *lookup_tree(struct repository *r, const struct object_id *oid) return object_as_type(obj, OBJ_TREE, 0); } -int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size) +int parse_tree_buffer(struct tree *item, void *buffer, size_t size) { if (item->object.parsed) return 0; diff --git a/tree.h b/tree.h index 677382eed8..50f0b15af4 100644 --- a/tree.h +++ b/tree.h @@ -10,14 +10,14 @@ struct strbuf; struct tree { struct object object; void *buffer; - unsigned long size; + size_t size; }; extern const char *tree_type; struct tree *lookup_tree(struct repository *r, const struct object_id *oid); -int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size); +int parse_tree_buffer(struct tree *item, void *buffer, size_t size); #define parse_tree_gently(t, q) repo_parse_tree_gently(the_repository, t, q) int repo_parse_tree_gently(struct repository *r, struct tree *item, From ad6b453c49ed141aaa91a4e81440e5fc9fc279e8 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 22:12:59 +0200 Subject: [PATCH 11/35] commit: widen the commit-buffer API to size_t Continue the migration from `unsigned long` to `size_t`. The `size` attribute of `struct commit_buffer` is fed either from `odb_read_object()`'s return value (`size_t`, handled with `cast_size_t_to_ulong()`) or from `strbuf.len` in `fake_working_tree_commit()` (silently narrowed today). Widen the field and a couple of function signatures together, drop the shim in `repo_get_commit_buffer()`, and move the matching `unsigned long` locals at the in-tree callers in commit.c (three sites), builtin/replace.c, and builtin/stash.c (two sites). The remaining callers pass NULL or already pass a size_t-compatible variable. 
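Why the callers' locals have to move along with the signatures: a
by-value `unsigned long` argument converts to `size_t` implicitly, but a
`size_t *` out-parameter needs the caller's variable to have that very
type wherever the two types differ. A minimal sketch, in which
`get_buffer_stub()` is a hypothetical stand-in for the commit-buffer
getters (like them, it accepts NULL when the caller does not need the
size):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical getter: returns a buffer and optionally its size. */
const char *get_buffer_stub(size_t *sizep)
{
	static const char buffer[] = "tree 0000\nauthor A U Thor\n";

	if (sizep)
		*sizep = strlen(buffer);
	return buffer;
}

/* Caller that needs the size: the local is declared as size_t. */
size_t buffer_size(void)
{
	/*
	 * An `unsigned long size` here would be an incompatible pointer
	 * target on platforms where unsigned long and size_t differ.
	 */
	size_t size = 0;

	get_buffer_stub(&size);
	return size;
}

/* Caller that only needs the contents passes NULL. */
int buffer_starts_with_tree(void)
{
	return !strncmp(get_buffer_stub(NULL), "tree ", 5);
}
```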
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- builtin/replace.c | 2 +- builtin/stash.c | 4 ++-- commit.c | 19 +++++++++++-------- commit.h | 8 +++++--- 4 files changed, 19 insertions(+), 14 deletions(-) diff --git a/builtin/replace.c b/builtin/replace.c index aed6b2c8de..b85681080d 100644 --- a/builtin/replace.c +++ b/builtin/replace.c @@ -459,7 +459,7 @@ static int create_graft(int argc, const char **argv, int force, int gentle) struct commit *commit; struct strbuf buf = STRBUF_INIT; const char *buffer; - unsigned long size; + size_t size; if (repo_get_oid(the_repository, old_ref, &old_oid) < 0) return error(_("not a valid object name: '%s'"), old_ref); diff --git a/builtin/stash.c b/builtin/stash.c index c4809f299a..83fecc4a7f 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -2083,7 +2083,7 @@ static int write_commit_with_parents(struct repository *r, const char *orig_author, *orig_committer; char *author = NULL, *committer = NULL; const char *buffer; - unsigned long bufsize; + size_t bufsize; const char *p; struct strbuf msg = STRBUF_INIT; int ret = 0; @@ -2135,7 +2135,7 @@ static int do_import_stash(struct repository *r, const char *rev) struct object_id chain; int res = 0; const char *buffer = NULL; - unsigned long bufsize; + size_t bufsize; struct commit *this = NULL; struct commit_list *items = NULL, *cur; char *msg = NULL; diff --git a/commit.c b/commit.c index 5c4a4319b5..f0aade5b60 100644 --- a/commit.c +++ b/commit.c @@ -349,7 +349,7 @@ int for_each_commit_graft(each_commit_graft_fn fn, void *cb_data) struct commit_buffer { void *buffer; - unsigned long size; + size_t size; }; define_commit_slab(buffer_slab, struct commit_buffer); @@ -366,7 +366,8 @@ void free_commit_buffer_slab(struct buffer_slab *bs) free(bs); } -void set_commit_buffer(struct repository *r, struct commit *commit, void *buffer, unsigned long size) +void set_commit_buffer(struct repository *r, struct commit *commit, + void *buffer, size_t size) { struct commit_buffer 
*v = buffer_slab_at( r->parsed_objects->buffer_slab, commit); @@ -374,7 +375,9 @@ void set_commit_buffer(struct repository *r, struct commit *commit, void *buffer v->size = size; } -const void *get_cached_commit_buffer(struct repository *r, const struct commit *commit, unsigned long *sizep) +const void *get_cached_commit_buffer(struct repository *r, + const struct commit *commit, + size_t *sizep) { struct commit_buffer *v = buffer_slab_peek( r->parsed_objects->buffer_slab, commit); @@ -390,7 +393,7 @@ const void *get_cached_commit_buffer(struct repository *r, const struct commit * const void *repo_get_commit_buffer(struct repository *r, const struct commit *commit, - unsigned long *sizep) + size_t *sizep) { const void *ret = get_cached_commit_buffer(r, commit, sizep); if (!ret) { @@ -404,7 +407,7 @@ const void *repo_get_commit_buffer(struct repository *r, die("expected commit for %s, got %s", oid_to_hex(&commit->object.oid), type_name(type)); if (sizep) - *sizep = cast_size_t_to_ulong(size); + *sizep = size; } return ret; } @@ -1192,7 +1195,7 @@ int parse_signed_commit(const struct commit *commit, struct strbuf *payload, struct strbuf *signature, const struct git_hash_algo *algop) { - unsigned long size; + size_t size; const char *buffer = repo_get_commit_buffer(the_repository, commit, &size); int ret = parse_buffer_signed_by_header(buffer, size, payload, signature, algop); @@ -1365,7 +1368,7 @@ int verify_commit_buffer(const char *buffer, size_t size, int check_commit_signature(const struct commit *commit, struct signature_check *sigc) { - unsigned long size; + size_t size; const char *buffer = repo_get_commit_buffer(the_repository, commit, &size); int ret = verify_commit_buffer(buffer, size, sigc); @@ -1462,7 +1465,7 @@ struct commit_extra_header *read_commit_extra_headers(struct commit *commit, const char **exclude) { struct commit_extra_header *extra = NULL; - unsigned long size; + size_t size; const char *buffer = repo_get_commit_buffer(the_repository, commit, 
&size); extra = read_commit_extra_header_lines(buffer, size, exclude); diff --git a/commit.h b/commit.h index 1061ed791b..97779ec943 100644 --- a/commit.h +++ b/commit.h @@ -131,13 +131,15 @@ void free_commit_buffer_slab(struct buffer_slab *bs); * Associate an object buffer with the commit. The ownership of the * memory is handed over to the commit, and must be free()-able. */ -void set_commit_buffer(struct repository *r, struct commit *, void *buffer, unsigned long size); +void set_commit_buffer(struct repository *r, struct commit *, + void *buffer, size_t size); /* * Get any cached object buffer associated with the commit. Returns NULL * if none. The resulting memory should not be freed. */ -const void *get_cached_commit_buffer(struct repository *, const struct commit *, unsigned long *size); +const void *get_cached_commit_buffer(struct repository *, + const struct commit *, size_t *size); /* * Get the commit's object contents, either from cache or by reading the object @@ -146,7 +148,7 @@ const void *get_cached_commit_buffer(struct repository *, const struct commit *, */ const void *repo_get_commit_buffer(struct repository *r, const struct commit *, - unsigned long *size); + size_t *size); /* * Tell the commit subsystem that we are done with a particular commit buffer. From 7ad17a47f80650e198c7e08ffe9d0318c69013ca Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 22:19:32 +0200 Subject: [PATCH 12/35] blame: widen find_line_starts() len parameter to size_t Prep for the upcoming blame_scoreboard.final_buf_size widening: prepare_lines() will pass a size_t through to find_line_starts(), and the other caller (fill_origin_blob() via o->file.size) already goes through long->size_t promotion. The function is file-static and only uses len as a loop bound. 
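What "only uses len as a loop bound" means can be shown with a
simplified stand-in (this is not the code in blame.c; `next_line()` and
`count_lines()` are hypothetical names): the length merely determines
the end pointer, so widening it cannot change the result for any buffer
that fit before.

```c
#include <stddef.h>
#include <string.h>

/* Return the start of the next line, or `end` if there is none. */
const char *next_line(const char *start, const char *end)
{
	const char *nl = memchr(start, '\n', (size_t)(end - start));

	return nl ? nl + 1 : end;
}

/* Count the lines in buf[0..len); len only serves as the bound. */
int count_lines(const char *buf, size_t len)
{
	const char *end = buf + len;
	const char *p;
	int num = 0;

	for (p = buf; p < end; p = next_line(p, end))
		num++;
	return num;
}
```

A final line without a trailing newline still counts as a line, and an
empty buffer has no lines at all.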
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- blame.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blame.c b/blame.c index 126e232416..fe393e45ed 100644 --- a/blame.c +++ b/blame.c @@ -335,7 +335,7 @@ static const char *get_next_line(const char *start, const char *end) } static int find_line_starts(int **line_starts, const char *buf, - unsigned long len) + size_t len) { const char *end = buf + len; const char *p; From 4b2d8ba4644c91c63a6cf35655f1be46244cd07a Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 22:30:31 +0200 Subject: [PATCH 13/35] grep: widen struct grep_source.size and grep_buffer() to size_t This commit continues the migration from `unsigned long` to `size_t`, converting `grep_buffer()` and helpers. The callers are already prepared for this change. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- grep.c | 18 ++++++++---------- grep.h | 4 ++-- 2 files changed, 10 insertions(+), 12 deletions(-) diff --git a/grep.c b/grep.c index 1d75d31421..d75fbcef44 100644 --- a/grep.c +++ b/grep.c @@ -864,9 +864,9 @@ void free_grep_patterns(struct grep_opt *opt) free_pattern_expr(opt->pattern_expression); } -static const char *end_of_line(const char *cp, unsigned long *left) +static const char *end_of_line(const char *cp, size_t *left) { - unsigned long l = *left; + size_t l = *left; while (l && *cp != '\n') { l--; cp++; @@ -1454,7 +1454,7 @@ static int should_lookahead(struct grep_opt *opt) } static int look_ahead(struct grep_opt *opt, - unsigned long *left_p, + size_t *left_p, unsigned *lno_p, const char **bol_p) { @@ -1567,7 +1567,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle { const char *bol; const char *peek_bol = NULL; - unsigned long left; + size_t left; unsigned lno = 1; unsigned last_hit = 0; int binary_match_only = 0; @@ -1735,7 +1735,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle goto next_line; } if 
(show_function && (!peek_bol || peek_bol < bol)) { - unsigned long peek_left = left; + size_t peek_left = left; const char *peek_eol = eol; /* @@ -1854,7 +1854,7 @@ int grep_source(struct grep_opt *opt, struct grep_source *gs) static void grep_source_init_buf(struct grep_source *gs, const char *buf, - unsigned long size) + size_t size) { gs->type = GREP_SOURCE_BUF; gs->name = NULL; @@ -1865,7 +1865,7 @@ static void grep_source_init_buf(struct grep_source *gs, gs->identifier = NULL; } -int grep_buffer(struct grep_opt *opt, const char *buf, unsigned long size) +int grep_buffer(struct grep_opt *opt, const char *buf, size_t size) { struct grep_source gs; int r; @@ -1931,11 +1931,9 @@ void grep_source_clear_data(struct grep_source *gs) static int grep_source_load_oid(struct grep_source *gs) { enum object_type type; - size_t size_st = 0; gs->buf = odb_read_object(gs->repo->objects, gs->identifier, - &type, &size_st); - gs->size = cast_size_t_to_ulong(size_st); + &type, &gs->size); if (!gs->buf) return error(_("'%s': unable to read %s"), gs->name, diff --git a/grep.h b/grep.h index 13e26a9318..0bd705dfc0 100644 --- a/grep.h +++ b/grep.h @@ -212,7 +212,7 @@ void append_grep_pattern(struct grep_opt *opt, const char *pat, const char *orig void append_header_grep_pattern(struct grep_opt *, enum grep_header_field, const char *); void compile_grep_patterns(struct grep_opt *opt); void free_grep_patterns(struct grep_opt *opt); -int grep_buffer(struct grep_opt *opt, const char *buf, unsigned long size); +int grep_buffer(struct grep_opt *opt, const char *buf, size_t size); /* The field parameter is only used to filter header patterns * (where appropriate). 
If filtering isn't desirable @@ -235,7 +235,7 @@ struct grep_source { struct repository *repo; /* if GREP_SOURCE_OID */ const char *buf; - unsigned long size; + size_t size; char *path; /* for attribute lookups */ struct userdiff_driver *driver; From 31d4885618caf413bdf8ac58332438b3096d05d2 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 22:41:07 +0200 Subject: [PATCH 14/35] fast-export: drop the export_blob() size cast and widen anonymize_blob() Mirror of the preceding fast-import sweep. anonymize_blob() writes strbuf.len (size_t) into its out-parameter, and export_blob()'s non-anonymize branch reads odb_read_object()'s size_t out-parameter through a size_st + cast_size_t_to_ulong() bridge into an unsigned long local; both have been silent on Windows past 4 GiB. Widen the helper signature and the local, and drop the bridge. check_object_signature() and parse_object_buffer() still take unsigned long, so the silent narrowing on Windows just moves from the local assignment to those call sites; both are separate topics. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- builtin/fast-export.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/builtin/fast-export.c b/builtin/fast-export.c index 0be43104dc..14c672f594 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -285,7 +285,7 @@ static void show_progress(void) * There's no need to cache this result with anonymize_mem, since * we already handle blob content caching with marks. 
*/ -static char *anonymize_blob(unsigned long *size) +static char *anonymize_blob(size_t *size) { static int counter; struct strbuf out = STRBUF_INIT; @@ -296,7 +296,7 @@ static char *anonymize_blob(unsigned long *size) static void export_blob(const struct object_id *oid) { - unsigned long size; + size_t size; enum object_type type; char *buf; struct object *object; @@ -317,10 +317,8 @@ static void export_blob(const struct object_id *oid) object = (struct object *)lookup_blob(the_repository, oid); eaten = 0; } else { - size_t size_st = 0; buf = odb_read_object(the_repository->objects, oid, &type, - &size_st); - size = cast_size_t_to_ulong(size_st); + &size); if (!buf) die(_("could not read blob %s"), oid_to_hex(oid)); if (check_object_signature(the_repository, oid, buf, size, From 481b8d0cb706f552692c18ae6b9d0ef139536629 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 22:46:50 +0200 Subject: [PATCH 15/35] repo: drop the inflated-size cast in count_objects() Continue the size_t evacuation. count_objects() feeds the inflated size from odb_read_object_info_extended()'s size_t out-parameter into struct object_values (size_t) and check_largest() (size_t) through an unsigned long bridge with a cast_size_t_to_ulong() shim. The bridge was the only narrow link in the chain. Widen the local, point oi.sizep at it directly, and drop the cast. parse_object_buffer() still takes unsigned long, so a Windows narrowing remains at that one call; that is its own follow-up topic. 
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- builtin/repo.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/builtin/repo.c b/builtin/repo.c index 69f3626467..38f0711377 100644 --- a/builtin/repo.c +++ b/builtin/repo.c @@ -783,15 +783,14 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids, for (size_t i = 0; i < oids->nr; i++) { struct object_info oi = OBJECT_INFO_INIT; - unsigned long inflated; - size_t inflated_st = 0; + size_t inflated; struct commit *commit; struct object *obj; void *content; off_t disk; int eaten; - oi.sizep = &inflated_st; + oi.sizep = &inflated; oi.disk_sizep = &disk; oi.contentp = &content; @@ -799,7 +798,6 @@ static int count_objects(const char *path UNUSED, struct oid_array *oids, OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK) < 0) continue; - inflated = cast_size_t_to_ulong(inflated_st); obj = parse_object_buffer(the_repository, &oids->oid[i], type, inflated, content, &eaten); From fc440e5f5df3639b6c1c589cc4e54a30a7432d87 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 23:05:52 +0200 Subject: [PATCH 16/35] unpack-objects: widen the size-passing infrastructure to size_t Drop the last cast_size_t_to_ulong() in builtin/unpack-objects.c. With size_t-typed object sizes already coming in via odb_read_object() and the per-byte varint decode in unpack_one() (widened by f2063855fb), the rest of the file was the only thing left that still threaded sizes through unsigned long: struct obj_buffer.size and struct delta_info.size, get_data() and add_object_buffer(), add_delta_to_list(), resolve_delta(), resolve_against_held(), added_object(), write_object(), unpack_non_delta_entry(), unpack_delta_entry(), and stream_blob(). Widen all of them together. 
None of those types had a downstream narrow consumer once odb_write_object() and patch_delta() were widened earlier, so the change is mechanical: parameter and field types change, the base_size_st bridge in unpack_delta_entry() and its cast go away, and odb_read_object() now writes into base_size directly. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- builtin/unpack-objects.c | 36 +++++++++++++++++------------------- 1 file changed, 17 insertions(+), 19 deletions(-) diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c index f3849bb654..3f7fb27b93 100644 --- a/builtin/unpack-objects.c +++ b/builtin/unpack-objects.c @@ -40,7 +40,7 @@ static struct progress *progress; */ struct obj_buffer { char *buffer; - unsigned long size; + size_t size; }; static struct decoration obj_decorate; @@ -50,7 +50,7 @@ static struct obj_buffer *lookup_object_buffer(struct object *base) return lookup_decoration(&obj_decorate, base); } -static void add_object_buffer(struct object *object, char *buffer, unsigned long size) +static void add_object_buffer(struct object *object, char *buffer, size_t size) { struct obj_buffer *obj; CALLOC_ARRAY(obj, 1); @@ -114,10 +114,10 @@ static void use(int bytes) * allocated buffer which is reused to hold temporary zstream output * and return NULL instead of returning garbage data. */ -static void *get_data(unsigned long size) +static void *get_data(size_t size) { git_zstream stream; - unsigned long bufsize = dry_run && size > 8192 ? 8192 : size; + size_t bufsize = dry_run && size > 8192 ? 
8192 : size; void *buf = xmallocz(bufsize); memset(&stream, 0, sizeof(stream)); @@ -161,7 +161,7 @@ struct delta_info { struct object_id base_oid; unsigned nr; off_t base_offset; - unsigned long size; + size_t size; void *delta; struct delta_info *next; }; @@ -170,7 +170,7 @@ static struct delta_info *delta_list; static void add_delta_to_list(unsigned nr, const struct object_id *base_oid, off_t base_offset, - void *delta, unsigned long size) + void *delta, size_t size) { struct delta_info *info = xmalloc(sizeof(*info)); @@ -261,7 +261,7 @@ static void write_rest(void) } static void added_object(unsigned nr, enum object_type type, - void *data, unsigned long size); + void *data, size_t size); /* * Write out nr-th object from the list, now we know the contents @@ -269,7 +269,7 @@ static void added_object(unsigned nr, enum object_type type, * to be checked at the end. */ static void write_object(unsigned nr, enum object_type type, - void *buf, unsigned long size) + void *buf, size_t size) { if (!strict) { if (odb_write_object(the_repository->objects, buf, size, type, @@ -310,8 +310,8 @@ static void write_object(unsigned nr, enum object_type type, } static void resolve_delta(unsigned nr, enum object_type type, - void *base, unsigned long base_size, - void *delta, unsigned long delta_size) + void *base, size_t base_size, + void *delta, size_t delta_size) { void *result; size_t result_size; @@ -330,7 +330,7 @@ static void resolve_delta(unsigned nr, enum object_type type, * resolve all the deltified objects that are based on it. 
*/ static void added_object(unsigned nr, enum object_type type, - void *data, unsigned long size) + void *data, size_t size) { struct delta_info **p = &delta_list; struct delta_info *info; @@ -349,7 +349,7 @@ static void added_object(unsigned nr, enum object_type type, } } -static void unpack_non_delta_entry(enum object_type type, unsigned long size, +static void unpack_non_delta_entry(enum object_type type, size_t size, unsigned nr) { void *buf = get_data(size); @@ -385,7 +385,7 @@ static ssize_t feed_input_zstream(struct odb_write_stream *in_stream, return buf_len - zstream->avail_out; } -static void stream_blob(unsigned long size, unsigned nr) +static void stream_blob(size_t size, unsigned nr) { git_zstream zstream = { 0 }; struct input_zstream_data data = { 0 }; @@ -416,7 +416,7 @@ static void stream_blob(unsigned long size, unsigned nr) } static int resolve_against_held(unsigned nr, const struct object_id *base, - void *delta_data, unsigned long delta_size) + void *delta_data, size_t delta_size) { struct object *obj; struct obj_buffer *obj_buffer; @@ -431,12 +431,11 @@ static int resolve_against_held(unsigned nr, const struct object_id *base, return 1; } -static void unpack_delta_entry(enum object_type type, unsigned long delta_size, +static void unpack_delta_entry(enum object_type type, size_t delta_size, unsigned nr) { void *delta_data, *base; - unsigned long base_size; - size_t base_size_st = 0; + size_t base_size; struct object_id base_oid; if (type == OBJ_REF_DELTA) { @@ -513,8 +512,7 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size, return; base = odb_read_object(the_repository->objects, &base_oid, - &type, &base_size_st); - base_size = cast_size_t_to_ulong(base_size_st); + &type, &base_size); if (!base) { error("failed to read delta-pack base object %s", oid_to_hex(&base_oid)); From 6b11df05f8491cb52a2b368df73ed2e869363b05 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 02:25:58 +0200 Subject: 
[PATCH 17/35] diff-delta: widen struct delta_index size fields to size_t Preparation for widening the delta-encoding API to size_t in subsequent commits, which is what lets pack-objects drop the cast_size_t_to_ulong() shims that 606c192380 (odb, packfile: use size_t for streaming object sizes, 2026-05-08) had to leave behind in get_delta() and try_delta() because their downstream consumers were still narrow. The struct is private to diff-delta.c, so widening its fields in isolation is a no-op at runtime: the values stored continue to fit in 32 bits on Windows because the public API around it still truncates. Splitting it out keeps the API-change commit focused on caller updates. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- diff-delta.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/diff-delta.c b/diff-delta.c index 43c339f010..b6b65d7607 100644 --- a/diff-delta.c +++ b/diff-delta.c @@ -125,9 +125,9 @@ struct unpacked_index_entry { }; struct delta_index { - unsigned long memsize; + size_t memsize; const void *src_buf; - unsigned long src_size; + size_t src_size; unsigned int hash_mask; struct index_entry *hash[FLEX_ARRAY]; }; @@ -140,7 +140,7 @@ struct delta_index * create_delta_index(const void *buf, unsigned long bufsize) struct unpacked_index_entry *entry, **hash; struct index_entry *packed_entry, **packed_hash; void *mem; - unsigned long memsize; + size_t memsize; if (!buf || !bufsize) return NULL; From 99c93791d06b1444ce0c3da8d0300e72b54fcb61 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 02:42:39 +0200 Subject: [PATCH 18/35] delta: widen create_delta_index() parameter to size_t The sole caller (try_delta() in builtin/pack-objects.c) passes an unsigned long, which promotes safely, so no caller fixups are needed. Splitting it out keeps the diff_delta() / create_delta() widening, which does ripple to several callers, in its own commit. 
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- delta.h | 2 +- diff-delta.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/delta.h b/delta.h index eb5c6d2fdb..a19586d789 100644 --- a/delta.h +++ b/delta.h @@ -14,7 +14,7 @@ struct delta_index; * using free_delta_index(). */ struct delta_index * -create_delta_index(const void *buf, unsigned long bufsize); +create_delta_index(const void *buf, size_t bufsize); /* * free_delta_index: free the index created by create_delta_index() diff --git a/diff-delta.c b/diff-delta.c index b6b65d7607..c93ac42594 100644 --- a/diff-delta.c +++ b/diff-delta.c @@ -132,7 +132,7 @@ struct delta_index { struct index_entry *hash[FLEX_ARRAY]; }; -struct delta_index * create_delta_index(const void *buf, unsigned long bufsize) +struct delta_index * create_delta_index(const void *buf, size_t bufsize) { unsigned int i, hsize, hmask, entries, prev_val, *hash_count; const unsigned char *data, *buffer = buf; From 017671849fcb7e69a7782bfe0f74c43736e2a5d0 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 09:03:51 +0200 Subject: [PATCH 19/35] pack-objects: widen delta-cache accounting to size_t These three are a single accounting tuple (the globals tracking cumulative cached-delta bytes, plus the helper that compares them against an incoming delta size) and are latently 32-bit on Windows where unsigned long != size_t: a pack with many large cached deltas could wrap silently. The widening is internally consistent on its own: the additions and subtractions against delta_cache_size already come from size_t sources (DELTA_SIZE() returns size_t), and delta_cacheable()'s sole caller in try_delta() still passes unsigned long, which promotes. Prerequisite for dropping try_delta()'s cast_size_t_to_ulong() shims, which becomes possible once create_delta() and diff_delta() are widened in a later commit. 
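To illustrate the wrap described above, here is a minimal sketch rather than the real delta_cacheable(): it models only the first check. uint32_t stands in for a 32-bit `unsigned long` (as on 64-bit Windows) and uint64_t for size_t; the function names cacheable_narrow() and cacheable_wide() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the cache-limit check with 32-bit accounting:
 * the sum wraps modulo 2^32, so a nearly full cache can look almost empty.
 */
static int cacheable_narrow(uint32_t cache_size, uint32_t delta_size,
			    uint32_t max_cache_size)
{
	uint32_t total = (uint32_t)(cache_size + delta_size); /* may wrap */

	if (max_cache_size && total > max_cache_size)
		return 0;
	return 1;
}

/* The same check with 64-bit accounting: the sum cannot wrap here. */
static int cacheable_wide(uint64_t cache_size, uint64_t delta_size,
			  uint64_t max_cache_size)
{
	if (max_cache_size && cache_size + delta_size > max_cache_size)
		return 0;
	return 1;
}
```

With a cache holding 0xFFFFFF00 bytes, a 0x200-byte delta and a 256 MiB limit, the narrow variant accepts the delta because the sum wraps to 0x100, while the wide variant rejects it.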
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- builtin/pack-objects.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 27048bbb4d..2c525cc1b2 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -260,8 +260,8 @@ static int exclude_promisor_objects_best_effort; static int use_delta_islands; -static unsigned long delta_cache_size = 0; -static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE; +static size_t delta_cache_size = 0; +static size_t max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE; static unsigned long cache_max_small_delta_size = 1000; static unsigned long window_memory_limit = 0; @@ -2687,8 +2687,8 @@ struct unpacked { unsigned depth; }; -static int delta_cacheable(unsigned long src_size, unsigned long trg_size, - unsigned long delta_size) +static int delta_cacheable(size_t src_size, size_t trg_size, + size_t delta_size) { if (max_delta_cache_size && delta_cache_size + delta_size > max_delta_cache_size) return 0; From 025d74284a48a46e1a5ede952b3218a8cc157b1b Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 09:08:53 +0200 Subject: [PATCH 20/35] pack-objects: widen free_unpacked() return to size_t free_unpacked() sums two byte counts: sizeof_delta_index() and SIZE(n->entry). The latter has been size_t since the prior topic "More work supporting objects larger than 4GB on Windows" widened SIZE() / oe_size() to size_t, so accumulating it into an unsigned long return was a silent Windows-only truncation on a packing run with many large objects. The sole caller (find_deltas()) holds its own mem_usage in an unsigned long for now and subtracts the return into it, so the new narrowing happens at that subtraction. find_deltas() and the matching try_delta() out-parameter are widened next. 
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- builtin/pack-objects.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 2c525cc1b2..a44e61ab0f 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -2955,9 +2955,9 @@ static unsigned int check_delta_limit(struct object_entry *me, unsigned int n) return m; } -static unsigned long free_unpacked(struct unpacked *n) +static size_t free_unpacked(struct unpacked *n) { - unsigned long freed_mem = sizeof_delta_index(n->index); + size_t freed_mem = sizeof_delta_index(n->index); free_delta_index(n->index); n->index = NULL; if (n->data) { From 259713f0b41973d5cce217e0f3b07c0d78a6c03f Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 09:24:29 +0200 Subject: [PATCH 21/35] pack-objects: widen mem_usage and try_delta out-param to size_t The pair must move together because find_deltas() passes &mem_usage to try_delta(): widening either alone breaks the type match. mem_usage accumulates per-object byte counts already computed in size_t (SIZE() and sizeof_delta_index() reach here through free_unpacked(), now size_t), and was the last 32-bit-on-Windows narrowing point in the delta-window memory accounting chain. With this commit, that chain is internally size_t end-to-end except for sizeof_delta_index()'s still-narrow return, whose value is bounded by create_delta_index()'s entries cap. window_memory_limit (config-driven via git_config_ulong()) stays unsigned long: it is only compared against mem_usage and promotes. 
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- builtin/pack-objects.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index a44e61ab0f..d0ccf8a62d 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -2787,7 +2787,7 @@ size_t oe_get_size_slow(struct packing_data *pack, } static int try_delta(struct unpacked *trg, struct unpacked *src, - unsigned max_depth, unsigned long *mem_usage) + unsigned max_depth, size_t *mem_usage) { struct object_entry *trg_entry = trg->entry; struct object_entry *src_entry = src->entry; @@ -2974,7 +2974,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size, { uint32_t i, idx = 0, count = 0; struct unpacked *array; - unsigned long mem_usage = 0; + size_t mem_usage = 0; CALLOC_ARRAY(array, window); From 8f7356e90d632185abac56f9cc956ac2b2a18f9c Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 09:46:26 +0200 Subject: [PATCH 22/35] delta: widen create_delta() and diff_delta() to size_t Last stop in the delta-encoding API widening for >4 GiB blobs on Windows: with create_delta_index() done in the prior commit and create_delta()/diff_delta() finished here, every byte count that crosses delta.h is now size_t. The struct fields they store into have been size_t since the diff-delta struct widening. The API change must move with all callers in the same commit (the build only passes when every &delta_size matches the new size_t*). Caller updates are kept minimal: * builtin/pack-objects.c get_delta() and try_delta(): widen only the local delta_size variable; the surrounding unsigned-long locals and their cast_size_t_to_ulong() shims are out of scope here and will be cleaned up in their own commits. 
* builtin/fast-import.c, diff.c, t/helper/test-pack-deltas.c: keep the local unsigned-long delta size (each feeds a still- unsigned-long downstream consumer: zlib's avail_in, deflate_it(), the test helper's own do_compress()), and bridge via a temporary size_t plus cast_size_t_to_ulong(). The new casts are paid back in later topics that widen those consumers. * t/helper/test-delta.c: widen the local outright (no downstream consumer beyond the test's own out_size, which is already size_t). Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- builtin/fast-import.c | 4 +++- builtin/pack-objects.c | 6 ++++-- delta.h | 10 +++++----- diff-delta.c | 4 ++-- diff.c | 4 +++- t/helper/test-delta.c | 2 +- t/helper/test-pack-deltas.c | 5 +++-- 7 files changed, 21 insertions(+), 14 deletions(-) diff --git a/builtin/fast-import.c b/builtin/fast-import.c index aa656c5195..cef98d8fde 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -998,11 +998,13 @@ static int store_object( if (last && last->data.len && last->data.buf && last->depth < max_depth && dat->len > the_hash_algo->rawsz) { + size_t deltalen_st = 0; delta_count_attempts_by_type[type]++; delta = diff_delta(last->data.buf, last->data.len, dat->buf, dat->len, - &deltalen, dat->len - the_hash_algo->rawsz); + &deltalen_st, dat->len - the_hash_algo->rawsz); + deltalen = cast_size_t_to_ulong(deltalen_st); } else delta = NULL; diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index d0ccf8a62d..f739fee753 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -353,7 +353,8 @@ static void index_commit_for_bitmap(struct commit *commit) static void *get_delta(struct object_entry *entry) { - unsigned long size, base_size, delta_size; + unsigned long size, base_size; + size_t delta_size; void *buf, *base_buf, *delta_buf; enum object_type type; size_t size_st = 0, base_size_st = 0; @@ -2791,7 +2792,8 @@ static int try_delta(struct unpacked *trg, struct unpacked *src, { struct object_entry 
*trg_entry = trg->entry; struct object_entry *src_entry = src->entry; - unsigned long trg_size, src_size, delta_size, sizediff, max_size, sz; + unsigned long trg_size, src_size, sizediff, max_size, sz; + size_t delta_size; unsigned ref_depth; enum object_type type; void *delta_buf; diff --git a/delta.h b/delta.h index a19586d789..59ccaaa0e0 100644 --- a/delta.h +++ b/delta.h @@ -42,8 +42,8 @@ unsigned long sizeof_delta_index(struct delta_index *index); */ void * create_delta(const struct delta_index *index, - const void *buf, unsigned long bufsize, - unsigned long *delta_size, unsigned long max_delta_size); + const void *buf, size_t bufsize, + size_t *delta_size, size_t max_delta_size); /* * diff_delta: create a delta from source buffer to target buffer @@ -54,9 +54,9 @@ create_delta(const struct delta_index *index, * updated with its size. The returned buffer must be freed by the caller. */ static inline void * -diff_delta(const void *src_buf, unsigned long src_bufsize, - const void *trg_buf, unsigned long trg_bufsize, - unsigned long *delta_size, unsigned long max_delta_size) +diff_delta(const void *src_buf, size_t src_bufsize, + const void *trg_buf, size_t trg_bufsize, + size_t *delta_size, size_t max_delta_size) { struct delta_index *index = create_delta_index(src_buf, src_bufsize); if (index) { diff --git a/diff-delta.c b/diff-delta.c index c93ac42594..15210e8381 100644 --- a/diff-delta.c +++ b/diff-delta.c @@ -318,8 +318,8 @@ unsigned long sizeof_delta_index(struct delta_index *index) void * create_delta(const struct delta_index *index, - const void *trg_buf, unsigned long trg_size, - unsigned long *delta_size, unsigned long max_size) + const void *trg_buf, size_t trg_size, + size_t *delta_size, size_t max_size) { unsigned int i, val; off_t outpos, moff; diff --git a/diff.c b/diff.c index 2a9d0d8687..69eb2f76a4 100644 --- a/diff.c +++ b/diff.c @@ -3647,9 +3647,11 @@ static void emit_binary_diff_body(struct diff_options *o, delta = NULL; deflated = 
deflate_it(two->ptr, two->size, &deflate_size); if (one->size && two->size) { + size_t delta_size_st = 0; delta = diff_delta(one->ptr, one->size, two->ptr, two->size, - &delta_size, deflate_size); + &delta_size_st, deflate_size); + delta_size = cast_size_t_to_ulong(delta_size_st); if (delta) { void *to_free = delta; orig_size = delta_size; diff --git a/t/helper/test-delta.c b/t/helper/test-delta.c index 8223a60229..d807afef75 100644 --- a/t/helper/test-delta.c +++ b/t/helper/test-delta.c @@ -32,7 +32,7 @@ int cmd__delta(int argc, const char **argv) die_errno("unable to read '%s'", argv[3]); if (argv[1][1] == 'd') { - unsigned long delta_size; + size_t delta_size; out_buf = diff_delta(from.buf, from.len, data.buf, data.len, &delta_size, 0); diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c index 840797cf0d..5e0f726842 100644 --- a/t/helper/test-pack-deltas.c +++ b/t/helper/test-pack-deltas.c @@ -49,7 +49,7 @@ static void write_ref_delta(struct hashfile *f, { unsigned char header[MAX_PACK_OBJECT_HEADER]; unsigned long delta_size, compressed_size, hdrlen; - size_t size, base_size; + size_t size, base_size, delta_size_st = 0; enum object_type type; void *base_buf, *delta_buf; void *buf = odb_read_object(the_repository->objects, @@ -65,7 +65,8 @@ static void write_ref_delta(struct hashfile *f, die("unable to read %s", oid_to_hex(base)); delta_buf = diff_delta(base_buf, base_size, - buf, size, &delta_size, 0); + buf, size, &delta_size_st, 0); + delta_size = cast_size_t_to_ulong(delta_size_st); compressed_size = do_compress(&delta_buf, delta_size); From 884e8bb88af958fc3b48bead454480892ee42235 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 11:26:01 +0200 Subject: [PATCH 23/35] packfile, git-zlib: widen use_pack() and zstream avail fields to size_t Bundling the two widenings: four call sites pass &stream.avail_in directly to use_pack(), and widening either type fencepost alone would force a bridge variable at each. 
Doing both together is the simpler end state and is the prerequisite for the do_compress() widening in the next commit, which is what lets write_no_reuse_object() lose its last cast_size_t_to_ulong() shim. The unsigned-long locals widened at the other use_pack() callers (avail / remaining / left) hold pack-window sizes bounded by core.packedGitWindowSize, so the change is type consistency rather than a new >4GB capability. git_zstream.avail_in / avail_out likewise reach zlib's uInt fields only after zlib_buf_cap()'s 1 GiB cap, so the wrapper already accepted size_t-shaped inputs in practice. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- builtin/pack-objects.c | 8 ++++---- git-zlib.h | 4 ++-- pack-check.c | 4 ++-- packfile.c | 4 ++-- packfile.h | 3 ++- 5 files changed, 12 insertions(+), 11 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index f739fee753..bef1305ce4 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -488,7 +488,7 @@ static void copy_pack_data(struct hashfile *f, off_t len) { unsigned char *in; - unsigned long avail; + size_t avail; while (len) { in = use_pack(p, w_curs, offset, &avail); @@ -2260,7 +2260,7 @@ static void check_object(struct object_entry *entry, uint32_t object_index) struct object_id base_ref; struct object_entry *base_entry; unsigned long used, used_0; - unsigned long avail; + size_t avail; off_t ofs; unsigned char *buf, c; enum object_type type; @@ -2756,8 +2756,8 @@ size_t oe_get_size_slow(struct packing_data *pack, struct pack_window *w_curs; unsigned char *buf; enum object_type type; - unsigned long used, avail; - size_t size; + unsigned long used; + size_t avail, size; if (e->type_ != OBJ_OFS_DELTA && e->type_ != OBJ_REF_DELTA) { size_t sz; diff --git a/git-zlib.h b/git-zlib.h index 44380e8ad3..0b24b15bd0 100644 --- a/git-zlib.h +++ b/git-zlib.h @@ -5,8 +5,8 @@ typedef struct git_zstream { struct z_stream_s z; - unsigned long avail_in; - unsigned long avail_out; + size_t 
avail_in; + size_t avail_out; size_t total_in; size_t total_out; unsigned char *next_in; diff --git a/pack-check.c b/pack-check.c index 5adfb3f272..befb860472 100644 --- a/pack-check.c +++ b/pack-check.c @@ -34,7 +34,7 @@ int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, uint32_t data_crc = crc32(0, NULL, 0); do { - unsigned long avail; + size_t avail; void *data = use_pack(p, w_curs, offset, &avail); if (avail > len) avail = len; @@ -71,7 +71,7 @@ static int verify_packfile(struct repository *r, r->hash_algo->init_fn(&ctx); do { - unsigned long remaining; + size_t remaining; unsigned char *in = use_pack(p, w_curs, offset, &remaining); offset += remaining; if (!pack_sig_ofs) diff --git a/packfile.c b/packfile.c index 78c389e6f3..7fbe47ca18 100644 --- a/packfile.c +++ b/packfile.c @@ -704,7 +704,7 @@ static int in_window(struct repository *r, struct pack_window *win, unsigned char *use_pack(struct packed_git *p, struct pack_window **w_cursor, off_t offset, - unsigned long *left) + size_t *left) { struct pack_window *win = *w_cursor; @@ -1228,7 +1228,7 @@ int unpack_object_header(struct packed_git *p, size_t *sizep) { unsigned char *base; - unsigned long left; + size_t left; unsigned long used; enum object_type type; diff --git a/packfile.h b/packfile.h index defb6f442c..820d247d05 100644 --- a/packfile.h +++ b/packfile.h @@ -402,7 +402,8 @@ uint32_t get_pack_fanout(struct packed_git *p, uint32_t value); struct object_database; -unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *); +unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, + size_t *); void close_pack_windows(struct packed_git *); void close_pack(struct packed_git *); void unuse_pack(struct pack_window **); From 73510d45ab28e425b3173fe3069ec9333c4c7438 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 16:41:17 +0200 Subject: [PATCH 24/35] diff: stop truncating the deflated-binary-diff size on Windows 
Continue the size_t evacuation around large object handling: with deflate_it() and the locals around it widened, the cast_size_t_to_ulong() shim the prior diff_delta() widening had to leave behind in emit_binary_diff_body() goes away. deflate_it() is file-static; the only callers are the two in emit_binary_diff_body() already touched here. emit_diff_symbol() formats the resulting sizes via uintmax_t / %"PRIuMAX", so the diff output is not affected; only the per-process upper bound on a binary patch chunk that this function can address grows beyond 4 GiB on Windows. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- diff.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/diff.c b/diff.c index c14f69719b..d0be7c8f50 100644 --- a/diff.c +++ b/diff.c @@ -3606,8 +3606,8 @@ static int checkdiff_consume(void *priv, char *line, unsigned long len) } static unsigned char *deflate_it(char *data, - unsigned long size, - unsigned long *result_size) + size_t size, + size_t *result_size) { size_t bound; unsigned char *deflated; @@ -3636,10 +3636,10 @@ static void emit_binary_diff_body(struct diff_options *o, void *delta; void *deflated; void *data; - unsigned long orig_size; - unsigned long delta_size; - unsigned long deflate_size; - unsigned long data_size; + size_t orig_size; + size_t delta_size; + size_t deflate_size; + size_t data_size; /* We could do deflated delta, or we could do just deflated two, * whichever is smaller.
@@ -3647,11 +3647,9 @@ static void emit_binary_diff_body(struct diff_options *o, delta = NULL; deflated = deflate_it(two->ptr, two->size, &deflate_size); if (one->size && two->size) { - size_t delta_size_st = 0; delta = diff_delta(one->ptr, one->size, two->ptr, two->size, - &delta_size_st, deflate_size); - delta_size = cast_size_t_to_ulong(delta_size_st); + &delta_size, deflate_size); if (delta) { void *to_free = delta; orig_size = delta_size; From 89c8ca8a97ff5793db0b1f36ff535b4005956d97 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 11:44:13 +0200 Subject: [PATCH 25/35] archive-zip: widen zlib_deflate_raw()'s maxsize local to size_t Prep for the upcoming git_deflate_bound() widening to size_t: the local that catches its return needs to be size_t too, otherwise the widening would introduce a silent Windows narrowing here. No semantic effect with the current unsigned-long-returning git_deflate_bound() (size_t == unsigned long on this caller's platforms today). Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- archive-zip.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/archive-zip.c b/archive-zip.c index 97ea8d60d6..a487d4c041 100644 --- a/archive-zip.c +++ b/archive-zip.c @@ -206,7 +206,7 @@ static void *zlib_deflate_raw(void *data, unsigned long size, unsigned long *compressed_size) { git_zstream stream; - unsigned long maxsize; + size_t maxsize; void *buffer; int result; From 71c8cacb3f80161c4c36375196326591bd9af60e Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 16:44:00 +0200 Subject: [PATCH 26/35] convert: widen gather_convert_stats() helpers to size_t Prep for the upcoming read_blob_data_from_index() widening, whose callers in convert.c feed the size they receive straight into these two helpers. Both are file-static, so the change is contained. 
Also fixes a small pre-existing narrowing on the get_wt_convert_stats_ascii() path, where strbuf.len (size_t) was passed to an unsigned long parameter. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- convert.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/convert.c b/convert.c index 036506842c..74d452b0de 100644 --- a/convert.c +++ b/convert.c @@ -102,7 +102,7 @@ static int convert_is_binary(const struct text_stat *stats) return 0; } -static unsigned int gather_convert_stats(const char *data, unsigned long size) +static unsigned int gather_convert_stats(const char *data, size_t size) { struct text_stat stats; int ret = 0; @@ -119,7 +119,7 @@ static unsigned int gather_convert_stats(const char *data, unsigned long size) return ret; } -static const char *gather_convert_stats_ascii(const char *data, unsigned long size) +static const char *gather_convert_stats_ascii(const char *data, size_t size) { unsigned int convert_stats = gather_convert_stats(data, size); From 8a46bf3d69d7b459086985852c45e4f01d1c7798 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 12:05:22 +0200 Subject: [PATCH 27/35] diff: widen deflate_it()'s bound local from int to size_t Fixes a pre-existing silent narrowing from git_deflate_bound()'s unsigned long return into an int local: anything past 2 GiB has always wrapped negative here and then been re-extended to size_t inside xmalloc(). Also prep for the upcoming git_deflate_bound() widening to size_t, which would extend the narrowing further if bound stayed int. 
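For illustration only, the wrap-then-re-extend effect described above can be reproduced outside of Git with a minimal sketch. The helper name bound_via_int_local() is hypothetical and not part of this patch; int32_t stands in for `int` and uint64_t for a 64-bit size_t, so the outcome does not depend on the build platform:

```c
#include <stdint.h>

/*
 * Illustration only, not Git code. Models a bound that is stored in
 * an `int` local (int32_t here) and then handed to an allocator that
 * takes a 64-bit size_t (uint64_t here).
 */
static uint64_t bound_via_int_local(uint64_t real_bound)
{
	/*
	 * Converting an out-of-range value to a signed type is
	 * implementation-defined; GCC and Clang reduce it modulo 2^32,
	 * so anything past 2 GiB comes out negative.
	 */
	int32_t bound = (int32_t)real_bound;

	/* The negative value is sign-extended on the way back. */
	return (uint64_t)bound;
}
```

With those assumptions, a bound of 0x80000010 (just past 2 GiB) comes back as 0xFFFFFFFF80000010, while small bounds pass through unchanged.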
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- diff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/diff.c b/diff.c index 69eb2f76a4..c14f69719b 100644 --- a/diff.c +++ b/diff.c @@ -3609,7 +3609,7 @@ static unsigned char *deflate_it(char *data, unsigned long size, unsigned long *result_size) { - int bound; + size_t bound; unsigned char *deflated; git_zstream stream; struct repo_config_values *cfg = repo_config_values(the_repository); From f95e2e4a5f94678264166924f816116452442407 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 17:00:12 +0200 Subject: [PATCH 28/35] read-cache: stop truncating index blob sizes on Windows Continue the size_t evacuation. read_blob_data_from_index() reads the blob through the size_t odb_read_object() API but writes the size back through an unsigned long out-parameter, silently truncating anything past 4 GiB on Windows. Widen the out-parameter, drop the cast_size_t_to_ulong() shim, and move the matching locals in the two convert.c callers and the one in attr.c. Their downstream consumers (gather_convert_stats() widened in the prior commit and read_attr_from_buf() already size_t) take the new type directly. 
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- attr.c | 2 +- convert.c | 4 ++-- read-cache-ll.h | 2 +- read-cache.c | 4 ++-- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/attr.c b/attr.c index c61472a4e6..b9852d8587 100644 --- a/attr.c +++ b/attr.c @@ -793,7 +793,7 @@ static struct attr_stack *read_attr_from_index(struct index_state *istate, { struct attr_stack *stack = NULL; char *buf; - unsigned long size; + size_t size; int sparse_dir_pos = -1; if (!istate) diff --git a/convert.c b/convert.c index 74d452b0de..77f06fcfdb 100644 --- a/convert.c +++ b/convert.c @@ -141,7 +141,7 @@ const char *get_cached_convert_stats_ascii(struct index_state *istate, const char *path) { const char *ret; - unsigned long sz; + size_t sz; void *data = read_blob_data_from_index(istate, path, &sz); ret = gather_convert_stats_ascii(data, sz); free(data); @@ -223,7 +223,7 @@ static void check_global_conv_flags_eol(const char *path, static int has_crlf_in_index(struct index_state *istate, const char *path) { - unsigned long sz; + size_t sz; void *data; const char *crp; int has_crlf = 0; diff --git a/read-cache-ll.h b/read-cache-ll.h index 2c8b4b21b1..a3643dce24 100644 --- a/read-cache-ll.h +++ b/read-cache-ll.h @@ -411,7 +411,7 @@ int chmod_index_entry(struct index_state *, struct cache_entry *ce, char flip); int ce_same_name(const struct cache_entry *a, const struct cache_entry *b); void set_object_name_for_intent_to_add_entry(struct cache_entry *ce); int index_name_is_other(struct index_state *, const char *, int); -void *read_blob_data_from_index(struct index_state *, const char *, unsigned long *); +void *read_blob_data_from_index(struct index_state *, const char *, size_t *); /* do stat comparison even if CE_VALID is true */ #define CE_MATCH_IGNORE_VALID 01 diff --git a/read-cache.c b/read-cache.c index 21ca58beea..8be8912f16 100644 --- a/read-cache.c +++ b/read-cache.c @@ -3459,7 +3459,7 @@ int index_name_is_other(struct index_state *istate, const char 
*name, } void *read_blob_data_from_index(struct index_state *istate, - const char *path, unsigned long *size) + const char *path, size_t *size) { int pos, len; size_t sz; @@ -3490,7 +3490,7 @@ void *read_blob_data_from_index(struct index_state *istate, return NULL; } if (size) - *size = cast_size_t_to_ulong(sz); + *size = sz; return data; } From 759542b2f5d3076935787104e34a4726435a577f Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 12:12:57 +0200 Subject: [PATCH 29/35] http-push: widen start_put()'s size local from ssize_t to size_t The local is initialised from git_deflate_bound() (an unsigned upper bound on the deflated output, never negative) and used in exactly three places: the initialising assignment, strbuf_grow(buf, size) whose parameter is already size_t, and stream.avail_out which became size_t in the prior commit. There is no comparison against zero or a negative value, no subtraction, no arithmetic that depends on signedness, and no path that would assign a signed quantity to it. The original ssize_t was the wrong type to begin with: a git_deflate_bound() result above SSIZE_MAX would have wrapped negative on assignment and then implicitly re-extended to a huge size_t at strbuf_grow() / stream.avail_out, requesting an absurd allocation. That is not a real-world concern for the object sizes http-push pushes today, but it is also the reason the type needs to move to size_t before git_deflate_bound() itself is widened. 
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- http-push.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/http-push.c b/http-push.c index 3c23cbba27..2a07d14259 100644 --- a/http-push.c +++ b/http-push.c @@ -367,7 +367,7 @@ static void start_put(struct transfer_request *request) void *unpacked; size_t len; int hdrlen; - ssize_t size; + size_t size; git_zstream stream; struct repo_config_values *cfg = repo_config_values(the_repository); From ebb6368e915906d431f2913fb352528213919ca8 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 17:33:36 +0200 Subject: [PATCH 30/35] xdiff-interface: widen buffer_is_binary() size parameter to size_t Prep for the widenings of its callers, where size-receiving locals will become size_t (combine-diff's result_size in the immediately following commit, struct diff_filespec.size in a later topic). Body caps the parameter at 8000 anyway, so the type change is mechanical. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- xdiff-interface.c | 2 +- xdiff-interface.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/xdiff-interface.c b/xdiff-interface.c index db6938689f..18e37d2479 100644 --- a/xdiff-interface.c +++ b/xdiff-interface.c @@ -195,7 +195,7 @@ void read_mmblob(mmfile_t *ptr, struct object_database *odb, } #define FIRST_FEW_BYTES 8000 -int buffer_is_binary(const char *ptr, unsigned long size) +int buffer_is_binary(const char *ptr, size_t size) { if (FIRST_FEW_BYTES < size) size = FIRST_FEW_BYTES; diff --git a/xdiff-interface.h b/xdiff-interface.h index ce54e1c0e0..41fa1d7562 100644 --- a/xdiff-interface.h +++ b/xdiff-interface.h @@ -49,7 +49,7 @@ int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2, int read_mmfile(mmfile_t *ptr, const char *filename); void read_mmblob(mmfile_t *ptr, struct object_database *odb, const struct object_id *oid); -int buffer_is_binary(const char *ptr, unsigned long size); +int buffer_is_binary(const char *ptr, size_t size); 
void xdiff_set_find_func(xdemitconf_t *xecfg, const char *line, int cflags); void xdiff_clear_find_func(xdemitconf_t *xecfg); From c26d7081c48fa2cefa5c14a6230bbe664608dc55 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 12:17:13 +0200 Subject: [PATCH 31/35] t/helper/test-pack-deltas: widen do_compress()'s maxsize local to size_t Prep for the upcoming git_deflate_bound() widening to size_t. The local is only ever the return value of git_deflate_bound() and the xmalloc() / stream.avail_out sizes derived from it; widening it has no semantic effect today. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- t/helper/test-pack-deltas.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c index 5e0f726842..959705feca 100644 --- a/t/helper/test-pack-deltas.c +++ b/t/helper/test-pack-deltas.c @@ -22,7 +22,7 @@ static unsigned long do_compress(void **pptr, unsigned long size) { git_zstream stream; void *in, *out; - unsigned long maxsize; + size_t maxsize; git_deflate_init(&stream, 1); maxsize = git_deflate_bound(&stream, size); From a551c408927eeed50b904da7b65f0c6c91b0699a Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 18:07:31 +0200 Subject: [PATCH 32/35] combine-diff: stop truncating combined-diff blob sizes on Windows Continue the size_t evacuation. With buffer_is_binary() widened in the prior commit, every consumer that the size flows into in combine-diff.c is size_t-ready, so widen grab_blob()'s out-param outright and move the matching locals at its three call sites together. grab_blob()'s body collapses to a direct odb_read_object(&size) since the bridge variable is no longer needed. 
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- combine-diff.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/combine-diff.c b/combine-diff.c index fb72174918..4915bf335d 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -304,7 +304,7 @@ static struct lline *coalesce_lines(struct lline *base, int *lenbase, static char *grab_blob(struct repository *r, const struct object_id *oid, unsigned int mode, - unsigned long *size, struct userdiff_driver *textconv, + size_t *size, struct userdiff_driver *textconv, const char *path) { char *blob; @@ -325,9 +325,7 @@ static char *grab_blob(struct repository *r, *size = fill_textconv(r, textconv, df, &blob); free_filespec(df); } else { - size_t size_st = 0; - blob = odb_read_object(r->objects, oid, &type, &size_st); - *size = cast_size_t_to_ulong(size_st); + blob = odb_read_object(r->objects, oid, &type, size); if (!blob) die(_("unable to read %s"), oid_to_hex(oid)); if (type != OBJ_BLOB) @@ -431,7 +429,7 @@ static void combine_diff(struct repository *r, xdemitconf_t xecfg; mmfile_t parent_file; struct combine_diff_state state; - unsigned long sz; + size_t sz; if (result_deleted) return; /* result deleted */ @@ -1015,7 +1013,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent, struct rev_info *rev) { struct diff_options *opt = &rev->diffopt; - unsigned long result_size, cnt, lno; + size_t result_size, cnt, lno; int result_deleted = 0; char *result, *cp; struct sline *sline; /* survived lines */ @@ -1134,7 +1132,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent, is_binary = buffer_is_binary(result, result_size); for (i = 0; !is_binary && i < num_parent; i++) { char *buf; - unsigned long size; + size_t size; buf = grab_blob(opt->repo, &elem->parent[i].oid, elem->parent[i].mode, From 9ad39d088791e0318173126f67f3c8f757d4d885 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 12:21:20 +0200 Subject: [PATCH 
33/35] git-zlib: widen git_deflate_bound() to size_t All four `unsigned long` / `int` / `ssize_t` receivers across archive-zip, diff, http-push and t/helper/test-pack-deltas were widened to size_t in the prior commits, and remote-curl and fast-import were already there. With every caller prepared, both the parameter and the return type can now move without introducing any silent narrowing. For inputs above zlib's uLong range (i.e. >4 GiB on platforms where uLong is 32-bit, notably 64-bit Windows), defer to zlib's stored-block formula (the same fallback it would itself use for an unknown stream state) plus the worst-case wrapper overhead. The existing path through deflateBound() is unchanged for inputs that fit. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- git-zlib.c | 16 ++++++++++++++-- git-zlib.h | 2 +- 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/git-zlib.c b/git-zlib.c index d21adb3bf5..ebbbcc6d1a 100644 --- a/git-zlib.c +++ b/git-zlib.c @@ -167,9 +167,21 @@ int git_inflate(git_zstream *strm, int flush) return status; } -unsigned long git_deflate_bound(git_zstream *strm, unsigned long size) +size_t git_deflate_bound(git_zstream *strm, size_t size) { - return deflateBound(&strm->z, size); +#if SIZE_MAX > ULONG_MAX + if (size > maximum_unsigned_value_of_type(uLong)) + /* + * deflateBound() takes uLong, which is 32-bit on + * Windows. For inputs above that range, return zlib's + * stored-block formula (the conservative path it would + * itself use for an unknown stream state) plus the + * worst-case wrapper overhead. 
+ */ + return size + (size >> 5) + (size >> 7) + (size >> 11) + + 7 + 18; +#endif + return deflateBound(&strm->z, (uLong)size); } void git_deflate_init(git_zstream *strm, int level) diff --git a/git-zlib.h b/git-zlib.h index 0b24b15bd0..9248d11ca9 100644 --- a/git-zlib.h +++ b/git-zlib.h @@ -25,6 +25,6 @@ void git_deflate_end(git_zstream *); int git_deflate_abort(git_zstream *); int git_deflate_end_gently(git_zstream *); int git_deflate(git_zstream *, int flush); -unsigned long git_deflate_bound(git_zstream *, unsigned long); +size_t git_deflate_bound(git_zstream *, size_t); #endif /* GIT_ZLIB_H */ From 518d12a44f79d9334db06e51ff8502d09335cb99 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 19:26:00 +0200 Subject: [PATCH 34/35] diff: widen textconv_object() size out-param to size_t Continue the size_t evacuation. textconv_object() fills its out-parameter from fill_textconv()'s size_t return through an unsigned long*; widen the API to match, then take advantage of the new shape where callers can. cat-file's 'c' and batch-mode 'c' branches lose their size_ul bridge variables (one site becomes a direct call, the other collapses an if/else into a single negated condition that reads as "try textconv, fall back to a raw read"). blame.c likewise drops the file_size_st bridge in fill_origin_blob() and hoists final_buf_size_st to bracket both branches in setup_scoreboard(). The latter keeps a cast_size_t_to_ulong() shim because struct blame_scoreboard.final_buf_size is still unsigned long; that field is its own topic. log.c just widens its local from unsigned long to size_t. 
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- blame.c | 21 ++++++++------------- builtin/cat-file.c | 13 ++++--------- builtin/log.c | 2 +- diff.c | 2 +- diff.h | 2 +- 5 files changed, 15 insertions(+), 25 deletions(-) diff --git a/blame.c b/blame.c index 126e232416..6cdeabd633 100644 --- a/blame.c +++ b/blame.c @@ -238,7 +238,7 @@ static struct commit *fake_working_tree_commit(struct repository *r, struct stat st; const char *read_from; char *buf_ptr; - unsigned long buf_len; + size_t buf_len; if (contents_from) { if (stat(contents_from, &st) < 0) @@ -1034,20 +1034,17 @@ static void fill_origin_blob(struct diff_options *opt, { if (!o->file.ptr) { enum object_type type; - unsigned long file_size; + size_t file_size; (*num_read_blob)++; if (opt->flags.allow_textconv && textconv_object(opt->repo, o->path, o->mode, &o->blob_oid, 1, &file->ptr, &file_size)) ; - else { - size_t file_size_st = 0; + else file->ptr = odb_read_object(the_repository->objects, &o->blob_oid, &type, - &file_size_st); - file_size = cast_size_t_to_ulong(file_size_st); - } + &file_size); file->size = file_size; if (!file->ptr) @@ -2864,22 +2861,20 @@ void setup_scoreboard(struct blame_scoreboard *sb, sb->final_buf_size = o->file.size; } else { + size_t final_buf_size_st = 0; o = get_origin(sb->final, sb->path); if (fill_blob_sha1_and_mode(sb->repo, o)) die(_("no such path %s in %s"), sb->path, final_commit_name); if (sb->revs->diffopt.flags.allow_textconv && textconv_object(sb->repo, sb->path, o->mode, &o->blob_oid, 1, (char **) &sb->final_buf, - &sb->final_buf_size)) + &final_buf_size_st)) ; - else { - size_t final_buf_size_st = 0; + else sb->final_buf = odb_read_object(the_repository->objects, &o->blob_oid, &type, &final_buf_size_st); - sb->final_buf_size = - cast_size_t_to_ulong(final_buf_size_st); - } + sb->final_buf_size = cast_size_t_to_ulong(final_buf_size_st); if (!sb->final_buf) die(_("cannot read blob %s for path %s"), diff --git a/builtin/cat-file.c b/builtin/cat-file.c 
index d6ef8414ee..912e1ef403 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -186,11 +186,9 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name) case 'c': { - unsigned long size_ul = 0; int textconv_ret = textconv_object(the_repository, path, obj_context.mode, &oid, 1, - &buf, &size_ul); - size = size_ul; + &buf, &size); if (textconv_ret) break; } @@ -413,12 +411,9 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d oid_to_hex(oid), data->rest); } else if (opt->transform_mode == 'c') { enum object_type type; - unsigned long size_ul = 0; - if (textconv_object(the_repository, - data->rest, 0100644, oid, - 1, &contents, &size_ul)) - size = size_ul; - else + if (!textconv_object(the_repository, + data->rest, 0100644, oid, + 1, &contents, &size)) contents = odb_read_object(the_repository->objects, oid, &type, &size); if (!contents) diff --git a/builtin/log.c b/builtin/log.c index d027ce1e0b..2f5142e888 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -584,7 +584,7 @@ static int show_blob_object(const struct object_id *oid, struct rev_info *rev, c struct object_id oidc; struct object_context obj_context = {0}; char *buf; - unsigned long size; + size_t size; fflush(rev->diffopt.file); if (!rev->diffopt.flags.textconv_set_via_cmdline || diff --git a/diff.c b/diff.c index d0be7c8f50..f0b4ffe512 100644 --- a/diff.c +++ b/diff.c @@ -7845,7 +7845,7 @@ int textconv_object(struct repository *r, const struct object_id *oid, int oid_valid, char **buf, - unsigned long *buf_size) + size_t *buf_size) { struct diff_filespec *df; struct userdiff_driver *textconv; diff --git a/diff.h b/diff.h index bb5cddaf34..ab52ca80c3 100644 --- a/diff.h +++ b/diff.h @@ -757,7 +757,7 @@ int textconv_object(struct repository *repo, const char *path, unsigned mode, const struct object_id *oid, int oid_valid, - char **buf, unsigned long *buf_size); + char **buf, size_t *buf_size); int parse_rename_score(const char **cp_p); From 
4bfc0989753d0ecd32aecc1dcb1e4f1f58331954 Mon Sep 17 00:00:00 2001 From: Johannes Schindelin Date: Fri, 5 Jun 2026 19:38:13 +0200 Subject: [PATCH 35/35] diffcore: widen struct diff_filespec.size to size_t Continue the size_t evacuation. The struct field already receives its writes from a size_t-shaped source (xsize_t(st.st_size), strbuf.len, fill_textconv()'s return, odb_read_object_info_extended() via oi.sizep), so on Windows it was already truncating anything past 4 GiB silently on the strbuf and textconv paths and loudly through cast_size_t_to_ulong() on the odb path. Switch the field to size_t. In diff_populate_filespec(), point oi.sizep at the field directly and drop both cast_size_t_to_ulong() shims and the size_st bridge they fed. Downstream consumers that still read .size into unsigned long locals will now silently narrow on Windows where the field exceeds 4 GiB. Each of those is its own follow-up; the writer side is the prerequisite for ever putting a >4 GiB value in the field in the first place. 
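As a rough sketch of the difference between the "silent" and "loud" paths mentioned above (not Git code: narrow_silently() and narrow_checked() are hypothetical names, uint32_t models unsigned long on 64-bit Windows, and the checked variant returns an error code where Git's real cast_size_t_to_ulong() would report the problem in its own way):

```c
#include <stdint.h>

/* Silent path: a plain conversion just drops the high 32 bits. */
static uint32_t narrow_silently(uint64_t size)
{
	return (uint32_t)size;
}

/*
 * Loud path: a checked conversion refuses values that do not fit.
 * Returns 0 and stores the value on success, -1 on overflow.
 */
static int narrow_checked(uint64_t size, uint32_t *out)
{
	if (size > UINT32_MAX)
		return -1;
	*out = (uint32_t)size;
	return 0;
}
```

Under these assumptions, a size of 4 GiB + 5 bytes silently becomes 5 on the first path and is rejected on the second, which is why the remaining unsigned long readers of .size are worth following up on individually.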
Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin --- diff.c | 5 +---- diffcore.h | 2 +- 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/diff.c b/diff.c index f0b4ffe512..de5ff1f7d0 100644 --- a/diff.c +++ b/diff.c @@ -4595,9 +4595,8 @@ int diff_populate_filespec(struct repository *r, } } else { - size_t size_st = 0; struct object_info info = { - .sizep = &size_st + .sizep = &s->size }; if (!(size_only || check_binary)) @@ -4619,7 +4618,6 @@ int diff_populate_filespec(struct repository *r, die("unable to read %s", oid_to_hex(&s->oid)); object_read: - s->size = cast_size_t_to_ulong(size_st); if (size_only || check_binary) { if (size_only) return 0; @@ -4634,7 +4632,6 @@ object_read: if (odb_read_object_info_extended(r->objects, &s->oid, &info, OBJECT_INFO_LOOKUP_REPLACE)) die("unable to read %s", oid_to_hex(&s->oid)); - s->size = cast_size_t_to_ulong(size_st); } s->should_free = 1; } diff --git a/diffcore.h b/diffcore.h index d75038d1b3..85fc94e2a5 100644 --- a/diffcore.h +++ b/diffcore.h @@ -54,7 +54,7 @@ struct diff_filespec { char *path; void *data; void *cnt_data; - unsigned long size; + size_t size; int count; /* Reference count */ int rename_used; /* Count of rename users */ unsigned short mode; /* file mode */