Commit Graph

36 Commits

Author SHA1 Message Date
Johannes Schindelin
c9dde90950 Merge branch 'size-t/grep' 2026-06-26 08:57:54 +00:00
Johannes Schindelin
404194393f Merge branch 'size-t/blame' 2026-06-26 08:57:54 +00:00
Johannes Schindelin
67eb2fe4a0 Merge branch 'size-t/commit' 2026-06-26 08:57:54 +00:00
Johannes Schindelin
9a9bc5c853 Merge branch 'size-t/tree' 2026-06-26 08:57:54 +00:00
Johannes Schindelin
366959a9eb Merge branch 'size-t/diff-delta-sizeof' 2026-06-26 08:57:54 +00:00
Johannes Schindelin
74d7be2c3f Merge branch 'size-t/diff' 2026-06-26 08:57:54 +00:00
Johannes Schindelin
2f9cac50fc Merge branch 'size-t/pack-bitmap' 2026-06-26 08:57:54 +00:00
Johannes Schindelin
f8fde0b3f6 grep: widen struct grep_source.size and grep_buffer() to size_t
This commit continue the migration from `unsigned long` to `size_t`,
converting `grep_buffer()` and helpers. The callers are already prepared
for this change.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
2037a3ce5e blame: widen find_line_starts() len parameter to size_t
Prep for the upcoming blame_scoreboard.final_buf_size widening:
prepare_lines() will pass a size_t through to find_line_starts(),
and the other caller (fill_origin_blob() via o->file.size) already
goes through long->size_t promotion. The function is file-static
and only uses len as a loop bound.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
ea9548a2de commit: widen the commit-buffer API to size_t
Continue the migration from `unsigned long` to `size_t`. The `size`
attribute of `struct commit_buffer` is fed either from
`odb_read_object()`'s return value (`size_t`, handled with
`cast_size_t_to_ulong()`) or from `strbuf.len` in
`fake_working_tree_commit()` (silently narrowed today). Widen the field
and a couple of function signatures together, drop the shim in
`repo_get_commit_buffer()`, and move the matching `unsigned long` locals
at the in-tree callers in commit.c (three sites), builtin/replace.c, and
builtin/stash.c (two sites). The remaining callers pass NULL or already
pass a size_t-compatible variable.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
399f013181 tree: widen struct tree.size and parse_tree_buffer() to size_t
Final piece of the tree topic. struct tree.size already receives
its values from size_t-shaped sources (odb_read_object() in
repo_parse_tree_gently() and in reflog.c::tree_is_complete()),
so on Windows it was already silently truncating anything past
4 GiB. Switch the field and parse_tree_buffer()'s size parameter
to size_t.

All readers feed tree->size into init_tree_desc(), which was
widened earlier in this topic; the existing parse_object_buffer()
caller in object.c keeps its unsigned long parameter, which
promotes cleanly.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
02d169f22d diff-delta: widen sizeof_delta_index() return to size_t
Last piece of the delta API to still expose unsigned long. The
function literally returns struct delta_index.memsize, which became
size_t in the first commit of this series. The sole caller
(free_unpacked() in builtin/pack-objects.c) already accepts size_t
via its freed_mem local, so the widening only removes the implicit
size_t -> unsigned long narrowing inside the function body.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
40a799cf1e diffcore: widen struct diff_filespec.size to size_t
Continue the size_t evacuation. The struct field already receives
its writes from a size_t-shaped source (xsize_t(st.st_size),
strbuf.len, fill_textconv()'s return, odb_read_object_info_extended()
via oi.sizep), so on Windows it was already truncating anything
past 4 GiB silently on the strbuf and textconv paths and loudly
through cast_size_t_to_ulong() on the odb path. Switch the field
to size_t.

In diff_populate_filespec(), point oi.sizep at the field directly
and drop both cast_size_t_to_ulong() shims and the size_st bridge
they fed.

Downstream consumers that still read .size into unsigned long
locals will now silently narrow on Windows where the field exceeds
4 GiB. Each of those is its own follow-up; the writer side is the
prerequisite for ever putting a >4 GiB value in the field in the
first place.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
175329fc0f pack-bitmap: stop truncating blob sizes used by --filter=blob:limit
Same theme as the preceding pack-objects series: get_size_by_pos()
returns an unsigned long but reads its size out of packed_object_info()
/ odb_read_object_info_extended() via a size_t out-parameter, so on
Windows it would silently truncate the very sizes filter_bitmap_blob_limit()
then compares against the --filter=blob:limit threshold to decide which
blobs to elide from the bitmap-backed traversal. Drop the
cast_size_t_to_ulong() and return size_t directly.

The two callers' limit comparison promotes to size_t cleanly. limit
itself stays unsigned long; it is part of a filter API ripple of its
own.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
1a53f6dbea git-zlib: widen git_deflate_bound() to size_t
All four `unsigned long` / `int` / `ssize_t` receivers across
archive-zip, diff, http-push and t/helper/test-pack-deltas were
widened to size_t in the prior commits, and remote-curl and
fast-import were already there. With every caller prepared, both the
parameter and the return type can now move without introducing any
silent narrowing.

For inputs above zlib's uLong range (i.e. >4 GiB on platforms where
uLong is 32-bit, notably 64-bit Windows), defer to zlib's
stored-block formula (the same fallback it would itself use for an
unknown stream state) plus the worst-case wrapper overhead. The
existing path through deflateBound() is unchanged for inputs that
fit.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
eca45d90fb diff: widen textconv_object() size out-param to size_t
Continue the size_t evacuation. textconv_object() fills its
out-parameter from fill_textconv()'s size_t return through an
unsigned long*; widen the API to match, then take advantage of the
new shape where callers can.

cat-file's 'c' and batch-mode 'c' branches lose their size_ul
bridge variables (one site becomes a direct call, the other
collapses an if/else into a single negated condition that reads as
"try textconv, fall back to a raw read").

blame.c likewise drops the file_size_st bridge in fill_origin_blob()
and hoists final_buf_size_st to bracket both branches in
setup_scoreboard(). The latter keeps a cast_size_t_to_ulong() shim
because struct blame_scoreboard.final_buf_size is still unsigned
long; that field is its own topic.

log.c just widens its local from unsigned long to size_t.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
8eb5340be9 tree-walk: widen init_tree_desc() and init_tree_desc_gently() to size_t
Prep for dropping the cast_size_t_to_ulong() shim in
add_preferred_base() (pack-objects.c), and aligns the public API
with the size_t shape the rest of the tree topic is moving toward.

struct tree_desc.size stays unsigned int -- the on-disk tree format
hard-caps each tree at 4 GiB, so the field is intentionally narrow
and the assignment in init_tree_desc_internal() already truncated
unsigned long inputs the same way it now truncates size_t inputs.
The widening is purely about the call-side type-correctness; the
internal cap is unchanged.

All 30+ callers pass values that promote cleanly (unsigned long,
size_t, or smaller integer types).

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
9534688108 combine-diff: stop truncating combined-diff blob sizes on Windows
Continue the size_t evacuation. With buffer_is_binary() widened
in the prior commit, every consumer that the size flows into in
combine-diff.c is size_t-ready, so widen grab_blob()'s out-param
outright and move the matching locals at its three call sites
together. grab_blob()'s body collapses to a direct
odb_read_object(&size) since the bridge variable is no longer
needed.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
1cf9bfc086 tree-walk: drop link_len cast in get_tree_entry_follow_symlinks()
Smallest piece of the tree topic. link_len is only used as
strbuf_splice()'s size_t length and as an array index; widening it
outright removes the cast_size_t_to_ulong() shim and the bridge
local that fed it. odb_read_object() now writes straight into
link_len.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
c14c9e842b xdiff-interface: widen buffer_is_binary() size parameter to size_t
Prep for the widenings of its callers, where size-receiving locals
will become size_t (combine-diff's result_size in the immediately
following commit, struct diff_filespec.size in a later topic). Body
caps the parameter at 8000 anyway, so the type change is mechanical.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
d94956c2a6 diff: widen deflate_it()'s bound local from int to size_t
Fixes a pre-existing silent narrowing from git_deflate_bound()'s
unsigned long return into an int local: anything past 2 GiB has
always wrapped negative here and then been re-extended to size_t
inside xmalloc(). Also prep for the upcoming git_deflate_bound()
widening to size_t, which would extend the narrowing further if
bound stayed int.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
fc044dac30 read-cache: stop truncating index blob sizes on Windows
Continue the size_t evacuation. read_blob_data_from_index() reads
the blob through the size_t odb_read_object() API but writes the
size back through an unsigned long out-parameter, silently
truncating anything past 4 GiB on Windows. Widen the out-parameter,
drop the cast_size_t_to_ulong() shim, and move the matching locals
in the two convert.c callers and the one in attr.c. Their
downstream consumers (gather_convert_stats() widened in the prior
commit and read_attr_from_buf() already size_t) take the new type
directly.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
b74a3023d6 archive-zip: widen zlib_deflate_raw()'s maxsize local to size_t
Prep for the upcoming git_deflate_bound() widening to size_t: the
local that catches its return needs to be size_t too, otherwise the
widening would introduce a silent Windows narrowing here. No
semantic effect with the current unsigned-long-returning
git_deflate_bound() (size_t == unsigned long on this caller's
platforms today).

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
ec7073bbf1 convert: widen gather_convert_stats() helpers to size_t
Prep for the upcoming read_blob_data_from_index() widening, whose
callers in convert.c feed the size they receive straight into these
two helpers. Both are file-static, so the change is contained.

Also fixes a small pre-existing narrowing on the get_wt_convert_stats_ascii()
path, where strbuf.len (size_t) was passed to a unsigned long
parameter.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
1d0e6f9746 packfile, git-zlib: widen use_pack() and zstream avail fields to size_t
Bundling the two widenings: four call sites pass &stream.avail_in
directly to use_pack(), and widening either type fencepost alone
would force a bridge variable at each. Doing both together is the
simpler end state and is the prerequisite for the do_compress()
widening in the next commit, which is what lets
write_no_reuse_object() lose its last cast_size_t_to_ulong() shim.

The unsigned-long locals widened at the other use_pack() callers
(avail / remaining / left) hold pack-window sizes bounded by
core.packedGitWindowSize, so the change is type consistency rather
than a new >4GB capability. git_zstream.avail_in / avail_out
likewise reach zlib's uInt fields only after zlib_buf_cap()'s 1 GiB
cap, so the wrapper already accepted size_t-shaped inputs in
practice.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
c235fc10a1 diff: stop truncating the deflated-binary-diff size on Windows
Continue the size_t evacuation around large object handling: with
deflate_it() and the locals around it widened, the
cast_size_t_to_ulong() shim the prior delta_delta() widening had to
leave behind in emit_binary_diff_body() goes away. deflate_it() is
file-static; the only callers are the two in emit_binary_diff_body()
already touched here.

emit_diff_symbol() formats the resulting sizes via uintmax_t / %"PRIuMAX",
so the diff output is not affected; only the per-process upper bound
on a binary patch chunk that this function can address grows beyond
4 GiB on Windows.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
2f8a320527 delta: widen create_delta() and diff_delta() to size_t
Last stop in the delta-encoding API widening for >4 GiB blobs on
Windows: with create_delta_index() done in the prior commit and
create_delta()/diff_delta() finished here, every byte count that
crosses delta.h is now size_t. The struct fields they store into
have been size_t since the diff-delta struct widening.

The API change must move with all callers in the same commit (the
build only passes when every &delta_size matches the new size_t*).
Caller updates are kept minimal:

  * builtin/pack-objects.c get_delta() and try_delta(): widen only
    the local delta_size variable; the surrounding unsigned-long
    locals and their cast_size_t_to_ulong() shims are out of scope
    here and will be cleaned up in their own commits.

  * builtin/fast-import.c, diff.c, t/helper/test-pack-deltas.c:
    keep the local unsigned-long delta size (each feeds a still-
    unsigned-long downstream consumer: zlib's avail_in,
    deflate_it(), the test helper's own do_compress()), and bridge
    via a temporary size_t plus cast_size_t_to_ulong(). The new
    casts are paid back in later topics that widen those consumers.

  * t/helper/test-delta.c: widen the local outright (no downstream
    consumer beyond the test's own out_size, which is already
    size_t).

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
37ecde388c delta: widen create_delta_index() parameter to size_t
The sole caller (try_delta() in builtin/pack-objects.c) passes an
unsigned long, which promotes safely, so no caller fixups are
needed. Splitting it out keeps the diff_delta() / create_delta()
widening, which does ripple to several callers, in its own commit.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
3e560af360 diff-delta: widen struct delta_index size fields to size_t
Preparation for widening the delta-encoding API to size_t in
subsequent commits, which is what lets pack-objects drop the
cast_size_t_to_ulong() shims that 606c192380 (odb, packfile: use
size_t for streaming object sizes, 2026-05-08) had to leave behind
in get_delta() and try_delta() because their downstream consumers
were still narrow.

The struct is private to diff-delta.c, so widening its fields in
isolation is a no-op at runtime: the values stored continue to fit
in 32 bits on Windows because the public API around it still
truncates. Splitting it out keeps the API-change commit focused on
caller updates.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
561a08f732 Merge branch 'fixes-from-the-git-mailing-list'
These fixes have been sent to the Git mailing list but have not been
picked up by the Git project yet.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:51 +00:00
Johannes Schindelin
8989573bd6 Merge branch 'v2.53.0.windows.3'
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:51 +00:00
Jeff King
6800d2b269 grep: prevent ^$ false match at end of file
In some implementations, `regexec_buf()` assumes that it is fed lines;
Without `REG_NOTEOL` it thinks the end of the buffer is the end of a
line. Which makes sense, but trips up this case because we are not
feeding lines, but rather a whole buffer. So the final newline is not
the start of an empty line, but the true end of the buffer.

This causes an interesting bug:

  $ echo content >file.txt
  $ git grep --no-index -n '^$' file.txt
  file.txt:2:

This bug is fixed by making the end of the buffer consistently the end
of the final line.

The patch was applied from
https://lore.kernel.org/git/20250113062601.GD767856@coredump.intra.peff.net/

Reported-by: Olly Betts <olly@survex.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:51 +00:00
Johannes Schindelin
5356e44cf1 unix-socket: avoid leak when initialization fails
When a Unix socket is initialized, the current directory's path is
stored so that the cleanup code can `chdir()` back to where it was
before exit.

If the path that needs to be stored exceeds the default size of the
`sun_path` attribute of `struct sockaddr_un` (which is defined as a
108-sized byte array on Linux), a larger buffer needs to be allocated so
that it can hold the path, and it is the responsibility of the
`unix_sockaddr_cleanup()` function to release that allocated memory.

In Git's CI, this stack allocation is not necessary because the code is
checked out to `/home/runner/work/git/git`. Concatenate the path
`t/trash directory.t0301-credential-cache/.cache/git/credential/socket`
and a terminating NUL, and you end up with 96 bytes, 12 shy of the
default `sun_path` size.

However, I use worktrees with slightly longer paths:
`/home/me/projects/git/yes/i/nest/worktrees/to/organize/them/` is more
in line with what I have. When I recently tried to locally reproduce a
failure of the `linux-leaks` CI job, this t0301 test failed (where it
had not failed in CI).

The reason: When `credential-cache` tries to reach its daemon initially
by calling `unix_sockaddr_init()`, it is expected that the daemon cannot
be reached (the idea is to spin up the daemon in that case and try
again). However, when this first call to `unix_sockaddr_init()` fails,
the code returns early from the `unix_stream_connect()` function
_without_ giving the cleanup code a chance to run, skipping the
deallocation of above-mentioned path.

The fix is easy: do not return early but instead go directly to the
cleanup code.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:51 +00:00
Johannes Schindelin
41e9486dff mingw: skip symlink type auto-detection for network share targets
On Windows, symbolic links come in two flavors: file symlinks and
directory symlinks.  Since Git was born on Linux where this distinction
does not exist, Git for Windows has to auto-detect the type by looking
at the target.  When the target does not yet exist at symlink creation
time, Git for Windows creates a "phantom" file symlink and later, once
checkout is complete, calls `CreateFileW()` on the target to check
whether it is actually a directory.

If the symlink target is a UNC path (e.g. `\\attacker\share`), this
auto-detection triggers an SMB connection to the remote host.  Windows
performs NTLM authentication by default for such connections, which
means a crafted repository can exfiltrate the cloning user's NTLMv2
hash to an attacker-controlled server without any user interaction
beyond `git clone -c core.symlinks=true <url>`.

There are ways to specify UNC paths that start with only a single
backslash (e.g. `\??\UNC\host\share`); All of them do start like
that, though, so let's use that as a tell-tale that we should skip
the auto-detection in `process_phantom_symlink()`. The symlink is
then left as a file symlink (the `mklink` default), and a warning is
emitted suggesting the user set the `symlink` gitattribute to `dir`
if a directory symlink is needed.  When the attribute is already set,
auto-detection is never invoked in the first place, so that code path
is unaffected.

This is the same class of vulnerability as CVE-2025-66413
(https://github.com/git-for-windows/git/security/advisories/GHSA-hv9c-4jm9-jh3x)
and follows the same general mitigation pattern that MinTTY adopted for
ANSI escape sequences referencing network share paths
(https://github.com/mintty/mintty/security/advisories/GHSA-jf4m-m6rv-p6c5).

Note that there are legitimate paths starting with a single backslash
that are _not_ network paths: drive-less absolute paths are interpreted
as relative to the current working directory's drive. In practice, these
are highly uncommon (and brittle, just one working directory change
away from breaking). In any case, the only consequence is now that the
symlink type of those has to be specified via Git attributes, is all.

Reported-by: Justin Lee <jessdhoctor@gmail.com>
Addresses: CVE-2026-32631
Addresses: https://github.com/git-for-windows/git/security/advisories/GHSA-9j5h-h4m7-85hx
Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:50 +00:00
Junio C Hamano
faff5b2eb1 Merge branch 'ps/libgit-in-subdir' into seen
The source files for libgit.a have been moved into a new "lib/"
directory to clean up the top-level directory and clearly separate
library code.

* ps/libgit-in-subdir:
  Move libgit.a sources into separate "lib/" directory
  t/helper: prepare "test-example-tap.c" for introduction of "lib/"
2026-06-25 19:51:57 -07:00
Patrick Steinhardt
9759608622 Move libgit.a sources into separate "lib/" directory
The Git project is not exactly the easiest project to get started in:
it's written in C and POSIX shell, with bits of Perl, Rust and other
languages sprinkled into it. On top of that, the project has grown
somewhat organically over time, making the codebase hard to navigate.

These are problems that we're aware of, and there have been and still
are efforts to clean up some of the technical debt that is natural to
exist an a project that is more than 20 years old. Furthermore, we
provide resources to newcomers that help them out like our coding
guidelines, code of conduct or "MyFirstContribution.adoc".

But there is a rather practical problem: finding your way around in our
project's tree is not easy. Doing a directory listing in the top-level
directory will present you with more than 550 files, which makes it
extremely hard for a newcomer to figure out what files they are even
supposed to look at. This makes the onboarding experience somewhat
harder than it really needs to be. This isn't only a problem for
newcomers though, as I myself struggle to find the files I am looking
for because of the sheer number of files.

Besides the problem of discoverability it also creates a problem of
structure. It is not obvious at all which files are part of "libgit.a"
and which files are only linked into our final executables. So while we
have this split in our build systems, that split is not evident at all
in our tree.

Introduce a new "lib/" directory and move all of our sources for
"libgit.a" into it to fix these issues. It makes the split we have
evident and reduces the number of files in our top-level tree from 550
files to ~80 files.

This is still a lot of files, but it's significantly easier to navigate
already. Furthermore, we can further iterate after this step and think
about introducing a better structure for remaining files, as well.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-22 10:58:23 -07:00