This commit continues the migration from `unsigned long` to `size_t`,
converting `grep_buffer()` and helpers. The callers are already prepared
for this change.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Prep for the upcoming blame_scoreboard.final_buf_size widening:
prepare_lines() will pass a size_t through to find_line_starts(),
and the other caller (fill_origin_blob() via o->file.size) already
goes through long->size_t promotion. The function is file-static
and only uses len as a loop bound.
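The following minimal sketch shows why widening `len` is safe here. It is written in the spirit of `find_line_starts()`, but the name `find_line_starts_sketch()`, its `size_t` return array and its sentinel entry are our assumptions, not blame.c's actual code. The only point is that `len` bounds the loops, so its type can widen freely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: record the offset at which every line starts,
 * plus a sentinel at `len`. `len` is only ever used as a loop bound,
 * so widening it from unsigned long to size_t is a signature change.
 */
static size_t *find_line_starts_sketch(size_t *num_lines,
				       const char *buf, size_t len)
{
	size_t count = 0, n = 0, i;
	size_t *starts;

	for (i = 0; i < len; i++)
		if (buf[i] == '\n')
			count++;
	/* A final line without a trailing newline still counts. */
	if (len && buf[len - 1] != '\n')
		count++;

	starts = malloc((count + 1) * sizeof(*starts));
	if (!starts)
		return NULL;
	for (i = 0; i < len; i++)
		if (i == 0 || buf[i - 1] == '\n')
			starts[n++] = i;
	starts[n] = len; /* sentinel: one past the last line */
	*num_lines = count;
	return starts;
}
```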
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue the migration from `unsigned long` to `size_t`. The `size`
attribute of `struct commit_buffer` is fed either from
`odb_read_object()`'s return value (`size_t`, handled with
`cast_size_t_to_ulong()`) or from `strbuf.len` in
`fake_working_tree_commit()` (silently narrowed today). Widen the field
and a couple of function signatures together, drop the shim in
`repo_get_commit_buffer()`, and widen the matching `unsigned long` locals
at the in-tree callers in commit.c (three sites), builtin/replace.c, and
builtin/stash.c (two sites). The remaining callers pass NULL or already
pass a size_t-compatible variable.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Final piece of the tree topic. struct tree.size already receives
its values from size_t-shaped sources (odb_read_object() in
repo_parse_tree_gently() and in reflog.c::tree_is_complete()),
so on Windows it was already silently truncating anything past
4 GiB. Switch the field and parse_tree_buffer()'s size parameter
to size_t.
All readers feed tree->size into init_tree_desc(), which was
widened earlier in this topic; the existing parse_object_buffer()
caller in object.c keeps its unsigned long parameter, which
promotes cleanly.
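Here is a minimal sketch of the silent truncation described above. It uses `uint32_t` as a stand-in for Windows' 32-bit `unsigned long` and `uint64_t` for its 64-bit `size_t`. `store_in_narrow_field()` is a hypothetical name, not a Git function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Converting an out-of-range value to an unsigned type is well defined
 * in C: the value is reduced modulo 2^32, i.e. the high bits are
 * silently dropped, with no diagnostic at run time.
 */
static uint32_t store_in_narrow_field(uint64_t size_from_odb)
{
	return (uint32_t)size_from_odb;
}
```

Storing a 5 GiB size this way yields 1 GiB.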
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Last piece of the delta API to still expose unsigned long. The
function literally returns struct delta_index.memsize, which became
size_t in the first commit of this series. The sole caller
(free_unpacked() in builtin/pack-objects.c) already accepts size_t
via its freed_mem local, so the widening only removes the implicit
size_t -> unsigned long narrowing inside the function body.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue the size_t evacuation. The struct field already receives
its writes from a size_t-shaped source (xsize_t(st.st_size),
strbuf.len, fill_textconv()'s return, odb_read_object_info_extended()
via oi.sizep), so on Windows it was already truncating anything
past 4 GiB silently on the strbuf and textconv paths and loudly
through cast_size_t_to_ulong() on the odb path. Switch the field
to size_t.
In diff_populate_filespec(), point oi.sizep at the field directly
and drop both cast_size_t_to_ulong() shims and the size_st bridge
they fed.
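The sketch below shows the "point `sizep` straight at the field" shape. It uses hypothetical stand-in types (`struct object_info_sketch`, `struct filespec_sketch`) and a hypothetical reader in place of Git's real `struct object_info` and `odb_read_object_info_extended()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not Git's real struct definitions. */
struct object_info_sketch {
	size_t *sizep;		/* out-parameter filled in by the reader */
};

struct filespec_sketch {
	size_t size;		/* widened from unsigned long */
};

/* Hypothetical reader reporting an object's size through oi->sizep. */
static int read_object_info_sketch(struct object_info_sketch *oi,
				   size_t actual_size)
{
	if (oi->sizep)
		*oi->sizep = actual_size;
	return 0;
}

static int populate_sketch(struct filespec_sketch *s, size_t actual_size)
{
	struct object_info_sketch oi = { NULL };

	/* No bridge local, no narrowing cast: write into the field itself. */
	oi.sizep = &s->size;
	return read_object_info_sketch(&oi, actual_size);
}
```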
Downstream consumers that still read .size into unsigned long
locals will now silently narrow on Windows where the field exceeds
4 GiB. Each of those is its own follow-up; the writer side is the
prerequisite for ever putting a >4 GiB value in the field in the
first place.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Same theme as the preceding pack-objects series: get_size_by_pos()
returns an unsigned long but reads its size out of packed_object_info()
/ odb_read_object_info_extended() via a size_t out-parameter, so on
Windows it would silently truncate the very sizes filter_bitmap_blob_limit()
then compares against the --filter=blob:limit threshold to decide which
blobs to elide from the bitmap-backed traversal. Drop the
cast_size_t_to_ulong() and return size_t directly.
The two callers' limit comparison promotes to size_t cleanly. limit
itself stays unsigned long; it is part of a filter API ripple of its
own.
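The following sketch shows why the comparison needs no fixup. It assumes a hypothetical `should_elide_blob()` helper and the documented `blob:limit` semantics, under which blobs of at least `limit` bytes are omitted.

```c
#include <assert.h>
#include <stddef.h>

/*
 * The size arrives as size_t; the limit stays unsigned long. In the
 * comparison, the usual arithmetic conversions turn the unsigned long
 * into size_t on LLP64 (64-bit Windows) and change nothing on LP64,
 * so no value is lost on either platform.
 */
static int should_elide_blob(size_t blob_size, unsigned long limit)
{
	return blob_size >= limit;
}
```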
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
All four `unsigned long` / `int` / `ssize_t` receivers across
archive-zip, diff, http-push and t/helper/test-pack-deltas were
widened to size_t in the prior commits, and remote-curl and
fast-import were already there. With every caller prepared, both the
parameter and the return type can now move without introducing any
silent narrowing.
For inputs above zlib's uLong range (i.e. >4 GiB on platforms where
uLong is 32-bit, notably 64-bit Windows), defer to zlib's
stored-block formula (the same fallback it would itself use for an
unknown stream state) plus the worst-case wrapper overhead. The
existing path through deflateBound() is unchanged for inputs that
fit.
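A sketch of the fallback arithmetic follows. We assume the stored-block bound as it appears in the `deflateBound()` of zlib 1.2.12 and later, and an illustrative 18-byte wrapper overhead (a gzip header and trailer without optional fields). Neither the function names nor the wrapper constant are taken from Git, and `uint32_t` stands in for a 32-bit `uLong`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for zlib's uLong limit where uLong is 32-bit. */
#define SKETCH_ULONG_MAX ((uint64_t)UINT32_MAX)

/* Assumed worst-case wrapper overhead in bytes; illustrative only. */
#define SKETCH_WRAPPER_OVERHEAD 18

/* Nonzero if len cannot be handed to deflateBound() as a uLong. */
static int exceeds_zlib_ulong(uint64_t len)
{
	return len > SKETCH_ULONG_MAX;
}

/*
 * Stored-block bound (our reading of recent zlib): roughly 4% overhead
 * plus a small constant, then the wrapper. Assumes len is far below
 * UINT64_MAX so the sum cannot overflow.
 */
static uint64_t fallback_deflate_bound(uint64_t len)
{
	uint64_t stored = len + (len >> 5) + (len >> 7) + (len >> 11) + 7;

	return stored + SKETCH_WRAPPER_OVERHEAD;
}
```

A caller would use `deflateBound()` when `exceeds_zlib_ulong()` is false and `fallback_deflate_bound()` otherwise.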
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue the size_t evacuation. textconv_object() fills its
out-parameter from fill_textconv()'s size_t return through an
unsigned long*; widen the API to match, then take advantage of the
new shape where callers can.
cat-file's 'c' and batch-mode 'c' branches lose their size_ul
bridge variables (one site becomes a direct call, the other
collapses an if/else into a single negated condition that reads as
"try textconv, fall back to a raw read").
blame.c likewise drops the file_size_st bridge in fill_origin_blob()
and hoists final_buf_size_st to bracket both branches in
setup_scoreboard(). The latter keeps a cast_size_t_to_ulong() shim
because struct blame_scoreboard.final_buf_size is still unsigned
long; that field is its own topic.
log.c just widens its local from unsigned long to size_t.
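The collapsed cat-file condition can be sketched as follows. The helpers are hypothetical stand-ins for `textconv_object()` and a raw object read. We assume the textconv helper returns 1 when it produced output and 0 when no driver applied, and that both report the size through a `size_t` out-parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: "converts" only when a driver is configured for name. */
static int try_textconv_sketch(const char *name, const char **buf, size_t *size)
{
	if (strcmp(name, "has-driver"))
		return 0;	/* no textconv driver applies */
	*buf = "converted";
	*size = strlen(*buf);
	return 1;
}

/* Hypothetical raw read; returns 0 on success, negative on error. */
static int read_raw_sketch(const char **buf, size_t *size)
{
	*buf = "raw";
	*size = strlen(*buf);
	return 0;
}

static int show_blob_sketch(const char *name, const char **buf, size_t *size)
{
	/* Try textconv, fall back to a raw read: one negated condition. */
	if (!try_textconv_sketch(name, buf, size) && read_raw_sketch(buf, size) < 0)
		return -1;
	return 0;
}
```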
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Prep for dropping the cast_size_t_to_ulong() shim in
add_preferred_base() (pack-objects.c). This also aligns the public API
with the size_t shape the rest of the tree topic is moving toward.
struct tree_desc.size stays unsigned int -- the on-disk tree format
hard-caps each tree at 4 GiB, so the field is intentionally narrow
and the assignment in init_tree_desc_internal() already truncated
unsigned long inputs the same way it now truncates size_t inputs.
The widening is purely about the call-side type-correctness; the
internal cap is unchanged.
All 30+ callers pass values that promote cleanly (unsigned long,
size_t, or smaller integer types).
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue the size_t evacuation. With buffer_is_binary() widened
in the prior commit, every consumer that the size flows into in
combine-diff.c is size_t-ready, so widen grab_blob()'s out-param
outright and widen the matching locals at its three call sites
together. grab_blob()'s body collapses to a direct
odb_read_object(&size) since the bridge variable is no longer
needed.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Smallest piece of the tree topic. link_len is only used as
strbuf_splice()'s size_t length and as an array index; widening it
outright removes the cast_size_t_to_ulong() shim and the bridge
local that fed it. odb_read_object() now writes straight into
link_len.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Prep for the widenings of its callers, where size-receiving locals
will become size_t (combine-diff's result_size in the immediately
following commit, struct diff_filespec.size in a later topic). The body
caps the parameter at 8000 anyway, so the type change is mechanical.
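The sketch below is modeled on the heuristic in question, which scans only the first 8000 bytes for a NUL. `buffer_is_binary_sketch()` and `FIRST_FEW_BYTES_SKETCH` are hypothetical names. Because the parameter is clamped before use, its declared width never influences the result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FIRST_FEW_BYTES_SKETCH 8000

/* Binary if a NUL byte occurs within the first 8000 bytes. */
static int buffer_is_binary_sketch(const char *ptr, size_t size)
{
	if (size > FIRST_FEW_BYTES_SKETCH)
		size = FIRST_FEW_BYTES_SKETCH;
	return memchr(ptr, 0, size) != NULL;
}
```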
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Fixes a pre-existing silent narrowing from git_deflate_bound()'s
unsigned long return into an int local: anything past 2 GiB has
always wrapped negative here and then been re-extended to size_t
inside xmalloc(). Also prep for the upcoming git_deflate_bound()
widening to size_t, which would extend the narrowing further if
bound stayed int.
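The following hypothetical illustration shows the narrowing pattern being fixed, with `int32_t` standing in for the `int` local. Converting an out-of-range value to a signed type is implementation-defined in C. Common compilers keep the low 32 bits, which turns a 3 GiB bound negative. Handing that value to a `size_t` parameter then sign-extends it into an enormous request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The buggy shape: the bound is caught in a 32-bit signed local. */
static size_t bound_via_int(uint64_t bound)
{
	int32_t narrowed = (int32_t)bound;	/* implementation-defined if > INT32_MAX */
	return (size_t)narrowed;		/* what an allocator taking size_t sees */
}

/* The fixed shape: the local is size_t. */
static size_t bound_via_size_t(uint64_t bound)
{
	return (size_t)bound;
}
```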
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue the size_t evacuation. read_blob_data_from_index() reads
the blob through the size_t odb_read_object() API but writes the
size back through an unsigned long out-parameter, silently
truncating anything past 4 GiB on Windows. Widen the out-parameter,
drop the cast_size_t_to_ulong() shim, and widen the matching locals
in the two convert.c callers and the one in attr.c. Their
downstream consumers (gather_convert_stats() widened in the prior
commit and read_attr_from_buf() already size_t) take the new type
directly.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Prep for the upcoming git_deflate_bound() widening to size_t: the
local that catches its return needs to be size_t too, otherwise the
widening would introduce a silent Windows narrowing here. No
semantic effect with the current unsigned-long-returning
git_deflate_bound() (size_t == unsigned long on this caller's
platforms today).
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Prep for the upcoming read_blob_data_from_index() widening, whose
callers in convert.c feed the size they receive straight into these
two helpers. Both are file-static, so the change is contained.
Also fixes a small pre-existing narrowing on the get_wt_convert_stats_ascii()
path, where strbuf.len (size_t) was passed to an unsigned long
parameter.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Bundling the two widenings: four call sites pass &stream.avail_in
directly to use_pack(), and widening only one of the two types
would force a bridge variable at each. Doing both together is the
simpler end state and is the prerequisite for the do_compress()
widening in the next commit, which is what lets
write_no_reuse_object() lose its last cast_size_t_to_ulong() shim.
The unsigned-long locals widened at the other use_pack() callers
(avail / remaining / left) hold pack-window sizes bounded by
core.packedGitWindowSize, so the change is type consistency rather
than a new >4 GiB capability. git_zstream.avail_in / avail_out
likewise reach zlib's uInt fields only after zlib_buf_cap()'s 1 GiB
cap, so the wrapper already accepted size_t-shaped inputs in
practice.
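The following sketch shows the capping step. It takes the 1 GiB cap from the text above. The names `zlib_buf_cap_sketch()` and `ZLIB_BUF_MAX_SKETCH` are hypothetical, and `unsigned int` stands in for zlib's 32-bit `uInt`. Because the result never exceeds 1 GiB, copying it into a `uInt` field cannot lose bits.

```c
#include <assert.h>
#include <stddef.h>

#define ZLIB_BUF_MAX_SKETCH ((size_t)1024 * 1024 * 1024)	/* 1 GiB */

/* Clamp a size_t-shaped wrapper field before it reaches a uInt field. */
static unsigned int zlib_buf_cap_sketch(size_t len)
{
	return (unsigned int)(len > ZLIB_BUF_MAX_SKETCH ? ZLIB_BUF_MAX_SKETCH : len);
}
```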
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue the size_t evacuation around large object handling: with
deflate_it() and the locals around it widened, the
cast_size_t_to_ulong() shim the prior delta_delta() widening had to
leave behind in emit_binary_diff_body() goes away. deflate_it() is
file-static; the only callers are the two in emit_binary_diff_body()
already touched here.
emit_diff_symbol() formats the resulting sizes via uintmax_t / %"PRIuMAX",
so the diff output is not affected; only the per-process upper bound
on a binary patch chunk that this function can address grows beyond
4 GiB on Windows.
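The sketch below shows why the output is unaffected. Casting a byte count to `uintmax_t` and printing it with `PRIuMAX` produces the same text whether the variable was `unsigned long` or `size_t`, on LLP64 as well as LP64. `format_size_sketch()` is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Portable size formatting, independent of the variable's own type. */
static int format_size_sketch(char *out, size_t outlen, size_t size)
{
	return snprintf(out, outlen, "%" PRIuMAX, (uintmax_t)size);
}
```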
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Last stop in the delta-encoding API widening for >4 GiB blobs on
Windows: with create_delta_index() done in the prior commit and
create_delta()/diff_delta() finished here, every byte count that
crosses delta.h is now size_t. The struct fields they store into
have been size_t since the diff-delta struct widening.
The API change must move with all callers in the same commit (the
build only passes when every &delta_size matches the new size_t*).
Caller updates are kept minimal:
* builtin/pack-objects.c get_delta() and try_delta(): widen only
the local delta_size variable; the surrounding unsigned-long
locals and their cast_size_t_to_ulong() shims are out of scope
here and will be cleaned up in their own commits.
* builtin/fast-import.c, diff.c, t/helper/test-pack-deltas.c:
keep the local unsigned-long delta size (each feeds a still-
unsigned-long downstream consumer: zlib's avail_in,
deflate_it(), the test helper's own do_compress()), and bridge
via a temporary size_t plus cast_size_t_to_ulong(). The new
casts are paid back in later topics that widen those consumers.
* t/helper/test-delta.c: widen the local outright (no downstream
consumer beyond the test's own out_size, which is already
size_t).
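The "temporary size_t plus cast_size_t_to_ulong()" bridge used by the middle group of callers can be sketched as follows. `uint64_t` and `uint32_t` stand in for `size_t` and Windows' `unsigned long`, and every name is hypothetical. As far as we know, Git's `cast_size_t_to_ulong()` dies on overflow. This model returns an error instead so that it can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the shim: refuse values that would be cut off. */
static int cast_to_narrow_sketch(uint64_t wide, uint32_t *narrow)
{
	if (wide != (uint32_t)wide)
		return -1;
	*narrow = (uint32_t)wide;
	return 0;
}

/* Hypothetical widened API: reports the delta size through a wide pointer. */
static void *diff_delta_sketch(uint64_t *delta_size)
{
	static char delta[16];

	*delta_size = sizeof(delta);
	return delta;
}

/* A caller that still feeds a narrow downstream consumer. */
static int caller_with_bridge(uint32_t *delta_size_ul)
{
	uint64_t delta_size_st;	/* temporary bridge variable */

	if (!diff_delta_sketch(&delta_size_st))
		return -1;
	return cast_to_narrow_sketch(delta_size_st, delta_size_ul);
}
```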
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The sole caller (try_delta() in builtin/pack-objects.c) passes an
unsigned long, which promotes safely, so no caller fixups are
needed. Splitting it out keeps the diff_delta() / create_delta()
widening, which does ripple to several callers, in its own commit.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Preparation for widening the delta-encoding API to size_t in
subsequent commits, which is what lets pack-objects drop the
cast_size_t_to_ulong() shims that 606c192380 (odb, packfile: use
size_t for streaming object sizes, 2026-05-08) had to leave behind
in get_delta() and try_delta() because their downstream consumers
were still narrow.
The struct is private to diff-delta.c, so widening its fields in
isolation is a no-op at runtime: the values stored continue to fit
in 32 bits on Windows because the public API around it still
truncates. Splitting it out keeps the API-change commit focused on
caller updates.
Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
These fixes have been sent to the Git mailing list but have not been
picked up by the Git project yet.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In some implementations, `regexec_buf()` assumes that it is fed lines;
without `REG_NOTEOL` it thinks the end of the buffer is the end of a
line. That makes sense, but it trips up this case because we are not
feeding lines but rather a whole buffer. So the final newline does not
start an empty line; it is the true end of the buffer.
This causes an interesting bug:
    $ echo content >file.txt
    $ git grep --no-index -n '^$' file.txt
    file.txt:2:
This bug is fixed by making the end of the buffer consistently the end
of the final line.
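The symptom can be reproduced with plain POSIX `regexec()`. This sketch does not use Git's `regexec_buf()` and does not reproduce the applied patch. With `REG_NEWLINE`, `^` matches right after a newline and `$` matches at the end of the string, so `^$` finds an empty "line" after a trailing newline. Matching only up to the end of the last line avoids it. `matches_empty_line()` is a hypothetical helper, and the expected results assume a regex implementation such as glibc's.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 on a match of "^$", 0 on no match, -1 on error. */
static int matches_empty_line(const char *buf, size_t len)
{
	char copy[256];
	regex_t re;
	int ret;

	if (len >= sizeof(copy))
		return -1;
	memcpy(copy, buf, len);	/* regexec() needs a NUL-terminated string */
	copy[len] = '\0';

	if (regcomp(&re, "^$", REG_NEWLINE | REG_NOSUB))
		return -1;
	ret = regexec(&re, copy, 0, NULL, 0);
	regfree(&re);
	return ret == 0 ? 1 : (ret == REG_NOMATCH ? 0 : -1);
}
```

Passing all 8 bytes of "content\n" matches. Stopping at the end of the final line (7 bytes) does not.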
The patch was applied from
https://lore.kernel.org/git/20250113062601.GD767856@coredump.intra.peff.net/
Reported-by: Olly Betts <olly@survex.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When a Unix socket is initialized, the current directory's path is
stored so that the cleanup code can `chdir()` back to where it was
before exit.
If the path that needs to be stored exceeds the default size of the
`sun_path` attribute of `struct sockaddr_un` (which is defined as a
108-byte array on Linux), a larger buffer needs to be allocated so
that it can hold the path, and it is the responsibility of the
`unix_sockaddr_cleanup()` function to release that allocated memory.
In Git's CI, this extra allocation is not necessary because the code is
checked out to `/home/runner/work/git/git`. Concatenate the path
`t/trash directory.t0301-credential-cache/.cache/git/credential/socket`
and a terminating NUL, and you end up with 96 bytes, 12 shy of the
default `sun_path` size.
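The 96-byte figure can be recomputed as follows: the checkout directory (25 bytes), a slash, the test's 69-byte socket path, and the terminating NUL. The helper names are hypothetical. `fits_in_sun_path()` compares against the platform's actual `sun_path` size, which is 108 bytes on Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/un.h>

static size_t ci_socket_path_bytes(void)
{
	const char *checkout = "/home/runner/work/git/git";
	const char *socket =
		"t/trash directory.t0301-credential-cache/.cache/git/credential/socket";

	return strlen(checkout) + 1 /* '/' */ + strlen(socket) + 1 /* NUL */;
}

/* Whether a path needing `bytes` bytes (including NUL) fits in sun_path. */
static int fits_in_sun_path(size_t bytes)
{
	return bytes <= sizeof(((struct sockaddr_un *)0)->sun_path);
}
```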
However, I use worktrees with slightly longer paths:
`/home/me/projects/git/yes/i/nest/worktrees/to/organize/them/` is more
in line with what I have. When I recently tried to locally reproduce a
failure of the `linux-leaks` CI job, this t0301 test failed (where it
had not failed in CI).
The reason: When `credential-cache` tries to reach its daemon initially
by calling `unix_sockaddr_init()`, it is expected that the daemon cannot
be reached (the idea is to spin up the daemon in that case and try
again). However, when this first call to `unix_sockaddr_init()` fails,
the code returns early from the `unix_stream_connect()` function
_without_ giving the cleanup code a chance to run, skipping the
deallocation of the above-mentioned path.
The fix is easy: do not return early but instead go directly to the
cleanup code.
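The shape of the fix is the usual "goto cleanup" idiom, sketched here with hypothetical stand-ins rather than the real `unix_sockaddr_init()` / `unix_stream_connect()`. An initialization step that can allocate and then still fail must fall through to the cleanup, not return directly. The counter exists only so the leak is observable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the context that may own a heap-allocated path. */
struct sockaddr_ctx_sketch {
	char *orig_dir;
};

static int live_allocations;	/* orig_dir buffers not yet released */

/* May allocate ctx->orig_dir and still fail afterwards. */
static int init_sketch(struct sockaddr_ctx_sketch *ctx, int fail_after_alloc)
{
	ctx->orig_dir = malloc(64);
	if (!ctx->orig_dir)
		return -1;
	live_allocations++;
	strcpy(ctx->orig_dir, "/original/cwd");
	return fail_after_alloc ? -1 : 0;
}

static void cleanup_sketch(struct sockaddr_ctx_sketch *ctx)
{
	if (ctx->orig_dir) {
		/* the real code would chdir() back to orig_dir here */
		free(ctx->orig_dir);
		ctx->orig_dir = NULL;
		live_allocations--;
	}
}

static int connect_sketch(int fail_init)
{
	struct sockaddr_ctx_sketch ctx = { NULL };
	int ret = -1;

	/* An early "return -1;" here would leak ctx.orig_dir. */
	if (init_sketch(&ctx, fail_init) < 0)
		goto cleanup;

	ret = 0;	/* socket(), connect(), ... would go here */
cleanup:
	cleanup_sketch(&ctx);
	return ret;
}
```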
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Windows, symbolic links come in two flavors: file symlinks and
directory symlinks. Since Git was born on Linux where this distinction
does not exist, Git for Windows has to auto-detect the type by looking
at the target. When the target does not yet exist at symlink creation
time, Git for Windows creates a "phantom" file symlink and later, once
checkout is complete, calls `CreateFileW()` on the target to check
whether it is actually a directory.
If the symlink target is a UNC path (e.g. `\\attacker\share`), this
auto-detection triggers an SMB connection to the remote host. Windows
performs NTLM authentication by default for such connections, which
means a crafted repository can exfiltrate the cloning user's NTLMv2
hash to an attacker-controlled server without any user interaction
beyond `git clone -c core.symlinks=true <url>`.
There are ways to specify UNC paths that start with only a single
backslash (e.g. `\??\UNC\host\share`); all of them do start like
that, though, so let's use that as a tell-tale that we should skip
the auto-detection in `process_phantom_symlink()`. The symlink is
then left as a file symlink (the `mklink` default), and a warning is
emitted suggesting the user set the `symlink` gitattribute to `dir`
if a directory symlink is needed. When the attribute is already set,
auto-detection is never invoked in the first place, so that code path
is unaffected.
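The tell-tale check reduces to a one-character test. This is a hypothetical predicate, not the code in `process_phantom_symlink()`, and it assumes the target is available as a wide string, as is common in Windows code. Both `\\host\share` and `\??\UNC\host\share` start with a backslash, so either one skips the `CreateFileW()`-based detection.

```c
#include <assert.h>
#include <wchar.h>

/* Nonzero if directory auto-detection should be skipped for this target. */
static int skip_symlink_type_detection(const wchar_t *target)
{
	return target[0] == L'\\';
}
```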
This is the same class of vulnerability as CVE-2025-66413
(https://github.com/git-for-windows/git/security/advisories/GHSA-hv9c-4jm9-jh3x)
and follows the same general mitigation pattern that MinTTY adopted for
ANSI escape sequences referencing network share paths
(https://github.com/mintty/mintty/security/advisories/GHSA-jf4m-m6rv-p6c5).
Note that there are legitimate paths starting with a single backslash
that are _not_ network paths: drive-less absolute paths are interpreted
as relative to the current working directory's drive. In practice, these
are highly uncommon (and brittle: just one working directory change
away from breaking). In any case, the only consequence for such paths
is that their symlink type now has to be specified via Git attributes.
Reported-by: Justin Lee <jessdhoctor@gmail.com>
Addresses: CVE-2026-32631
Addresses: https://github.com/git-for-windows/git/security/advisories/GHSA-9j5h-h4m7-85hx
Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The source files for libgit.a have been moved into a new "lib/"
directory to clean up the top-level directory and clearly separate
library code.
* ps/libgit-in-subdir:
  Move libgit.a sources into separate "lib/" directory
  t/helper: prepare "test-example-tap.c" for introduction of "lib/"
The Git project is not exactly the easiest project to get started in:
it's written in C and POSIX shell, with bits of Perl, Rust and other
languages sprinkled into it. On top of that, the project has grown
somewhat organically over time, making the codebase hard to navigate.
These are problems that we're aware of, and there have been and still
are efforts to clean up some of the technical debt that naturally
accumulates in a project that is more than 20 years old. Furthermore, we
provide resources to newcomers that help them out like our coding
guidelines, code of conduct or "MyFirstContribution.adoc".
But there is a rather practical problem: finding your way around in our
project's tree is not easy. Doing a directory listing in the top-level
directory will present you with more than 550 files, which makes it
extremely hard for a newcomer to figure out what files they are even
supposed to look at. This makes the onboarding experience somewhat
harder than it really needs to be. This isn't only a problem for
newcomers though, as I myself struggle to find the files I am looking
for because of the sheer number of files.
Besides the problem of discoverability, it also creates a problem of
structure. It is not obvious at all which files are part of "libgit.a"
and which files are only linked into our final executables. So while we
have this split in our build systems, that split is not evident at all
in our tree.
Introduce a new "lib/" directory and move all of our sources for
"libgit.a" into it to fix these issues. It makes the split we have
evident and reduces the number of files in our top-level tree from more
than 550 to ~80.
This is still a lot of files, but it's significantly easier to navigate
already. Furthermore, we can further iterate after this step and think
about introducing a better structure for remaining files, as well.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>