Commit Graph

13981 Commits

Author SHA1 Message Date
Ben Peart
fe1bb7b31c status: disable and free fscache at the end of the status command
At the end of the status command, disable and free the fscache so that we
don't leak the memory and so that we can dump the fscache statistics.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2026-06-20 02:56:46 +00:00
Takuto Ikuta
fafbad7baa checkout.c: enable fscache for checkout again
This is retry of #1419.

I added flush_fscache macro to flush cached stats after disk writing
with tests for regression reported in #1438 and #1442.

git checkout checks each file path in sorted order, so cache flushing does not
make performance worse unless we have large number of modified files in
a directory containing many files.

Using chromium repository, I tested `git checkout .` performance when I
delete 10 files in different directories.
With this patch:
TotalSeconds: 4.307272
TotalSeconds: 4.4863595
TotalSeconds: 4.2975562
Avg: 4.36372923333333

Without this patch:
TotalSeconds: 20.9705431
TotalSeconds: 22.4867685
TotalSeconds: 18.8968292
Avg: 20.7847136

I confirmed this patch passed all tests in t/ with core_fscache=1.

Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
2026-06-20 02:56:46 +00:00
Jeff Hostetler
82963f19f2 add: use preload-index and fscache for performance
Teach "add" to use preload-index and fscache features
to improve performance on very large repositories.

During an "add", a call is made to run_diff_files()
which calls check_remove() for each index-entry.  This
calls lstat().  On Windows, the fscache code intercepts
the lstat() calls and builds a private cache using the
FindFirst/FindNext routines, which are much faster.

Somewhat independent of this, is the preload-index code
which distributes some of the start-up costs across
multiple threads.

We need to keep the call to read_cache() before parsing the
pathspecs (and hence cannot use the pathspecs to limit any preload)
because parse_pathspec() is using the index to determine whether a
pathspec is, in fact, in a submodule. If we would not read the index
first, parse_pathspec() would not error out on a path that is inside
a submodule, and t7400-submodule-basic.sh would fail with

	not ok 47 - do not add files from a submodule

We still want the nice preload performance boost, though, so we simply
call read_cache_preload(&pathspecs) after parsing the pathspecs.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-20 02:56:46 +00:00
Karsten Blees
4440391618 mingw: add infrastructure for read-only file system level caches
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.

This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.

Enable read-only sections for 'git status' and preload_index.

Signed-off-by: Karsten Blees <blees@dcon.de>
2026-06-20 02:56:46 +00:00
Johannes Schindelin
2533121ae9 Merge branch 'dont-clean-junctions'
This topic branch teaches `git clean` to respect NTFS junctions and Unix
bind mounts: it will now stop at those boundaries.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-20 02:56:46 +00:00
Johannes Schindelin
8e62ef0985 Merge pull request #1897 from piscisaureus/symlink-attr
Specify symlink type in .gitattributes
2026-06-20 02:56:46 +00:00
Johannes Schindelin
bb166173e1 credential-cache: handle ECONNREFUSED gracefully (#5329)
I should probably add some tests for this.
2026-06-20 02:56:45 +00:00
Johannes Schindelin
007fde5688 clean: remove mount points when possible
Windows' equivalent to "bind mounts", NTFS junction points, can be
unlinked without affecting the mount target. This is clearly what users
expect to happen when they call `git clean -dfx` in a worktree that
contains NTFS junction points: the junction should be removed, and the
target directory of said junction should be left alone (unless it is
inside the worktree).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-20 02:56:44 +00:00
Matthias Aßhauer
197c72decb credential-cache: handle ECONNREFUSED gracefully
In 245670c (credential-cache: check for windows specific errors, 2021-09-14)
we concluded that on Windows we would always encounter ENETDOWN where we
would expect ECONNREFUSED on POSIX systems, when connecting to unix sockets.
As reported in [1], we do encounter ECONNREFUSED on Windows if the
socket file doesn't exist, but the containing directory does and ENETDOWN if
neither exists. We should handle this case like we do on non-windows systems.

[1] https://github.com/git-for-windows/git/pull/4762#issuecomment-2545498245

This fixes https://github.com/git-for-windows/git/issues/5314

Helped-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Matthias Aßhauer <mha1993@live.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-20 02:56:44 +00:00
Johannes Schindelin
897bd48ddc clean: do not traverse mount points
It seems to be not exactly rare on Windows to install NTFS junction
points (the equivalent of "bind mounts" on Linux/Unix) in worktrees,
e.g. to map some development tools into a subdirectory.

In such a scenario, it is pretty horrible if `git clean -dfx` traverses
into the mapped directory and starts to "clean up".

Let's just not do that. Let's make sure before we traverse into a
directory that it is not a mount point (or junction).

This addresses https://github.com/git-for-windows/git/issues/607

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-20 02:56:44 +00:00
Johannes Schindelin
e9d0e58a94 Introduce helper to create symlinks that knows about index_state
On Windows, symbolic links actually have a type depending on the target:
it can be a file or a directory.

In certain circumstances, this poses problems, e.g. when a symbolic link
is supposed to point into a submodule that is not checked out, so there
is no way for Git to auto-detect the type.

To help with that, we will add support over the course of the next
commits to specify that symlink type via the Git attributes. This
requires an index_state, though, something that Git for Windows'
`symlink()` replacement cannot know about because the function signature
is defined by the POSIX standard and not ours to change.

So let's introduce a helper function to create symbolic links that
*does* know about the index_state.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-20 02:56:44 +00:00
Johannes Schindelin
a67767b77b survey: clearly note the experimental nature in the output
While this command is definitely something we _want_, chances are that
upstreaming this will require substantial changes.

We still want to be able to experiment with this before that, to focus
on what we need out of this command: To assist with diagnosing issues
with large repositories, as well as to help monitoring the growth and
the associated painpoints of such repositories.

To that end, we are about to integrate this command into
`microsoft/git`, to get the tool into the hands of users who need it
most, with the idea to iterate in close collaboration between these
users and the developers familar with Git's internals.

However, we will definitely want to avoid letting anybody have the
impression that this command, its exact inner workings, as well as its
output format, are anywhere close to stable. To make that fact utterly
clear (and thereby protect the freedom to iterate and innovate freely
before upstreaming the command), let's mark its output as experimental
in all-caps, as the first thing we do.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-20 02:56:43 +00:00
Derrick Stolee
d94fb94b06 survey: add --top=<N> option and config
The 'git survey' builtin provides several detail tables, such as "top
files by on-disk size". The size of these tables defaults to 10,
currently.

Allow the user to specify this number via a new --top=<N> option or the
new survey.top config key.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-20 02:56:43 +00:00
Derrick Stolee
21ab818062 survey: add report of "largest" paths
Since we are already walking our reachable objects using the path-walk API,
let's now collect lists of the paths that contribute most to different
metrics. Specifically, we care about

 * Number of versions.
 * Total size on disk.
 * Total inflated size (no delta or zlib compression).

This information can be critical to discovering which parts of the
repository are causing the most growth, especially on-disk size. Different
packing strategies might help compress data more efficiently, but the toal
inflated size is a representation of the raw size of all snapshots of those
paths. Even when stored efficiently on disk, that size represents how much
information must be processed to complete a command such as 'git blame'.

The exact disk size seems to be not quite robust enough for testing, as
could be seen by the `linux-musl-meson` job consistently failing, possibly
because of zlib-ng deflates differently: t8100.4(git survey
(default)) was failing with a symptom like this:

   TOTAL OBJECT SIZES BY TYPE
   ===============================================
   Object Type | Count | Disk Size | Inflated Size
   ------------+-------+-----------+--------------
  -    Commits |    10 |      1523 |          2153
  +    Commits |    10 |      1528 |          2153
         Trees |    10 |       495 |          1706
         Blobs |    10 |       191 |           101
  -       Tags |     4 |       510 |           528
  +       Tags |     4 |       547 |           528

This means: the disk size is unlikely something we can verify robustly.
Since zlib-ng seems to increase the disk size of the tags from 528 to
547, we cannot even assume that the disk size is always smaller than the
inflated size. We will most likely want to either skip verifying the
disk size altogether, or go for some kind of fuzzy matching, say, by
replacing `s/ 1[45][0-9][0-9] / ~1.5k /` and `s/ [45][0-9][0-9] / ~½k /`
or something like that.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-20 02:56:43 +00:00
Derrick Stolee
8c51186349 survey: add ability to track prioritized lists
In future changes, we will make use of these methods. The intention is to
keep track of the top contributors according to some metric. We don't want
to store all of the entries and do a sort at the end, so track a
constant-size table and remove rows that get pushed out depending on the
chosen sorting algorithm.

Co-authored-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by; Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
2026-06-20 02:56:43 +00:00
Derrick Stolee
c15f6c4fc5 survey: show progress during object walk
Signed-off-by: Derrick Stolee <stolee@gmail.com>
2026-06-20 02:56:43 +00:00
Derrick Stolee
21a3bff5af survey: summarize total sizes by object type
Now that we have explored objects by count, we can expand that a bit more to
summarize the data for the on-disk and inflated size of those objects. This
information is helpful for diagnosing both why disk space (and perhaps
clone or fetch times) is growing but also why certain operations are slow
because the inflated size of the abstract objects that must be processed is
so large.

Note: zlib-ng is slightly more efficient even at those small sizes. Even
between zlib versions, there are slight differences in compression. To
accommodate for that in the tests, not the exact numbers but some rough
approximations are validated (the test should validate `git survey`,
after all, not zlib).

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-20 02:56:43 +00:00
Derrick Stolee
0ebaf6e717 survey: add object count summary
At the moment, nothing is obvious about the reason for the use of the
path-walk API, but this will become more prevelant in future iterations. For
now, use the path-walk API to sum up the counts of each kind of object.

For example, this is the reachable object summary output for my local repo:

REACHABLE OBJECT SUMMARY
========================
Object Type |  Count
------------+-------
       Tags |   1343
    Commits | 179344
      Trees | 314350
      Blobs | 184030

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2026-06-20 02:56:43 +00:00
Derrick Stolee
2c241cea93 survey: start pretty printing data in table form
When 'git survey' provides information to the user, this will be presented
in one of two formats: plaintext and JSON. The JSON implementation will be
delayed until the functionality is complete for the plaintext format.

The most important parts of the plaintext format are headers specifying the
different sections of the report and tables providing concreted data.

Create a custom table data structure that allows specifying a list of
strings for the row values. When printing the table, check each column for
the maximum width so we can create a table of the correct size from the
start.

The table structure is designed to be flexible to the different kinds of
output that will be implemented in future changes.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2026-06-20 02:56:43 +00:00
Jeff Hostetler
aa3e2841f8 survey: add command line opts to select references
By default we will scan all references in "refs/heads/", "refs/tags/"
and "refs/remotes/".

Add command line opts let the use ask for all refs or a subset of them
and to include a detached HEAD.

Signed-off-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
2026-06-20 02:56:43 +00:00
Jeff Hostetler
ced86f45a7 survey: stub in new experimental 'git-survey' command
Start work on a new 'git survey' command to scan the repository
for monorepo performance and scaling problems.  The goal is to
measure the various known "dimensions of scale" and serve as a
foundation for adding additional measurements as we learn more
about Git monorepo scaling problems.

The initial goal is to complement the scanning and analysis performed
by the GO-based 'git-sizer' (https://github.com/github/git-sizer) tool.
It is hoped that by creating a builtin command, we may be able to take
advantage of internal Git data structures and code that is not
accessible from GO to gain further insight into potential scaling
problems.

Co-authored-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
2026-06-20 02:56:43 +00:00
Junio C Hamano
dcf0c084e4 Merge branch 'ps/odb-source-packed' into next
The packed object source has been refactored into a proper struct
odb_source.

* ps/odb-source-packed:
  odb/source-packed: drop pointer to "files" parent source
  midx: refactor interfaces to work on "packed" source
  odb/source-packed: stub out remaining functions
  odb/source-packed: wire up `freshen_object()` callback
  odb/source-packed: wire up `find_abbrev_len()` callback
  odb/source-packed: wire up `count_objects()` callback
  odb/source-packed: wire up `for_each_object()` callback
  odb/source-packed: wire up `read_object_stream()` callback
  odb/source-packed: wire up `read_object_info()` callback
  packfile: use higher-level interface to implement `has_object_pack()`
  odb/source-packed: wire up `reprepare()` callback
  odb/source-packed: wire up `close()` callback
  odb/source-packed: start converting to a proper `struct odb_source`
  odb/source-packed: store pointer to "files" instead of generic source
  packfile: move packed source into "odb/" subsystem
  packfile: split out packfile list logic
  packfile: rename `struct packfile_store` to `odb_source_packed`
2026-06-19 09:49:47 -07:00
Junio C Hamano
2b3ac350e6 Merge branch 'js/objects-larger-than-4gb-on-windows-more' into next
* js/objects-larger-than-4gb-on-windows-more:
  odb: use size_t for object_info.sizep and the size APIs
  packfile,delta: drop the `cast_size_t_to_ulong()` wrappers
  pack-objects: use size_t for in-core object sizes
  packfile: widen unpack_entry()'s size out-parameter to size_t
  pack-objects(check_pack_inflate()): use size_t instead of unsigned long
  patch-delta: use size_t for sizes
  compat/msvc: use _chsize_s for ftruncate
2026-06-18 11:32:08 -07:00
Junio C Hamano
43ed8b3969 Merge branch 'rs/cat-file-default-format-optim' into next
* rs/cat-file-default-format-optim:
  cat-file: speed up default format
2026-06-17 11:11:21 -07:00
Patrick Steinhardt
7fa8c61afe midx: refactor interfaces to work on "packed" source
Our interfaces used to interact with MIDXs all work on top of the
generic `struct odb_source`. This doesn't make much sense though: a MIDX
is strictly tied to the "packed" source, so passing in a generic source
gives the false sense that it may also work with a different type of
source.

Fix this conceptual weirdness and instead require the caller to pass in
a "packed" source explicitly. This also makes the next commit easier to
implement, where we drop the pointer to the "files" source in the
"packed" source.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-17 05:00:01 -07:00
Patrick Steinhardt
7ed53cde28 odb/source-packed: wire up for_each_object() callback
Move `packfile_store_for_each_object()` and its associated helpers from
"packfile.c" into "odb/source-packed.c" and wire it up as the
`for_each_object()` callback of the "packed" source.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-17 05:00:00 -07:00
Patrick Steinhardt
9ea4ef8586 odb/source-packed: wire up reprepare() callback
Move the logic to prepare and reprepare the "packed" source into
"odb/source-packed.c" and wire it up as the `reprepare()` callback.

Note that "preparing" a source is not yet generic. Eventually, it would
probably make sense to turn the existing `reprepare()` callback into a
`prepare()` callback with an optional flag to force re-preparing. But
this step will be handled in a separate patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-17 05:00:00 -07:00
Junio C Hamano
38918c4cfd Merge branch 'td/ls-files-pathspec-prefilter' into next
`git ls-files --modified` and `git ls-files --deleted` have been
optimized to filter with pathspec before calling lstat() when there is
only a single pathspec item, avoiding unnecessary filesystem access
for entries that will not be shown.

* td/ls-files-pathspec-prefilter:
  ls-files: filter pathspec before lstat
2026-06-15 10:36:22 -07:00
Junio C Hamano
d9a8b88d47 Merge branch 'ps/setup-drop-global-state' into next
Continuation of "setup.c" refactoring to drop remaining global state
(`git_work_tree_cfg`, `is_bare_repository_cfg`). The most notable
outcome is that `is_bare_repository()` has been updated to no longer
implicitly rely on `the_repository`.

* ps/setup-drop-global-state:
  treewide: drop USE_THE_REPOSITORY_VARIABLE
  environment: stop using `the_repository` in `is_bare_repository()`
  environment: split up concerns of `is_bare_repository_cfg`
  builtin/init: stop modifying `is_bare_repository_cfg`
  setup: remove global `git_work_tree_cfg` variable
  builtin/init: simplify logic to configure worktree
  builtin/init: stop modifying global `git_work_tree_cfg` variable
2026-06-15 10:36:22 -07:00
Junio C Hamano
1ae171f3b7 Merge branch 'td/describe-tag-iteration' into next
'git describe' has been taught to pass the 'refs/tags/' prefix down to
the ref iterator when '--all' is not requested, avoiding unnecessary
iteration over non-tag refs.

* td/describe-tag-iteration:
  describe: limit default ref iteration to tags
2026-06-15 10:36:21 -07:00
René Scharfe
800581cd44 cat-file: speed up default format
eb54a3391b (cat-file: skip expanding default format, 2022-03-15) added
special handling for the default batch format.  In the meantime it has
fallen behind the code path for handling arbitrary formats.  Bring it up
to speed by using the new and more efficient strbuf_add_oid_hex() and
strbuf_add_uint() instead of strbuf_addf():

Benchmark 1: ./git_main cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype) %(objectsize)'
  Time (mean ± σ):      1.051 s ±  0.003 s    [User: 1.027 s, System: 0.023 s]
  Range (min … max):    1.049 s …  1.058 s    10 runs

Benchmark 2: ./git_main cat-file --batch-all-objects --batch-check='%(objectname)-%(objecttype)-%(objectsize)'
  Time (mean ± σ):      1.012 s ±  0.002 s    [User: 0.988 s, System: 0.023 s]
  Range (min … max):    1.010 s …  1.018 s    10 runs

Benchmark 3: ./git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype) %(objectsize)'
  Time (mean ± σ):     979.0 ms ±   1.1 ms    [User: 954.1 ms, System: 23.2 ms]
  Range (min … max):   977.7 ms … 980.8 ms    10 runs

Summary
  ./git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype) %(objectsize)' ran
    1.03 ± 0.00 times faster than ./git_main cat-file --batch-all-objects --batch-check='%(objectname)-%(objecttype)-%(objectsize)'
    1.07 ± 0.00 times faster than ./git_main cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype) %(objectsize)'

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-15 08:10:27 -07:00
Johannes Schindelin
c6a4629e32 odb: use size_t for object_info.sizep and the size APIs
When `js/objects-larger-than-4gb-on-windows` widened the streaming,
index-pack and unpack-objects code paths, in the interest of keeping the
patches somewhat reasonably-sized, it left the public ODB API still
typed in `unsigned long`. In particular `struct object_info::sizep` and
the four wrappers built on top of it (`odb_read_object`,
`odb_read_object_peeled`, `odb_read_object_info`, `odb_pretend_object`)
still return the unpacked size through `unsigned long *`, so on Windows
`cat-file -s` and the `git add` / `git status` paths for a >4 GiB blob
silently cap at 4 GiB.

Widen the field and the four wrappers. The previous commits already
widened the `unpack_entry()` cascade and pack-objects' in-core size
accessors, so most of the cascade arrives here with no further work: the
temporary shims in `packed_object_info_with_index_pos()` and in
`unpack_entry()`'s delta-base recovery path go away, the two
`SET_SIZE(entry, cast_size_t_to_ulong(canonical_size))` calls in
`check_object()` and the matching one in `drop_reused_delta()` collapse
to plain `SET_SIZE`, and `oe_get_size_slow()`'s tail
`cast_size_t_to_ulong()` is gone too.

What remains narrow are the boundaries this series does not
intend to touch: the diff, blame, textconv and fast-import machinery.

Even so, this patch is unfortunately quite large.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-15 07:45:41 -07:00
Johannes Schindelin
188bac14f7 pack-objects: use size_t for in-core object sizes
`pack-objects` stores per-entry object sizes in either the 31-bit
`size_` member of the `struct object_entry` or, when the value does not
fit, the `pack->delta_size[]` spill array.  The accessors (`oe_size`,
`oe_delta_size`, `oe_get_size_slow`, `oe_size_*_than`) and the setters
(`oe_set_size`, `oe_set_delta_size`) used `unsigned long` for the spill
type, which on Windows means the spill silently caps at 4 GiB per entry.
That is what made `upload-pack` die with "object too large to read on
this platform" when serving the >4 GiB blob in `t5608` tests 5 and 6
when run with `GIT_TEST_CLONE_2GB`.

Widen them all to `size_t` (including `pack->delta_size`) and drop the
three `cast_size_t_to_ulong()` calls in `check_object()` that guarded
`in_pack_size`.  The two `SET_SIZE(entry, canonical_size)` calls in the
same function stay cast-free as before, since `canonical_size` is still
`unsigned long` until a later commit widens `object_info::sizep`.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-15 07:45:41 -07:00
Johannes Schindelin
2d83cc3f84 packfile: widen unpack_entry()'s size out-parameter to size_t
The topic `js/objects-larger-than-4gb-on-windows` widened the streaming,
index-pack and unpack-objects paths to `size_t` but deliberately stopped
at the in-memory `unpack_entry()` cascade, which still hands back the
unpacked size through `unsigned long *`.  On Windows that boundary
truncates above 4 GiB because that data type is only 32 bits wide on
that platform.

Widen the code path. Except `packed_object_info_with_index_pos()`: It
cannot yet pass `oi->sizep` directly because the field is still
`unsigned long *`; bridge it with a `size_t` temporary that narrows
back, and let a later commit drop the bridge once the field is wide
too. `gfi_unpack_entry()` keeps its narrow signature because fast-import
tracks sizes through `unsigned long` everywhere it crosses subsystem
boundaries, keeping its signature allows the scope of this commit to be
somewhat reasonable, still.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-15 07:45:40 -07:00
Johannes Schindelin
1d43315b31 pack-objects(check_pack_inflate()): use size_t instead of unsigned long
`write_reuse_object()` learned to track its packed-object size as
`size_t` in 606c192380 (odb, packfile: use size_t for streaming
object sizes, 2026-05-08), but the comparison sink it feeds,
`check_pack_inflate()`, still takes the expected decompressed size
as `unsigned long`. The call site bridges the mismatch with
`cast_size_t_to_ulong()`, which on Windows turns a >4 GiB object
into an immediate die().

That function only uses `expect` once: as the right-hand side of a
`stream.total_out == expect` equality test against zlib's counter.
zlib's own `total_out` counter is `uLong` and is therefore still
32-bit-bound on Windows. Widening `expect` to `size_t` cannot fix that,
but it is a strict improvement nonetheless: instead of dying outright,
an oversized object now simply makes the equality fail and lets
`write_reuse_object()` fall back to `write_no_reuse_object()`, which
decompresses and re-deflates the content (and which the larger
pack-objects widening series targets separately).

Drop the `cast_size_t_to_ulong()` shim at the call site now that
the receiving parameter speaks the same type as `entry_size`.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-15 07:45:40 -07:00
Johannes Schindelin
33afe87338 patch-delta: use size_t for sizes
`patch_delta()` takes the source and delta sizes by value and writes
back the reconstructed target size through an `unsigned long *`.  That
datatype cannot represent a value that exceeds 4 GiB on systems where
`unsigned long` is 32-bit (notably 64-bit Windows builds), though, even
though the delta encoding itself, the on-disk layout, and the in-memory
buffers happily carry such sizes. A `size_t` companion to
`get_delta_hdr_size()`, `get_delta_hdr_size_sz()`, was introduced in
17fa077596 (delta, packfile: use size_t for delta header sizes,
2026-05-08) precisely so that `patch_delta()` could be widened without
changing the on-the-wire decoding helper's signature.

Widen `patch_delta()`'s three size parameters to `size_t` and switch
its internal use of `get_delta_hdr_size()` to the `_sz` variant.
Then propagate the wider type through the callers.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-15 07:45:40 -07:00
Junio C Hamano
ff1784217f Merge branch 'ak/typofixes'
Typofixes.

* ak/typofixes:
  doc: fix typos via codespell
2026-06-15 07:42:00 -07:00
Junio C Hamano
883a47ef64 Merge branch 'ob/more-repo-config-values'
Many core configuration variables have been migrated from global
variables into 'repo_config_values' to tie them to a specific
repository instance, avoiding cross-repository state leakage.

* ob/more-repo-config-values:
  environment: move "warn_on_object_refname_ambiguity" into `struct repo_config_values`
  environment: move "sparse_expect_files_outside_of_patterns" into `struct repo_config_values`
  environment: move "core_sparse_checkout_cone" into `struct repo_config_values`
  environment: move "precomposed_unicode" into `struct repo_config_values`
  environment: move "pack_compression_level" into `struct repo_config_values`
  environment: move `zlib_compression_level` into `struct repo_config_values`
  environment: move "check_stat" into `struct repo_config_values`
  environment: move "trust_ctime" into `struct repo_config_values`
2026-06-15 07:42:00 -07:00
Junio C Hamano
cfe6682042 Merge branch 'hn/config-typo-advice'
"git config foo.bar=baz" is not likely to be a request to read the
value of such a variable with '=' in its name; rather it is plausible
that the user meant "git config set foo.bar baz".  Give advice when
giving an error message.

* hn/config-typo-advice:
  config: improve diagnostic for "set" with missing value
  config: add git_config_key_is_valid() for quiet validation
2026-06-15 07:41:59 -07:00
Tamir Duberstein
3f5203eeb4 ls-files: filter pathspec before lstat
In --deleted and --modified modes, show_files() calls lstat() for each
index entry before show_ce() applies the pathspec. prune_index() avoids
most of these calls for pathspecs with a common directory prefix, but
not for a top-level name or leading wildcard.

Match before lstat() to avoid accessing the worktree for entries that
cannot be shown. Treat this as a prefilter: do not update ps_matched,
and retain the match in show_ce() so --error-unmatch is satisfied only
by entries that the selected modes actually show.

Prefilter only a single pathspec item, bounding the added work for each
index entry. Applying match_pathspec() to multiple arguments can cost
more than the lstat() calls it avoids. In a synthetic repository with
10,000 clean files, passing every path to ls-files --modified increased
runtime from 112.5 ms to 494.1 ms when the prefilter was unconditional.

With $parent and $this exported as paths to binaries built from the
parent and this commit, on a repository with 881,290 index entries:

    hyperfine --warmup 0 --runs 3 \
        --command-name parent \
        '$parent -c core.fsmonitor=false ls-files --deleted -- README.md >/dev/null' \
        --command-name this-commit \
        '$this -c core.fsmonitor=false ls-files --deleted -- README.md >/dev/null'

reported means of 65.790 seconds for the parent and 4.987 seconds for
this commit.

Link: https://lore.kernel.org/r/xmqqfr2tnfk0.fsf@gitster.g
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-12 12:47:21 -07:00
Junio C Hamano
625f76ac4c Merge branch 'ab/index-pack-retain-child-bases' into next
"git index-pack" has been optimized by retaining child bases in the
delta cache instead of immediately freeing them, letting the existing
cache limit policy decide eviction.

* ab/index-pack-retain-child-bases:
  index-pack: retain child bases in delta cache
2026-06-12 05:46:58 -07:00
Junio C Hamano
a95871538b Merge branch 'jk/describe-contains-all-match-fix' into next
The 'git describe --contains --all' command has been fixed to
properly honor the '--match' and '--exclude' options by passing
them down to 'git name-rev' with the appropriate reference
prefixes.

* jk/describe-contains-all-match-fix:
  describe: fix --exclude, --match with --contains and --all
2026-06-11 05:20:06 -07:00
Junio C Hamano
1466219fc9 Merge branch 'kk/streaming-walk-pqueue' into next
Streaming revision walks have been optimized by using a priority queue
for date-sorting commits, speeding up walks repositories with many
merges.

* kk/streaming-walk-pqueue:
  revision: use priority queue for non-limited streaming walks
  revision: introduce rev_walk_mode to clarify get_revision_1()
  pack-objects: call release_revisions() after cruft traversal
2026-06-11 05:20:06 -07:00
Patrick Steinhardt
1ceee7431b treewide: drop USE_THE_REPOSITORY_VARIABLE
Adapt a couple of trivial callers of `is_bare_repository()` to instead
use a repository available via the caller's context so that we can drop
the `USE_THE_REPOSITORY_VARIABLE` macro.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-11 05:05:54 -07:00
Patrick Steinhardt
2e1d55626f environment: stop using the_repository in is_bare_repository()
Refactor `is_bare_repository()` to take in a repository parameter so
that we no longer depend on `the_repository`. Adjust callers
accordingly.

Furthermore, move the function outside of the declarations that are only
available when `USE_THE_REPOSITORY_VARIABLE` is set, as it no longer
depends on that variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-11 05:05:54 -07:00
Patrick Steinhardt
7ff3a5895b environment: split up concerns of is_bare_repository_cfg
The `is_bare_repository_cfg` variable tracks two different pieces of
information:

  - It tracks whether the user has invoked git with the "--bare" flag,
    which makes us treat any discovered Git repository as if it was a
    bare repository.

  - Otherwise it tracks whether the discovered `the_repository` is bare.

This makes the flag extremely confusing and creates a bit of a challenge
when handling multiple repositories in the same process.

Split up the concerns of this variable into two pieces:

  - `startup_info.force_bare_repository` tracks whether the user has
    passed the "--bare" flag. This is used as a hint to treat newly set
    up repositories as bare regardless of whether or not they have a
    worktree.

  - `struct repository::bare_cfg` tracks whether or not a repository is
    considered bare. This takes into account both whether the user has
    passed "--bare" and the discovered state of the repository itself.

Whether or not a repository is bare is now resolved when checking the
repository's format, and is then later applied to the repository itself
via `apply_repository_format()`.

This enables a subsequent change where we make `is_bare_repository()`
not depend on global state anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-11 05:05:54 -07:00
Patrick Steinhardt
f12d73132e builtin/init: stop modifying is_bare_repository_cfg
We're modifying `is_bare_repository_cfg` in "builtin/init.c" to indicate
whether the newly created repository is supposed to be a bare repository
or not.

This is ultimately unnecessary though: when initializing the repository
in `init_db()` we eventually set `is_bare_repository_cfg = !work_tree`,
so all that matters is whether or not we have a working tree configured,
and the working tree is set up in the non-bare in "builtin/init.c".

Stop modifying the global variable in "builtin/init.c" in favor of a
local variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-11 05:05:54 -07:00
Patrick Steinhardt
65eb5b989a builtin/init: simplify logic to configure worktree
In the preceding commit we have stopped modifying the global
`git_work_tree_cfg` variable. With this change there's now some code
paths where we end up setting the local `git_work_tree_cfg` variable,
but without actually using the value for anything.

Refactor the code a bit so that we only set the worktree configuration
in case it's actually needed. Furthermore, reflow it a bit to make the
code easier to follow.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-11 05:05:54 -07:00
Patrick Steinhardt
9bef2cf331 builtin/init: stop modifying global git_work_tree_cfg variable
When executing git-init(1) we need to figure out the final location of
the worktree. This location can be configured in a couple of ways: via
an environment variable, via the preexisting "core.worktree" config in
case we're reinitializing, or implicitly when reinitializing a non-bare
repository.

When checking for the worktree location in "builtin/init-db.c" we
populate any potentially-discovered value both by setting the global
`git_work_tree_cfg` variable and via `set_git_work_tree()`, which
ultimately ends up modifying `struct repository::worktree`.

Modifying `git_work_tree_cfg` is unnecessary though: we configure the
worktree in `create_default_files()`, and that function derives the
worktree location via `repo_get_work_tree()`. Consequently, propagating
the worktree via `set_git_work_tree()` is sufficient.

Stop munging `git_work_tree_cfg` and make it file-local to "setup.c" and
function-local to `cmd_init_db()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-11 05:05:54 -07:00
Junio C Hamano
06f63df846 Merge branch 'ps/odb-source-loose'
The loose object source has been refactored into a proper `struct
odb_source`.

* ps/odb-source-loose:
  odb/source-loose: drop pointer to the "files" source
  odb/source-loose: stub out remaining callbacks
  odb/source-loose: wire up `write_object_stream()` callback
  object-file: refactor writing objects to use loose source
  odb/source-loose: wire up `write_object()` callback
  loose: refactor object map to operate on `struct odb_source_loose`
  odb/source-loose: wire up `freshen_object()` callback
  odb/source-loose: drop `odb_source_loose_has_object()`
  odb/source-loose: wire up `count_objects()` callback
  odb/source-loose: wire up `find_abbrev_len()` callback
  odb/source-loose: wire up `for_each_object()` callback
  odb/source-loose: wire up `read_object_stream()` callback
  odb/source-loose: wire up `read_object_info()` callback
  odb/source-loose: wire up `close()` callback
  odb/source-loose: wire up `reprepare()` callback
  odb/source-loose: start converting to a proper `struct odb_source`
  odb/source-loose: store pointer to "files" instead of generic source
  odb/source-loose: move loose source into "odb/" subsystem
2026-06-11 04:31:18 -07:00