Commit Graph

20 Commits

Author SHA1 Message Date
Ben Peart
872326dcca fscache: update fscache to be thread specific instead of global
The threading model for fscache has been to have a single, global cache.
This puts requirements on it to be thread safe so that callers like
preload-index can call it from multiple threads.  This was implemented
with a single mutex and completion events which introduces contention
between the calling threads.

Simplify the threading model by making fscache thread specific.  This allows
us to remove the global mutex and synchronization events entirely and instead
associate a fscache with every thread that requests one. This works well with
the current multi-threading which divides the cache entries into blocks with
a separate thread processing each block.

At the end of each worker thread, if there is a fscache on the primary
thread, merge the cached results from the worker into the primary thread
cache. This enables us to reuse the cache later especially when scanning for
untracked files.

In testing, this reduced the time spent in preload_index() by about 25% and
also reduced the CPU utilization significantly.  On a repo with ~200K files,
it reduced overall status times by ~12%.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2026-06-26 08:58:04 +00:00
Ben Peart
550881dc74 fscache: fscache takes an initial size
Update enable_fscache() to take an optional initial size parameter which is
used to initialize the hashmap so that it can avoid having to rehash as
additional entries are added.

Add a separate disable_fscache() macro to make the code clearer and easier
to read.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:58:04 +00:00
Takuto Ikuta
bca9b4e589 checkout.c: enable fscache for checkout again
This is retry of #1419.

I added flush_fscache macro to flush cached stats after disk writing
with tests for regression reported in #1438 and #1442.

git checkout checks each file path in sorted order, so cache flushing does not
make performance worse unless we have large number of modified files in
a directory containing many files.

Using chromium repository, I tested `git checkout .` performance when I
delete 10 files in different directories.
With this patch:
TotalSeconds: 4.307272
TotalSeconds: 4.4863595
TotalSeconds: 4.2975562
Avg: 4.36372923333333

Without this patch:
TotalSeconds: 20.9705431
TotalSeconds: 22.4867685
TotalSeconds: 18.8968292
Avg: 20.7847136

I confirmed this patch passed all tests in t/ with core_fscache=1.

Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
2026-06-26 08:58:04 +00:00
Jeff Hostetler
89b2c46d40 fscache: make fscache_enabled() public
Make fscache_enabled() function public rather than static.
Remove unneeded fscache_is_enabled() function.
Change is_fscache_enabled() macro to call fscache_enabled().

is_fscache_enabled() now takes a pathname so that the answer
is more precise and mean "is fscache enabled for this pathname",
since fscache only stores repo-relative paths and not absolute
paths, we can avoid attempting lookups for absolute paths.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2026-06-26 08:58:04 +00:00
Jeff Hostetler
90a656382d dir.c: make add_excludes aware of fscache during status
Teach read_directory_recursive() and add_excludes() to
be aware of optional fscache and avoid trying to open()
and fstat() non-existant ".gitignore" files in every
directory in the worktree.

The current code in add_excludes() calls open() and then
fstat() for a ".gitignore" file in each directory present
in the worktree.  Change that when fscache is enabled to
call lstat() first and if present, call open().

This seems backwards because both lstat needs to do more
work than fstat.  But when fscache is enabled, fscache will
already know if the .gitignore file exists and can completely
avoid the IO calls.  This works because of the lstat diversion
to mingw_lstat when fscache is enabled.

This reduced status times on a 350K file enlistment of the
Windows repo on a NVMe SSD by 0.25 seconds.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2026-06-26 08:58:04 +00:00
Karsten Blees
7db1c9d81f mingw: add a cache below mingw's lstat and dirent implementations
Checking the work tree status is quite slow on Windows, due to slow
`lstat()` emulation (git calls `lstat()` once for each file in the
index). Windows operating system APIs seem to be much better at scanning
the status of entire directories than checking single files.

Add an `lstat()` implementation that uses a cache for lstat data. Cache
misses read the entire parent directory and add it to the cache.
Subsequent `lstat()` calls for the same directory are served directly
from the cache.

Also implement `opendir()`/`readdir()`/`closedir()` so that they create
and use directory listings in the cache.

The cache doesn't track file system changes and doesn't plug into any
modifying file APIs, so it has to be explicitly enabled for git functions
that don't modify the working copy.

Note: in an earlier version of this patch, the cache was always active and
tracked file system changes via ReadDirectoryChangesW. However, this was
much more complex and had negative impact on the performance of modifying
git commands such as 'git checkout'.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:58:04 +00:00
Karsten Blees
6cabec8b8c mingw: add infrastructure for read-only file system level caches
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.

This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.

Enable read-only sections for 'git status' and preload_index.

Signed-off-by: Karsten Blees <blees@dcon.de>
2026-06-26 08:58:04 +00:00
Johannes Schindelin
f028e7203e Continue improving support for 4GB+ packs/clones/objects (#6289)
This PR contains a branch thicket on top of v2.55.0-rc1 (i.e. ready to
go upstream) to continue the bulk of the `unsigned long` -> `size_t`
transformation.

Since all of these changes have no impact on the currently-working
functionality for <4GB objects/packs/clones (modulo bugs, that is 😄), I
would like to merge this before v2.55.0-rc2, still: The risk of
introducing a regression is negligible, the chance for fixing the
majority of problems with large clones is high.
2026-06-26 08:58:03 +00:00
Johannes Schindelin
442d369390 Merge branch 'dont-clean-junctions'
This topic branch teaches `git clean` to respect NTFS junctions and Unix
bind mounts: it will now stop at those boundaries.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:58:02 +00:00
Johannes Schindelin
3ad6222635 Merge pull request #1897 from piscisaureus/symlink-attr
Specify symlink type in .gitattributes
2026-06-26 08:58:02 +00:00
Johannes Schindelin
812c9e4761 Merge branch 'run-command-be-helpful-when-Git-LFS-fails-on-Windows-7'
Since Git LFS v3.5.x implicitly dropped Windows 7 support, we now want
users to be advised _what_ is going wrong on that Windows version. This
topic branch goes out of its way to provide users with such guidance.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:58:00 +00:00
Johannes Schindelin
6a686af23b Add full mingw-w64-git (i.e. regular MSYS2 ecosystem) support (#5971)
Every once in a while, there are bug reports in Git for Windows' bug
tracker that describe an issue running [inside MSYS2
proper](https://gitforwindows.org/install-inside-msys2-proper), totally
ignoring the big, honking warning on top of [the
page](https://gitforwindows.org/install-inside-msys2-proper) that spells
out clearly that this is an unsupported use case.

At the same time, we cannot easily deflect and say "just use MSYS2
directly" (and leave the "and stop pestering us" out). We cannot do that
because there is only an _MSYS_ `git` package in MSYS2 (i.e. a Git that
uses the quite slow POSIX emulation layer provided by the MSYS2
runtime), but no `mingw-w64-git` package (which would be equivalent in
speed to Git for Windows).

In https://github.com/msys2/MINGW-packages/pull/26470, I am preparing to
change that. As part of that PR, I noticed and fixed a couple of issues
_in `git-for-windows/git` that prevented full support for
`mingw-w64-git` in MSYS2, such as problems with CLANG64 and UCRT64.

While at it, I simplified the entire setup to trust MSYS2's
`MINGW_PREFIX` & related environment variables instead of hard-coding
values like the installation prefix and what `MSYSTEM` to fall back on
if it is unset.
2026-06-26 08:57:59 +00:00
Johannes Schindelin
32bd79b093 Drop the cast_size_t_to_ulong() helper
Now that all of the call sites of this helper (which I used as a kind of
"NEEDSWORK" marker) are eliminated, we can drop that helper altogether.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:55 +00:00
Johannes Schindelin
e535a93014 clean: do not traverse mount points
It seems to be not exactly rare on Windows to install NTFS junction
points (the equivalent of "bind mounts" on Linux/Unix) in worktrees,
e.g. to map some development tools into a subdirectory.

In such a scenario, it is pretty horrible if `git clean -dfx` traverses
into the mapped directory and starts to "clean up".

Let's just not do that. Let's make sure before we traverse into a
directory that it is not a mount point (or junction).

This addresses https://github.com/git-for-windows/git/issues/607

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
52da019eaa Introduce helper to create symlinks that knows about index_state
On Windows, symbolic links actually have a type depending on the target:
it can be a file or a directory.

In certain circumstances, this poses problems, e.g. when a symbolic link
is supposed to point into a submodule that is not checked out, so there
is no way for Git to auto-detect the type.

To help with that, we will add support over the course of the next
commits to specify that symlink type via the Git attributes. This
requires an index_state, though, something that Git for Windows'
`symlink()` replacement cannot know about because the function signature
is defined by the POSIX standard and not ours to change.

So let's introduce a helper function to create symbolic links that
*does* know about the index_state.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:53 +00:00
Johannes Schindelin
60dfd66a2f run-command: be helpful with Git LFS fails on Windows 7
Git LFS is now built with Go 1.21 which no longer supports Windows 7.
However, Git for Windows still wants to support Windows 7.

Ideally, Git LFS would re-introduce Windows 7 support until Git for
Windows drops support for Windows 7, but that's not going to happen:
https://github.com/git-for-windows/git/issues/4996#issuecomment-2176152565

The next best thing we can do is to let the users know what is
happening, and how to get out of their fix, at least.

This is not quite as easy as it would first seem because programs
compiled with Go 1.21 or newer will simply throw an exception and fail
with an Access Violation on Windows 7.

The only way I found to address this is to replicate the logic from Go's
very own `version` command (which can determine the Go version with
which a given executable was built) to detect the situation, and in that
case offer a helpful error message.

This addresses https://github.com/git-for-windows/git/issues/4996.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:52 +00:00
Johannes Schindelin
79496ee9c7 max_tree_depth: lower it for clang builds in general on Windows
In 436a42215e (max_tree_depth: lower it for clangarm64 on Windows,
2025-04-23), I provided a work-around for a nasty issue with clangarm
builds, where the stack is exhausted before the maximal tree depth is
reached, and the resulting error cannot easily be handled by Git
(because it would require Windows-specific handling).

Turns out that this is not at all limited to ARM64. In my tests with
CLANG64 in MSYS2 on the GitHub Actions runners, the test t6700.4 failed
in the exact same way. What's worse: The limit needs to be quite a bit
lower for x86_64 than for aarch64. In aforementioned tests, the breaking
point was 1232: With 1231 it still worked as expected, with 1232 it
would fail with the `STATUS_STACK_OVERFLOW` incorrectly mapped to exit
code 127. For comparison, in my tests on GitHub Actions' Windows/ARM64
runners, the breaking point was 1439 instead.

Therefore the condition needs to be adapted once more, to accommodate
(with some safety margin) both aarch64 and x86_64 in clang-based builds
on Windows, to let that test pass.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:52 +00:00
Johannes Schindelin
341f34f606 strbuf_realpath(): use platform-dependent API if available
Some platforms (e.g. Windows) provide API functions to resolve paths
much quicker. Let's offer a way to short-cut `strbuf_realpath()` on
those platforms.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-26 08:57:51 +00:00
Junio C Hamano
faff5b2eb1 Merge branch 'ps/libgit-in-subdir' into seen
The source files for libgit.a have been moved into a new "lib/"
directory to clean up the top-level directory and clearly separate
library code.

* ps/libgit-in-subdir:
  Move libgit.a sources into separate "lib/" directory
  t/helper: prepare "test-example-tap.c" for introduction of "lib/"
2026-06-25 19:51:57 -07:00
Patrick Steinhardt
9759608622 Move libgit.a sources into separate "lib/" directory
The Git project is not exactly the easiest project to get started in:
it's written in C and POSIX shell, with bits of Perl, Rust and other
languages sprinkled into it. On top of that, the project has grown
somewhat organically over time, making the codebase hard to navigate.

These are problems that we're aware of, and there have been and still
are efforts to clean up some of the technical debt that is natural to
exist an a project that is more than 20 years old. Furthermore, we
provide resources to newcomers that help them out like our coding
guidelines, code of conduct or "MyFirstContribution.adoc".

But there is a rather practical problem: finding your way around in our
project's tree is not easy. Doing a directory listing in the top-level
directory will present you with more than 550 files, which makes it
extremely hard for a newcomer to figure out what files they are even
supposed to look at. This makes the onboarding experience somewhat
harder than it really needs to be. This isn't only a problem for
newcomers though, as I myself struggle to find the files I am looking
for because of the sheer number of files.

Besides the problem of discoverability it also creates a problem of
structure. It is not obvious at all which files are part of "libgit.a"
and which files are only linked into our final executables. So while we
have this split in our build systems, that split is not evident at all
in our tree.

Introduce a new "lib/" directory and move all of our sources for
"libgit.a" into it to fix these issues. It makes the split we have
evident and reduces the number of files in our top-level tree from 550
files to ~80 files.

This is still a lot of files, but it's significantly easier to navigate
already. Furthermore, we can further iterate after this step and think
about introducing a better structure for remaining files, as well.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-06-22 10:58:23 -07:00