Commit Graph

178575 Commits

Author SHA1 Message Date
Derrick Stolee
e5abfe06a5 CONTRIBUTING.md: add guide for first-time contributors
Getting started contributing to Git can be difficult on a Windows
machine. CONTRIBUTING.md contains a guide to getting started, including
detailed steps for setting up build tools, running tests, and
submitting patches to upstream.

[includes an example by Pratik Karki how to submit v2, v3, v4, etc.]

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
2026-06-15 20:51:58 +00:00
Johannes Schindelin
7568a4c625 Modify the Code of Conduct for Git for Windows
The Git project followed Git for Windows' lead and added their Code of
Conduct, based on the Contributor Covenant v1.4, later updated to v2.0.

We adapt it slightly to Git for Windows.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:58 +00:00
Johannes Schindelin
734f917547 Add an AGENTS.md file to help with AI-assisted debugging/development
In this time and age, AI is everywhere. However, it's sometimes not very
easy to use. For green-field projects it works quite a bit better than
for existing legacy projects. And Git's source code is _quite_ as legacy
code as they come... 😁

Now, the only way how AI can be used efficiently with legacy code
is by providing enough information by way of prompt context for the
AI to have a chance to make any sense of the code. The structure and
the architecture is, after all, not designed for AI, but rather the
opposite: By virtue of having grown organically over two decades, there
is no design that AI coding models would readily grasp.

So here is a document that describes all kinds of aspects about this
project. The idea is to help AI by providing information that it does
not have ingrained in its weights. The idea is to provide information
that a human prompter might take for granted, but no coding model will
have been trained on specifically.

Assisted-by: Claude Opus 4.5
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:58 +00:00
Johannes Schindelin
e6ef1a9f1a Describe Git for Windows' architecture
The Git for Windows project has grown quite complex over the years,
certainly much more complex than during the first years where the
`msysgit.git` repository was abusing Git for package management purposes
and the `git/git` fork was called `4msysgit.git`.

Let's describe the status quo in a thorough way.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:58 +00:00
Johannes Schindelin
987972dc6d t9200: skip tests when $PWD contains a colon
On Windows, the current working directory is pretty much guaranteed to
contain a colon. If we feed that path to CVS, it mistakes it for a
separator between host and port, though.

This has not been a problem so far because Git for Windows uses MSYS2's
Bash using a POSIX emulation layer that also pretends that the current
directory is a Unix path (at least as long as we're in a shell script).

However, that is rather limiting, as Git for Windows also explores other
ports of other Unix shells. One of those is BusyBox-w32's ash, which is
a native port (i.e. *not* using any POSIX emulation layer, and certainly
not emulating Unix paths).

So let's just detect if there is a colon in $PWD and punt in that case.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:58 +00:00
Johannes Schindelin
63cd918bfc t5813: allow for $PWD to be a Windows path
Git for Windows uses MSYS2's Bash to run the test suite, which comes
with benefits but also at a heavy price: on the plus side, MSYS2's
POSIX emulation layer allows us to continue pretending that we are on a
Unix system, e.g. use Unix paths instead of Windows ones, yet this is
bought at a rather noticeable performance penalty.

There *are* some more native ports of Unix shells out there, though,
most notably BusyBox-w32's ash. These native ports do not use any POSIX
emulation layer (or at most a *very* thin one, choosing to avoid
features such as fork() that are expensive to emulate on Windows), and
they use native Windows paths (usually with forward slashes instead of
backslashes, which is perfectly legal in almost all use cases).

And here comes the problem: with a $PWD looking like, say,
C:/git-sdk-64/usr/src/git/t/trash directory.t5813-proto-disable-ssh
Git's test scripts get quite a bit confused, as their assumptions have
been shattered. Not only does this path contain a colon (oh no!), it
also does not start with a slash.

This is a problem e.g. when constructing a URL as t5813 does it:
ssh://remote$PWD. Not only is it impossible to separate the "host" from
the path with a $PWD as above, even prefixing $PWD by a slash won't
work, as /C:/git-sdk-64/... is not a valid path.

As a workaround, detect when $PWD does not start with a slash on
Windows, and simply strip the drive prefix, using an obscure feature of
Windows paths: if an absolute Windows path starts with a slash, it is
implicitly prefixed by the drive prefix of the current directory. As we
are talking about the current directory here, anyway, that strategy
works.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:58 +00:00
Johannes Schindelin
2ccb706461 t5605: special-case hardlink test for BusyBox-w32
When t5605 tries to verify that files are hardlinked (or that they are
not), it uses the `-links` option of the `find` utility.

BusyBox' implementation does not support that option, and BusyBox-w32's
lstat() does not even report the number of hard links correctly (for
performance reasons).

So let's just switch to a different method that actually works on
Windows.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:58 +00:00
Johannes Schindelin
5325f7cc97 t5532: workaround for BusyBox on Windows
While it may seem super convenient to some old Unix hands to simpy
require Perl to be available when running the test suite, this is a
major hassle on Windows, where we want to verify that Perl is not,
actually, required in a NO_PERL build.

As a super ugly workaround, we "install" a script into /usr/bin/perl
reading like this:

	#!/bin/sh

	# We'd much rather avoid requiring Perl altogether when testing
	# an installed Git. Oh well, that's why we cannot have nice
	# things.
	exec c:/git-sdk-64/usr/bin/perl.exe "$@"

The problem with that is that BusyBox assumes that the #! line in a
script refers to an executable, not to a script. So when it encounters
the line #!/usr/bin/perl in t5532's proxy-get-cmd, it barfs.

Let's help this situation by simply executing the Perl script with the
"interpreter" specified explicitly.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:58 +00:00
Johannes Schindelin
437a22dcaa t5003: use binary file from t/lib-diff/
At some stage, t5003-archive-zip wants to add a file that is not ASCII.
To that end, it uses /bin/sh. But that file may actually not exist (it
is too easy to forget that not all the world is Unix/Linux...)! Besides,
we already have perfectly fine binary files intended for use solely by
the tests. So let's use one of them instead.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:58 +00:00
Johannes Schindelin
7035f23b48 test-lib: add BUSYBOX prerequisite
When running with BusyBox, we will want to avoid calling executables on
the PATH that are implemented in BusyBox itself.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:58 +00:00
Johannes Schindelin
f69d20994e tests (mingw): remove Bash-specific pwd option
The -W option is only understood by MSYS2 Bash's pwd command. We already
make sure to override `pwd` by `builtin pwd -W` for MINGW, so let's not
double the effort here.

This will also help when switching the shell to another one (such as
BusyBox' ash) whose pwd does *not* understand the -W option.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Johannes Schindelin
ba7fc65f23 mingw: only use Bash-ism builtin pwd -W when available
Traditionally, Git for Windows' SDK uses Bash as its default shell.
However, other Unix shells are available, too. Most notably, the Win32
port of BusyBox comes with `ash` whose `pwd` command already prints
Windows paths as Git for Windows wants them, while there is not even a
`builtin` command.

Therefore, let's be careful not to override `pwd` unless we know that
the `builtin` command is available.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Johannes Schindelin
4d20686f04 tests: use the correct path separator with BusyBox
BusyBox-w32 is a true Win32 application, i.e. it does not come with a
POSIX emulation layer.

That also means that it does *not* use the Unix convention of separating
the entries in the PATH variable using colons, but semicolons.

However, there are also BusyBox ports to Windows which use a POSIX
emulation layer such as Cygwin's or MSYS2's runtime, i.e. using colons
as PATH separators.

As a tell-tale, let's use the presence of semicolons in the PATH
variable: on Unix, it is highly unlikely that it contains semicolons,
and on Windows (without POSIX emulation), it is virtually guaranteed, as
everybody should have both $SYSTEMROOT and $SYSTEMROOT/system32 in their
PATH.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Johannes Schindelin
c268099893 tests: only override sort & find if there are usable ones in /usr/bin/
The idea is to allow running the test suite on MinGit with BusyBox
installed in /mingw64/bin/sh.exe. In that case, we will want to exclude
sort & find (and other Unix utilities) from being bundled.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Johannes Schindelin
b58f6fba32 tests: move test PNGs into t/lib-diff/
We already have a directory where we store files intended for use by
multiple test scripts. The same directory is a better home for the
test-binary-*.png files than t/.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Johannes Schindelin
b63ce36e05 gitattributes: mark .png files as binary
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Johannes Schindelin
67eab340ff tests(mingw): if iconv is unavailable, use test-helper --iconv
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Johannes Schindelin
48d69ffb52 test-tool: learn to act as a drop-in replacement for iconv
It is convenient to assume that everybody who wants to build & test Git
has access to a working `iconv` executable (after all, we already pretty
much require libiconv).

However, that limits esoteric test scenarios such as Git for Windows',
where an end user installation has to ship with `iconv` for the sole
purpose of being testable. That payload serves no other purpose.

So let's just have a test helper (to be able to test Git, the test
helpers have to be available, after all) to act as `iconv` replacement.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Johannes Schindelin
8bd4e557f6 mingw: when path_lookup() failed, try BusyBox
BusyBox comes with a ton of applets ("applet" being the identical
concept to Git's "builtins"). And similar to Git's builtins, the applets
can be called via `busybox <command>`, or the BusyBox executable can be
copied/hard-linked to the command name.

The similarities do not end here. Just as with Git's builtins, it is
problematic that BusyBox' hard-linked applets cannot easily be put into
a .zip file: .zip archives have no concept of hard-links and therefore
would store identical copies (and also extract identical copies,
"inflating" the archive unnecessarily).

To counteract that issue, MinGit already ships without hard-linked
copies of the builtins, and the plan is to do the same with BusyBox'
applets: simply ship busybox.exe as single executable, without
hard-linked applets.

To accommodate that, Git is being taught by this commit a very special
trick, exploiting the fact that it is possible to call an executable
with a command-line whose argv[0] is different from the executable's
name: when `sh` is to be spawned, and no `sh` is found in the PATH, but
busybox.exe is, use that executable (with unchanged argv).

Likewise, if any executable to be spawned is not on the PATH, but
busybox.exe is found, parse the output of `busybox.exe --help` to find
out what applets are included, and if the command matches an included
applet name, use busybox.exe to execute it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Johannes Schindelin
16493a80be mingw: explicitly specify with which cmd to prefix the cmdline
The main idea of this patch is that even if we have to look up the
absolute path of the script, if only the basename was specified as
argv[0], then we should use that basename on the command line, too, not
the absolute path.

This patch will also help with the upcoming patch where we automatically
substitute "sh ..." by "busybox sh ..." if "sh" is not in the PATH but
"busybox" is: we will do that by substituting the actual executable, but
still keep prepending "sh" to the command line.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Ben Boeckel
85f826e13b clean: suggest using core.longPaths if paths are too long to remove
On Windows, git repositories may have extra files which need cleaned
(e.g., a build directory) that may be arbitrarily deep. Suggest using
`core.longPaths` if such situations are encountered.

Fixes: #2715
Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
2026-06-15 20:51:57 +00:00
Jeff Hostetler
270d1e5cdc compat/fsmonitor/fsm-*-win32: support long paths
Update wchar_t buffers to use MAX_LONG_PATH instead of MAX_PATH and call
xutftowcs_long_path() in the Win32 backend source files.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2026-06-15 20:51:57 +00:00
Johannes Schindelin
97df09a370 win32(long path support): leave drive-less absolute paths intact
When trying to ensure that long paths are handled correctly, we
first normalize absolute paths as we encounter them.

However, if the path is a so-called "drive-less" absolute path, i.e. if
it is relative to the current drive but _does_ start with a directory
separator, we would want the normalized path to be such a drive-less
absolute path, too.

Let's do that, being careful to still include the drive prefix when we
need to go through the `\\?\` dance (because there, the drive prefix is
absolutely required).

This fixes https://github.com/git-for-windows/git/issues/4586.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Karsten Blees
2ef89cc7b5 mingw: support long paths
Windows paths are typically limited to MAX_PATH = 260 characters, even
though the underlying NTFS file system supports paths up to 32,767 chars.
This limitation is also evident in Windows Explorer, cmd.exe and many
other applications (including IDEs).

Particularly annoying is that most Windows APIs return bogus error codes
if a relative path only barely exceeds MAX_PATH in conjunction with the
current directory, e.g. ERROR_PATH_NOT_FOUND / ENOENT instead of the
infinitely more helpful ERROR_FILENAME_EXCED_RANGE / ENAMETOOLONG.

Many Windows wide char APIs support longer than MAX_PATH paths through the
file namespace prefix ('\\?\' or '\\?\UNC\') followed by an absolute path.
Notable exceptions include functions dealing with executables and the
current directory (CreateProcess, LoadLibrary, Get/SetCurrentDirectory) as
well as the entire shell API (ShellExecute, SHGetSpecialFolderPath...).

Introduce a handle_long_path function to check the length of a specified
path properly (and fail with ENAMETOOLONG), and to optionally expand long
paths using the '\\?\' file namespace prefix. Short paths will not be
modified, so we don't need to worry about device names (NUL, CON, AUX).

Contrary to MSDN docs, the GetFullPathNameW function doesn't seem to be
limited to MAX_PATH (at least not on Win7), so we can use it to do the
heavy lifting of the conversion (translate '/' to '\', eliminate '.' and
'..', and make an absolute path).

Add long path error checking to xutftowcs_path for APIs with hard MAX_PATH
limit.

Add a new MAX_LONG_PATH constant and xutftowcs_long_path function for APIs
that support long paths.

While improved error checking is always active, long paths support must be
explicitly enabled via 'core.longpaths' option. This is to prevent end
users from shooting themselves in the foot by checking out files that Windows
Explorer, cmd/bash or their favorite IDE cannot handle.

Test suite:
Test the case is when the full pathname length of a dir is close
to 260 (MAX_PATH).
Bug report and an original reproducer by Andrey Rogozhnikov:
https://github.com/msysgit/git/pull/122#issuecomment-43604199

[jes: adjusted test number to avoid conflicts, added support for
chdir(), etc]

Thanks-to: Martin W. Kirst <maki@bitkings.de>
Thanks-to: Doug Kelly <dougk.ff7@gmail.com>
Original-test-by: Andrey Rogozhnikov <rogozhnikov.andrey@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Josh Soref <jsoref@gmail.com>
2026-06-15 20:51:57 +00:00
Doug Kelly
d0c5d0a813 pack-objects (mingw): demonstrate a segmentation fault with large deltas
There is a problem in the way 9ac3f0e5b3 (pack-objects: fix
performance issues on packing large deltas, 2018-07-22) initializes that
mutex in the `packing_data` struct. The problem manifests in a
segmentation fault on Windows, when a mutex (AKA critical section) is
accessed without being initialized. (With pthreads, you apparently do
not really have to initialize them?)

This was reported in https://github.com/git-for-windows/git/issues/1839.

Signed-off-by: Doug Kelly <dougk.ff7@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Johannes Schindelin
ccdff18b8a Merge branch 'dont-clean-junctions-fscache'
We already avoid traversing NTFS junction points in `git clean -dfx`.
With this topic branch, we do that when the FSCache is enabled, too.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Johannes Schindelin
72fb19023e Merge remote-tracking branch 'benpeart/fscache-per-thread-gfw'
This brings substantial wins in performance because the FSCache is now
per-thread, being merged to the primary thread only at the end, so we do
not have to lock (except while merging).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Johannes Schindelin
42b9abd9dd clean: make use of FSCache
The `git clean` command needs to enumerate plenty of files and
directories, and can therefore benefit from the FSCache.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Johannes Schindelin
592448e570 Merge pull request #1909 from benpeart/free-fscache-after-status-gfw
status: disable and free fscache at the end of the status command
2026-06-15 20:51:57 +00:00
Johannes Schindelin
1a83e3ee07 fscache: implement an FSCache-aware is_mount_point()
When FSCache is active, we can cache the reparse tag and use it directly
to determine whether a path refers to an NTFS junction, without any
additional, costly I/O.

Note: this change only makes a difference with the next commit, which
will make use of the FSCache in `git clean` (contingent on
`core.fscache` set, of course).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:57 +00:00
Johannes Schindelin
fe8e0b3201 Merge branch 'fscache' 2026-06-15 20:51:57 +00:00
Ben Peart
54dbe7cf91 fscache: teach fscache to use NtQueryDirectoryFile
Using FindFirstFileExW() requires the OS to allocate a 64K buffer for each
directory and then free it when we call FindClose().  Update fscache to call
the underlying kernel API NtQueryDirectoryFile so that we can do the buffer
management ourselves.  That allows us to allocate a single buffer for the
lifetime of the cache and reuse it for each directory.

This change improves performance of 'git status' by 18% in a repo with ~200K
files and 30k folders.

Documentation for NtQueryDirectoryFile can be found at:

https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/ntifs/nf-ntifs-ntquerydirectoryfile
https://docs.microsoft.com/en-us/windows/desktop/FileIO/file-attribute-constants
https://docs.microsoft.com/en-us/windows/desktop/fileio/reparse-point-tags

To determine if the specified directory is a symbolic link, inspect the
FileAttributes member to see if the FILE_ATTRIBUTE_REPARSE_POINT flag is
set. If so, EaSize will contain the reparse tag (this is a so far
undocumented feature, but confirmed by the NTFS developers). To
determine if the reparse point is a symbolic link (and not some other
form of reparse point), test whether the tag value equals the value
IO_REPARSE_TAG_SYMLINK.

The NtQueryDirectoryFile() call works best (and on Windows 8.1 and
earlier, it works *only*) with buffer sizes up to 64kB. Which is 32k
wide characters, so let's use that as our buffer size.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:56 +00:00
Ben Peart
24b0821289 status: disable and free fscache at the end of the status command
At the end of the status command, disable and free the fscache so that we
don't leak the memory and so that we can dump the fscache statistics.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2026-06-15 20:51:56 +00:00
xungeng li
c2f4b625c5 fscache: optionally enable wsl compability file mode bits
The Windows Subsystem for Linux (WSL) version 2 allows to use `chmod` on
NTFS volumes provided that they are mounted with metadata enabled (see
https://devblogs.microsoft.com/commandline/chmod-chown-wsl-improvements/
for details), for example:

	$ chmod 0755 /mnt/d/test/a.sh

In order to facilitate better collaboration between the Windows
version of Git and the WSL version of Git, we can make the Windows
version of Git also support reading and writing NTFS file modes
in a manner compatible with WSL.

Since this slightly slows down operations where lots of files are
created (such as an initial checkout), this feature is only enabled when
`core.WSLCompat` is set to true. Note that you also have to set
`core.fileMode=true` in repositories that have been initialized without
enabling WSL compatibility.

There are several ways to enable metadata loading for NTFS volumes
in WSL, one of which is to modify `/etc/wsl.conf` by adding:

```
[automount]
enabled = true
options = "metadata,umask=027,fmask=117"
```

And reboot WSL.

It can also be enabled temporarily by this incantation:

	$ sudo umount /mnt/c &&
	  sudo mount -t drvfs C: /mnt/c -o metadata,uid=1000,gid=1000,umask=22,fmask=111

It's important to note that this modification is compatible with, but
does not depend on WSL. The helper functions in this commit can operate
independently and functions normally on devices where WSL is not
installed or properly configured.

Signed-off-by: xungeng li <xungeng@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:56 +00:00
Derrick Stolee
f2004786f1 unpack-trees: enable fscache for sparse-checkout
When updating the skip-worktree bits in the index to align with new
values in a sparse-checkout file, Git scans the entire working
directory with lstat() calls. In a sparse-checkout, many of these
lstat() calls are for paths that do not exist.

Enable the fscache feature during this scan. Since enable_fscache()
calls nest, the disable_fscache() method decrements a counter and
would only clear the cache if that counter reaches zero.

In a local test of a repo with ~2.2 million paths, updating the index
with git read-tree -m -u HEAD with a sparse-checkout file containing
only /.gitattributes improved from 2-3 minutes to ~6 seconds.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:56 +00:00
Ben Peart
46b07824ad fscache: make fscache_enable() thread safe
The recent change to make fscache thread specific relied on fscache_enable()
being called first from the primary thread before being called in parallel
from worker threads.  Make that more robust and protect it with a critical
section to avoid any issues.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
2026-06-15 20:51:56 +00:00
Johannes Schindelin
e843bbbea6 fscache: Windows Docker volumes are *not* symbolic links
... even if they may look like them.

As looking up the target of the "symbolic link" (just to see whether it
starts with `/ContainerMappedDirectories/`) is pretty expensive, we
do it when we can be *really* sure that there is a possibility that this
might be the case.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: JiSeop Moon <zcube@zcube.kr>
2026-06-15 20:51:56 +00:00
Ben Peart
dd8749af38 fscache: add fscache hit statistics
Track fscache hits and misses for lstat and opendir requests.  Reporting of
statistics is done when the cache is disabled for the last time and freed
and is only reported if GIT_TRACE_FSCACHE is set.

Sample output is:

11:33:11.836428 compat/win32/fscache.c:433 fscache: lstat 3775, opendir 263, total requests/misses 4052/269

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2026-06-15 20:51:56 +00:00
Ben Peart
8cb77a67a6 fscache: teach fscache to use mempool
Now that the fscache is single threaded, take advantage of the mem_pool as
the allocator to significantly reduce the cost of allocations and frees.

With the reduced cost of free, in future patches, we can start freeing the
fscache at the end of commands instead of just leaking it.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:56 +00:00
Johannes Schindelin
3e3a2fd7c6 fscache: remember the reparse tag for each entry
We will use this in the next commit to implement an FSCache-aware
version of is_mount_point().

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:56 +00:00
Ben Peart
53251f6137 fscache: add GIT_TEST_FSCACHE support
Add support to fscache to enable running the entire test suite with the
fscache enabled.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2026-06-15 20:51:56 +00:00
Ben Peart
c698f13f1b fscache: update fscache to be thread specific instead of global
The threading model for fscache has been to have a single, global cache.
This puts requirements on it to be thread safe so that callers like
preload-index can call it from multiple threads.  This was implemented
with a single mutex and completion events which introduces contention
between the calling threads.

Simplify the threading model by making fscache thread specific.  This allows
us to remove the global mutex and synchronization events entirely and instead
associate a fscache with every thread that requests one. This works well with
the current multi-threading which divides the cache entries into blocks with
a separate thread processing each block.

At the end of each worker thread, if there is a fscache on the primary
thread, merge the cached results from the worker into the primary thread
cache. This enables us to reuse the cache later especially when scanning for
untracked files.

In testing, this reduced the time spent in preload_index() by about 25% and
also reduced the CPU utilization significantly.  On a repo with ~200K files,
it reduced overall status times by ~12%.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2026-06-15 20:51:56 +00:00
Ben Peart
24795a1014 fscache: use FindFirstFileExW to avoid retrieving the short name
Use FindFirstFileExW with FindExInfoBasic to avoid forcing NTFS to look up
the short name.  Also switch to a larger (64K vs 4K) buffer using
FIND_FIRST_EX_LARGE_FETCH to minimize round trips to the kernel.

In a repo with ~200K files, this drops warm cache status times from 3.19
seconds to 2.67 seconds for a 16% savings.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2026-06-15 20:51:56 +00:00
Ben Peart
7aca9a3b13 fscache: fscache takes an initial size
Update enable_fscache() to take an optional initial size parameter which is
used to initialize the hashmap so that it can avoid having to rehash as
additional entries are added.

Add a separate disable_fscache() macro to make the code clearer and easier
to read.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:56 +00:00
Ben Peart
3be9eef13c Enable the filesystem cache (fscache) in refresh_index().
On file systems that support it, this can dramatically speed up operations
like add, commit, describe, rebase, reset, rm that would otherwise have to
lstat() every file to "re-match" the stat information in the index to that
of the file system.

On a synthetic repo with 1M files, "git reset" dropped from 52.02 seconds to
14.42 seconds for a savings of 72%.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2026-06-15 20:51:56 +00:00
Ben Peart
c8cf5d9bb5 mem_pool: add GIT_TRACE_MEMPOOL support
Add tracing around initializing and discarding mempools. In discard report
on the amount of memory unused in the current block to help tune setting
the initial_size.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2026-06-15 20:51:56 +00:00
Takuto Ikuta
65924f5cd3 checkout.c: enable fscache for checkout again
This is retry of #1419.

I added flush_fscache macro to flush cached stats after disk writing
with tests for regression reported in #1438 and #1442.

git checkout checks each file path in sorted order, so cache flushing does not
make performance worse unless we have large number of modified files in
a directory containing many files.

Using chromium repository, I tested `git checkout .` performance when I
delete 10 files in different directories.
With this patch:
TotalSeconds: 4.307272
TotalSeconds: 4.4863595
TotalSeconds: 4.2975562
Avg: 4.36372923333333

Without this patch:
TotalSeconds: 20.9705431
TotalSeconds: 22.4867685
TotalSeconds: 18.8968292
Avg: 20.7847136

I confirmed this patch passed all tests in t/ with core_fscache=1.

Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
2026-06-15 20:51:56 +00:00
Takuto Ikuta
1ae62d48cd fetch-pack.c: enable fscache for stats under .git/objects
When I do git fetch, git call file stats under .git/objects for each
refs. This takes time when there are many refs.

By enabling fscache, git takes file stats by directory traversing and that
improved the speed of fetch-pack for repository having large number of
refs.

In my windows workstation, this improves the time of `git fetch` for
chromium repository like below. I took stats 3 times.

* With this patch
TotalSeconds: 9.9825165
TotalSeconds: 9.1862075
TotalSeconds: 10.1956256
Avg: 9.78811653333333

* Without this patch
TotalSeconds: 15.8406702
TotalSeconds: 15.6248053
TotalSeconds: 15.2085938
Avg: 15.5580231

Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-06-15 20:51:56 +00:00
Jeff Hostetler
7173df1206 dir.c: regression fix for add_excludes with fscache
Fix regression described in:
https://github.com/git-for-windows/git/issues/1392

which was introduced in:
b2353379bb

Problem Symptoms
================
When the user has a .gitignore file that is a symlink, the fscache
optimization introduced above caused the stat-data from the symlink,
rather that of the target file, to be returned.  Later when the ignore
file was read, the buffer length did not match the stat.st_size field
and we called die("cannot use <path> as an exclude file")

Optimization Rationale
======================
The above optimization calls lstat() before open() primarily to ask
fscache if the file exists.  It gets the current stat-data as a side
effect essentially for free (since we already have it in memory).
If the file does not exist, it does not need to call open().  And
since very few directories have .gitignore files, we can greatly
reduce time spent in the filesystem.

Discussion of Fix
=================
The above optimization calls lstat() rather than stat() because the
fscache only intercepts lstat() calls.  Calls to stat() stay directed
to the mingw_stat() completly bypassing fscache.  Furthermore, calls
to mingw_stat() always call {open, fstat, close} so that symlinks are
properly dereferenced, which adds *additional* open/close calls on top
of what the original code in dir.c is doing.

Since the problem only manifests for symlinks, we add code to overwrite
the stat-data when the path is a symlink.  This preserves the effect of
the performance gains provided by the fscache in the normal case.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2026-06-15 20:51:56 +00:00
Jeff Hostetler
d41c0c1834 fscache: make fscache_enabled() public
Make fscache_enabled() function public rather than static.
Remove unneeded fscache_is_enabled() function.
Change is_fscache_enabled() macro to call fscache_enabled().

is_fscache_enabled() now takes a pathname so that the answer
is more precise and mean "is fscache enabled for this pathname",
since fscache only stores repo-relative paths and not absolute
paths, we can avoid attempting lookups for absolute paths.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2026-06-15 20:51:56 +00:00