This topic vendors in mimalloc v2.0.9, a fast allocator that allows Git
for Windows to perform efficiently.
Switch Git for Windows to using mimalloc instead of nedmalloc
In MSYS2, we have two Python interpreters at our disposal, so we can
include the Python stuff in the build.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This topic branch teaches `git clean` to respect NTFS junctions and Unix
bind mounts: it will now stop at those boundaries.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This topic branch allows us to specify absolute paths without the drive
prefix e.g. when cloning.
Example:
C:\Users\me> git clone https://github.com/git/git \upstream-git
This will clone into a new directory C:\upstream-git, in line with how
Windows interprets absolute paths.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
These fixes were necessary for Sverre Rabbelier's remote-hg to work,
but for some magic reason they are not necessary for the current
remote-hg. Makes you wonder how that one gets away with it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Backport a couple fixes to make the CI build run again (so much for
reproducible builds...).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In 904339edbd80 (Introduce support for the Meson build system,
2024-12-06) the `meson.build` file was introduced, adding also a
Windows-specific list of source files. This list was obviously meant to
be sorted alphabetically, but there is one mistake. Let's fix that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In 590e081dea7c (ident: add NO_GECOS_IN_PWENT for systems without
pw_gecos in struct passwd, 2011-05-19), code was introduced to iterate
over the `gw_gecos` field; The loop variable is of type `char *`, which
assumes that `gw_gecos` is writable.
However, it is not necessarily writable (and it is a bad idea to have it
writable in the first place), so let's switch the loop variable type to
`const char *`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Thorough benchmarking with repacking a subset of linux.git (the commit
history reachable from 93a6fefe2f ([PATCH] fix the SYSCTL=n compilation,
2007-02-28), to be precise) suggest that this allocator is on par, in
multi-threaded situations maybe even better than nedmalloc:
`git repack -adfq` with mimalloc, 8 threads:
31.166991900 27.576763800 28.712311000 27.373859000 27.163141900
`git repack -adfq` with nedmalloc, 8 threads:
31.915032900 27.149883100 28.244933700 27.240188800 28.580849500
In a different test using GitHub Actions build agents (probably
single-threaded, a core-strength of nedmalloc)):
`git repack -q -d -l -A --unpack-unreachable=2.weeks.ago` with mimalloc:
943.426 978.500 939.709 959.811 954.605
`git repack -q -d -l -A --unpack-unreachable=2.weeks.ago` with nedmalloc:
995.383 952.179 943.253 963.043 980.468
While these measurements were not executed with complete scientific
rigor, as no hardware was set aside specifically for these benchmarks,
it shows that mimalloc and nedmalloc perform almost the same, nedmalloc
with a bit higher variance and also slightly higher average (further
testing suggests that nedmalloc performs worse in multi-threaded
situations than in single-threaded ones).
In short: mimalloc seems to be slightly better suited for our purposes
than nedmalloc.
Seeing that mimalloc is developed actively, while nedmalloc ceased to
see any updates in eight years, let's use mimalloc on Windows instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Always use the internal "use_weak" random seed when initializing
the "mimalloc" heap when statically linked on Windows.
The imported "mimalloc" routines support several random sources
to seed the heap data structures, including BCrypt.dll and
RtlGenRandom. Crashes have been reported when using BCrypt.dll
if it initialized during an `atexit()` handler function. Granted,
such DLL initialization should not happen in an atexit handler,
but yet the crashes remain.
It should be noted that on Windows when statically linked, the
mimalloc startup code (called by the GCC CRT to initialize static
data prior to calling `main()`) always uses the internal "weak"
random seed. "mimalloc" does not try to load an alternate
random source until after the OS initialization has completed.
Heap data is stored in `__declspec(thread)` TLS data and in theory
each Git thread will have its own heap data. However, testing
shows that the "mimalloc" library doesn't actually call
`os_random_buf()` (to load a new random source) when creating these
new per-thread heap structures.
However, if an atexit handler is forced to run on a non-main
thread, the "mimalloc" library *WILL* try to create a new heap
and seed it with `os_random_buf()`. (The reason for this is still
a mystery to this author.) The `os_random_buf()` call can cause
the (previously uninitialized BCrypt.dll library) to be dynamically
loaded and a call made into it. Crashes have been reported in
v2.40.1.vfs.0.0 while in this call.
As a workaround, the fix here forces the use of the internal
"use_weak" random code for the subsequent `os_random_buf()` calls.
Since we have been using that random generator for the majority
of the program, it seems safe to use it for the final few mallocs
in the atexit handler (of which there really shouldn't be that many.
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
By defining `USE_MIMALLOC`, Git can now be compiled with that
nicely-fast and small allocator.
Note that we have to disable a couple `DEVELOPER` options to build
mimalloc's source code, as it makes heavy use of declarations after
statements, among other things that disagree with Git's conventions.
We even have to silence some GCC warnings in non-DEVELOPER mode. For
example, the `-Wno-array-bounds` flag is needed because in `-O2` builds,
trying to call `NtCurrentTeb()` (which `_mi_thread_id()` does on
Windows) causes the bogus warning about a system header, likely related
to https://sourceforge.net/p/mingw-w64/mailman/message/37674519/ and to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578:
C:/git-sdk-64-minimal/mingw64/include/psdk_inc/intrin-impl.h:838:1:
error: array subscript 0 is outside array bounds of 'long long unsigned int[0]' [-Werror=array-bounds]
838 | __buildreadseg(__readgsqword, unsigned __int64, "gs", "q")
| ^~~~~~~~~~~~~~
Also: The `mimalloc` library uses C11-style atomics, therefore we must
require that standard when compiling with GCC if we want to use
`mimalloc` (instead of requiring "only" C99). This is what we do in the
CMake definition already, therefore this commit does not need to touch
`contrib/buildsystems/`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We want to compile mimalloc's source code as part of Git, rather than
requiring the code to be built as an external library: mimalloc uses a
CMake-based build, which is not necessarily easy to integrate into the
flavors of Git for Windows (which will be the main benefitting port).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
While Git for Windows does not _ship_ Python (in order to save on
bandwidth), MSYS2 provides very fine Python interpreters that users can
easily take advantage of, by using Git for Windows within its SDK.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Windows' equivalent to "bind mounts", NTFS junction points, can be
unlinked without affecting the mount target. This is clearly what users
expect to happen when they call `git clean -dfx` in a worktree that
contains NTFS junction points: the junction should be removed, and the
target directory of said junction should be left alone (unless it is
inside the worktree).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When specifying an absolute path without a drive prefix, we convert that
path internally. Let's make sure that we handle that case properly, too
;-)
This fixes the command
git clone https://github.com/git-for-windows/git \G4W
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
After importing anything with fast-import, we should always let the
garbage collector do its job, since the objects are written to disk
inefficiently.
This brings down an initial import of http://selenic.com/hg from about
230 megabytes to about 14.
In the future, we may want to make this configurable on a per-remote
basis, or maybe teach fast-import about it in the first place.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This commit imports mimalloc's source code as per v2.1.2, fetched from
the tag at https://github.com/microsoft/mimalloc.
The .c files are from the src/ subdirectory, and the .h files from the
include/ and include/mimalloc/ subdirectories. We will subsequently
modify the source code to accommodate building within Git's context.
Since we plan on using the `mi_*()` family of functions, we skip the
C++-specific source code, some POSIX compliant functions to interact
with mimalloc, and the code that wants to support auto-magic overriding
of the `malloc()` function (mimalloc-new-delete.h, alloc-posix.c,
mimalloc-override.h, alloc-override.c, alloc-override-osx.c,
alloc-override-win.c and static.c).
To appease the `check-whitespace` job of Git's Continuous Integration,
this commit was washed one time via `git rebase --whitespace=fix`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
It seems to be not exactly rare on Windows to install NTFS junction
points (the equivalent of "bind mounts" on Linux/Unix) in worktrees,
e.g. to map some development tools into a subdirectory.
In such a scenario, it is pretty horrible if `git clean -dfx` traverses
into the mapped directory and starts to "clean up".
Let's just not do that. Let's make sure before we traverse into a
directory that it is not a mount point (or junction).
This addresses https://github.com/git-for-windows/git/issues/607
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Windows, there are several categories of absolute paths. One such
category starts with a backslash and is implicitly relative to the
drive associated with the current working directory. Example:
c:
git clone https://github.com/git-for-windows/git \G4W
should clone into C:\G4W.
There is currently a problem with that, in that mingw_mktemp() does not
expect the _wmktemp() function to prefix the absolute path with the
drive prefix, and as a consequence, the resulting path does not fit into
the originally-passed string buffer. The symptom is a "Result too large"
error.
Reported by Juan Carlos Arevalo Baeza.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We are about to vendor in `mimalloc`'s source code which we will want to
include `git-compat-util.h` after defining that constant.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
[PT: ensure we add an additional element to the argv array]
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The mingw-w64 GCC seems to link implicitly to libwinpthread, which does
implement a pthread emulation (that is more complete than Git's). Let's
keep preferring Git's.
To avoid linker errors where it thinks that the `pthread_self` and the
`pthread_create` symbols are defined twice, let's give our version a
`win32_` prefix, just like we already do for `pthread_join()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This happens only when the corresponding commits are not exported in
the current fast-export run. This can happen either when the relevant
commit is already marked, or when the commit is explicitly marked
as UNINTERESTING with a negative ref by another argument.
This breaks fast-export basec remote helpers.
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Now that we dropped `contrib/buildsystems/generate` to generate Visual
Studio Solution files, it is time to also drop the `vcxproj` Makefile
target that depended on that script.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
These fixes have been sent to the Git mailing list but have not been
picked up by the Git project yet.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Before we had CMake support, the only way to build Git in Visual Studio
was via this hacky `generate` script.
For a while I tried to fix whenever things got broken, in particular to
allow building confidence in embargoed releases by running the CI builds
in Azure Pipelines in a private Azure DevOps project. I even carried the
patches in Git for Windows with the intention of upstreaming them,
eventually.
However, it is a lot of work with too little benefit. CMake is much
better supported by Visual Studio. So let's drop this hacky script (plus
support code).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
It is not useful because we do not have any persisted directory anymore,
not since dropping our Travis CI support.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In some implementations, `regexec_buf()` assumes that it is fed lines;
Without `REG_NOTEOL` it thinks the end of the buffer is the end of a
line. Which makes sense, but trips up this case because we are not
feeding lines, but rather a whole buffer. So the final newline is not
the start of an empty line, but the true end of the buffer.
This causes an interesting bug:
$ echo content >file.txt
$ git grep --no-index -n '^$' file.txt
file.txt:2:
This bug is fixed by making the end of the buffer consistently the end
of the final line.
The patch was applied from
https://lore.kernel.org/git/20250113062601.GD767856@coredump.intra.peff.net/
Reported-by: Olly Betts <olly@survex.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This addresses:
- CVE-2024-52005:
Insufficient neutralization of ANSI escape sequences in sideband
payload can be used to mislead Git users into believing that
certain remote-generated messages actually originate from Git.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
When a Unix socket is initialized, the current directory's path is
stored so that the cleanup code can `chdir()` back to where it was
before exit.
If the path that needs to be stored exceeds the default size of the
`sun_path` attribute of `struct sockaddr_un` (which is defined as a
108-sized byte array on Linux), a larger buffer needs to be allocated so
that it can hold the path, and it is the responsibility of the
`unix_sockaddr_cleanup()` function to release that allocated memory.
In Git's CI, this stack allocation is not necessary because the code is
checked out to `/home/runner/work/git/git`. Concatenate the path
`t/trash directory.t0301-credential-cache/.cache/git/credential/socket`
and a terminating NUL, and you end up with 96 bytes, 12 shy of the
default `sun_path` size.
However, I use worktrees with slightly longer paths:
`/home/me/projects/git/yes/i/nest/worktrees/to/organize/them/` is more
in line with what I have. When I recently tried to locally reproduce a
failure of the `linux-leaks` CI job, this t0301 test failed (where it
had not failed in CI).
The reason: When `credential-cache` tries to reach its daemon initially
by calling `unix_sockaddr_init()`, it is expected that the daemon cannot
be reached (the idea is to spin up the daemon in that case and try
again). However, when this first call to `unix_sockaddr_init()` fails,
the code returns early from the `unix_stream_connect()` function
_without_ giving the cleanup code a chance to run, skipping the
deallocation of above-mentioned path.
The fix is easy: do not return early but instead go directly to the
cleanup code.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The preceding two commits introduced special handling of the sideband
channel to neutralize ANSI escape sequences before sending the payload
to the terminal, and `sideband.allowControlCharacters` to override that
behavior.
However, some `pre-receive` hooks that are actively used in practice
want to color their messages and therefore rely on the fact that Git
passes them through to the terminal.
In contrast to other ANSI escape sequences, it is highly unlikely that
coloring sequences can be essential tools in attack vectors that mislead
Git users e.g. by hiding crucial information.
Therefore we can have both: Continue to allow ANSI coloring sequences to
be passed to the terminal, and neutralize all other ANSI escape
sequences.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The preceding commit fixed the vulnerability whereas sideband messages
(that are under the control of the remote server) could contain ANSI
escape sequences that would be sent to the terminal verbatim.
However, this fix may not be desirable under all circumstances, e.g.
when remote servers deliberately add coloring to their messages to
increase their urgency.
To help with those use cases, give users a way to opt-out of the
protections: `sideband.allowControlCharacters`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The output of `git clone` is a vital component for understanding what
has happened when things go wrong. However, these logs are partially
under the control of the remote server (via the "sideband", which
typically contains what the remote `git pack-objects` process sends to
`stderr`), and is currently not sanitized by Git.
This makes Git susceptible to ANSI escape sequence injection (see
CWE-150, https://cwe.mitre.org/data/definitions/150.html), which allows
attackers to corrupt terminal state, to hide information, and even to
insert characters into the input buffer (i.e. as if the user had typed
those characters).
To plug this vulnerability, disallow any control character in the
sideband, replacing them instead with the common `^<letter/symbol>`
(e.g. `^[` for `\x1b`, `^A` for `\x01`).
There is likely a need for more fine-grained controls instead of using a
"heavy hammer" like this, which will be introduced subsequently.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Most notably, this fixes
https://github.com/git-for-windows/git/issues/5431, i.e. that the ARM64
flavor of Git for Windows does not use `/etc/gitconfig` by mistake, but
`/clangarm64/etc/gitconfig` instead.
My original intention was to fix only that issue, but when I fired up a
Git for Windows/ARM64 SDK and tried to build, I immediately ran into
trouble, then noticed places where ARM64 was not yet handled, and one
thing led to another and now it's a 6-patch PR instead of a single-patch
one.
Upgrade the minimum Perl version enforced by meson-based build to
match what Makefile-based build uses.
* po/meson-perl-fix:
meson: fix Perl version check for Meson versions before 1.7.0
meson: bump minimum required Perl version to 5.26.0
Correct the default target in Documentation/Makefile, and
future-proof all Makefiles from similar breakages by declaring the
default target (which happens to be "all") upfront.
* ad/set-default-target-in-makefiles:
Makefile: set default goals in makefiles
"git merge-tree --stdin" has been improved (including a workaround
for a deadlock).
* pw/merge-tree-stdin-deadlock-fix:
merge-tree: fix link formatting in html docs
merge-tree: improve docs for --stdin
merge-tree: only use basic merge config
merge-tree: remove redundant code
merge-tree --stdin: flush stdout to avoid deadlock
The documentation of "git commit" and "git rebase" now refer to
commit titles as such, not "subject".
* mh/doc-commit-title-not-subject:
doc: use 'title' consistently
The -G/-S options to the "diff" family of commands caused us to hit
a BUG() when they get no values; they have been corrected.
* bc/diff-reject-empty-arg-to-pickaxe:
diff: don't crash with empty argument to -G or -S
Noises from "-Wsign-compare" in the borrowed xdiff code has been
squelched.
* da/xdiff-w-sign-compare-workaround:
xdiff: avoid signed vs. unsigned comparisons in xutils.c
xdiff: avoid signed vs. unsigned comparisons in xpatience.c
xdiff: avoid signed vs. unsigned comparisons in xhistogram.c
xdiff: avoid signed vs. unsigned comparisons in xemit.c
xdiff: avoid signed vs. unsigned comparisons in xdiffi.c
xdiff: move sign comparison warning guard into each file
This needs to be guarded against GCC complaining that it does not know
this option... _sigh_
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>