Commit Graph

103934 Commits

Author SHA1 Message Date
Johannes Schindelin
be4dbdfdb0 mingw: when path_lookup() failed, try BusyBox
BusyBox comes with a ton of applets ("applet" being the identical
concept to Git's "builtins"). And similar to Git's builtins, the applets
can be called via `busybox <command>`, or the BusyBox executable can be
copied/hard-linked to the command name.

The similarities do not end here. Just as with Git's builtins, it is
problematic that BusyBox' hard-linked applets cannot easily be put into
a .zip file: .zip archives have no concept of hard-links and therefore
would store identical copies (and also extract identical copies,
"inflating" the archive unnecessarily).

To counteract that issue, MinGit already ships without hard-linked
copies of the builtins, and the plan is to do the same with BusyBox'
applets: simply ship busybox.exe as single executable, without
hard-linked applets.

To accommodate that, Git is being taught by this commit a very special
trick, exploiting the fact that it is possible to call an executable
with a command-line whose argv[0] is different from the executable's
name: when `sh` is to be spawned, and no `sh` is found in the PATH, but
busybox.exe is, use that executable (with unchanged argv).

Likewise, if any executable to be spawned is not on the PATH, but
busybox.exe is found, parse the output of `busybox.exe --help` to find
out what applets are included, and if the command matches an included
applet name, use busybox.exe to execute it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:58 +01:00
Johannes Schindelin
ae409fe4dc mingw: explicitly specify with which cmd to prefix the cmdline
The main idea of this patch is that even if we have to look up the
absolute path of the script, if only the basename was specified as
argv[0], then we should use that basename on the command line, too, not
the absolute path.

This patch will also help with the upcoming patch where we automatically
substitute "sh ..." by "busybox sh ..." if "sh" is not in the PATH but
"busybox" is: we will do that by substituting the actual executable, but
still keep prepending "sh" to the command line.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:58 +01:00
Johannes Schindelin
8b75c314cf transport-helper: prefer Git's builtins over dashed form
This helps with minimal installations such as MinGit that refuse to
waste .zip real estate by shipping identical copies of builtins (.zip
files do not support hard links).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:58 +01:00
Bert Belder
d3bc1daf30 Win32: symlink: add test for symlink attribute
Signed-off-by: Bert Belder <bertbelder@gmail.com>
2019-10-30 11:55:58 +01:00
Bert Belder
ce1981d636 mingw: allow to specify the symlink type in .gitattributes
On Windows, symbolic links have a type: a "file symlink" must point at
a file, and a "directory symlink" must point at a directory. If the
type of symlink does not match its target, it doesn't work.

Git does not record the type of symlink in the index or in a tree. On
checkout it'll guess the type, which only works if the target exists
at the time the symlink is created. This may often not be the case,
for example when the link points at a directory inside a submodule.

By specifying `symlink=file` or `symlink=dir` the user can specify what
type of symlink Git should create, so Git doesn't have to rely on
unreliable heuristics.

Signed-off-by: Bert Belder <bertbelder@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:58 +01:00
Johannes Schindelin
fe74891711 Introduce helper to create symlinks that knows about index_state
On Windows, symbolic links actually have a type depending on the target:
it can be a file or a directory.

In certain circumstances, this poses problems, e.g. when a symbolic link
is supposed to point into a submodule that is not checked out, so there
is no way for Git to auto-detect the type.

To help with that, we will add support over the course of the next
commits to specify that symlink type via the Git attributes. This
requires an index_state, though, something that Git for Windows'
`symlink()` replacement cannot know about because the function signature
is defined by the POSIX standard and not ours to change.

So let's introduce a helper function to create symbolic links that
*does* know about the index_state.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:58 +01:00
Bert Belder
fcddafd917 Win32: symlink: move phantom symlink creation to a separate function
Signed-off-by: Bert Belder <bertbelder@gmail.com>
2019-10-30 11:55:58 +01:00
Johannes Schindelin
54f6c287ac mingw: try to create symlinks without elevated permissions
With Windows 10 Build 14972 in Developer Mode, a new flag is supported
by CreateSymbolicLink() to create symbolic links even when running
outside of an elevated session (which was previously required).

This new flag is called SYMBOLIC_LINK_FLAG_ALLOW_UNPRIVILEGED_CREATE and
has the numeric value 0x02.

Previous Windows 10 versions will not understand that flag and return an
ERROR_INVALID_PARAMETER, therefore we have to be careful to try passing
that flag only when the build number indicates that it is supported.

For more information about the new flag, see this blog post:
https://blogs.windows.com/buildingapps/2016/12/02/symlinks-windows-10/

This patch is loosely based on the patch submitted by Samuel D. Leslie
as https://github.com/git-for-windows/git/pull/1184.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:57 +01:00
Karsten Blees
513bf2da7b Win32: symlink: add support for symlinks to directories
Symlinks on Windows have a flag that indicates whether the target is a file
or a directory. Symlinks of wrong type simply don't work. This even affects
core Win32 APIs (e.g. DeleteFile() refuses to delete directory symlinks).

However, CreateFile() with FILE_FLAG_BACKUP_SEMANTICS doesn't seem to care.
Check the target type by first creating a tentative file symlink, opening
it, and checking the type of the resulting handle. If it is a directory,
recreate the symlink with the directory flag set.

It is possible to create symlinks before the target exists (or in case of
symlinks to symlinks: before the target type is known). If this happens,
create a tentative file symlink and postpone the directory decision: keep
a list of phantom symlinks to be processed whenever a new directory is
created in mingw_mkdir().

Limitations: This algorithm may fail if a link target changes from file to
directory or vice versa, or if the target directory is created in another
process.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:57 +01:00
Karsten Blees
7e25611a14 Win32: implement basic symlink() functionality (file symlinks only)
Implement symlink() that always creates file symlinks. Fails with ENOSYS
if symlinks are disabled or unsupported.

Note: CreateSymbolicLinkW() was introduced with symlink support in Windows
Vista. For compatibility with Windows XP, we need to load it dynamically
and fail gracefully if it isnt's available.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-10-30 11:55:57 +01:00
Karsten Blees
e39b3bfbe9 Win32: implement readlink()
Implement readlink() by reading NTFS reparse points. Works for symlinks
and directory junctions. If symlinks are disabled, fail with ENOSYS.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-10-30 11:55:57 +01:00
Karsten Blees
c6f1dd9384 Win32: mingw_chdir: change to symlink-resolved directory
If symlinks are enabled, resolve all symlinks when changing directories,
as required by POSIX.

Note: Git's real_path() function bases its link resolution algorithm on
this property of chdir(). Unfortunately, the current directory on Windows
is limited to only MAX_PATH (260) characters. Therefore using symlinks and
long paths in combination may be problematic.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:57 +01:00
Karsten Blees
a2e3932bb3 Win32: mingw_rename: support renaming symlinks
MSVCRT's _wrename() cannot rename symlinks over existing files: it returns
success without doing anything. Newer MSVCR*.dll versions probably do not
have this problem: according to CRT sources, they just call MoveFileEx()
with the MOVEFILE_COPY_ALLOWED flag.

Get rid of _wrename() and call MoveFileEx() with proper error handling.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-10-30 11:55:57 +01:00
Karsten Blees
22b7a211a3 Win32: mingw_unlink: support symlinks to directories
_wunlink() / DeleteFileW() refuses to delete symlinks to directories. If
_wunlink() fails with ERROR_ACCESS_DENIED, try _wrmdir() as well.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-10-30 11:55:57 +01:00
Karsten Blees
29980b1ad5 Win32: add symlink-specific error codes
Signed-off-by: Karsten Blees <blees@dcon.de>
2019-10-30 11:55:57 +01:00
Karsten Blees
16377e6b4a Win32: change default of 'core.symlinks' to false
Symlinks on Windows don't work the same way as on Unix systems. E.g. there
are different types of symlinks for directories and files, creating
symlinks requires administrative privileges etc.

By default, disable symlink support on Windows. I.e. users explicitly have
to enable it with 'git config [--system|--global] core.symlinks true'.

The test suite ignores system / global config files. Allow testing *with*
symlink support by checking if native symlinks are enabled in MSys2 (via
'MSYS=winsymlinks:nativestrict').

Reminder: This would need to be changed if / when we find a way to run the
test suite in a non-MSys-based shell (e.g. dash).

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-10-30 11:55:57 +01:00
Karsten Blees
489a43bfc6 Win32: factor out retry logic
The retry pattern is duplicated in three places. It also seems to be too
hard to use: mingw_unlink() and mingw_rmdir() duplicate the code to retry,
and both of them do so incompletely. They also do not restore errno if the
user answers 'no'.

Introduce a retry_ask_yes_no() helper function that handles retry with
small delay, asking the user, and restoring errno.

mingw_unlink: include _wchmod in the retry loop (which may fail if the
file is locked exclusively).

mingw_rmdir: include special error handling in the retry loop.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-10-30 11:55:57 +01:00
Karsten Blees
87fde98f14 Win32: lstat(): return adequate stat.st_size for symlinks
Git typically doesn't trust the stat.st_size member of symlinks (e.g. see
strbuf_readlink()). However, some functions take shortcuts if st_size is 0
(e.g. diff_populate_filespec()).

In mingw_lstat() and fscache_lstat(), make sure to return an adequate size.

The extra overhead of opening and reading the reparse point to calculate
the exact size is not necessary, as git doesn't rely on the value anyway.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:57 +01:00
Karsten Blees
dffb105b46 Win32: teach fscache and dirent about symlinks
Move S_IFLNK detection to file_attr_to_st_mode() and reuse it in fscache.

Implement DT_LNK detection in dirent.c and the fscache readdir version.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:57 +01:00
Karsten Blees
4a085078bf Win32: let mingw_lstat() error early upon problems with reparse points
When obtaining lstat information for reparse points, we need to call
FindFirstFile() in addition to GetFileInformationEx() to obtain the type
of the reparse point (symlink, mount point etc.). However, currently there
is no error handling whatsoever if FindFirstFile() fails.

Call FindFirstFile() before modifying the stat *buf output parameter and
error out if the call fails.

Note: The FindFirstFile() return value includes all the data that we get
from GetFileAttributesEx(), so we could replace GetFileAttributesEx() with
FindFirstFile(). We don't do that because GetFileAttributesEx() is about
twice as fast for single files. I.e. we only pay the extra cost of calling
FindFirstFile() in the rare case that we encounter a reparse point.

Note: The indentation of the remaining reparse point code will be fixed in
the next patch.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-10-30 11:55:56 +01:00
Karsten Blees
8a1df28745 Win32: remove separate do_lstat() function
With the new mingw_stat() implementation, do_lstat() is only called from
mingw_lstat() (with follow == 0). Remove the extra function and the old
mingw_stat()-specific (follow == 1) logic.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-10-30 11:55:56 +01:00
Karsten Blees
8abb782ec2 Win32: implement stat() with symlink support
With respect to symlinks, the current stat() implementation is almost the
same as lstat(): except for the file type (st_mode & S_IFMT), it returns
information about the link rather than the target.

Implement stat by opening the file with as little permissions as possible
and calling GetFileInformationByHandle on it. This way, all link resoltion
is handled by the Windows file system layer.

If symlinks are disabled, use lstat() as before, but fail with ELOOP if a
symlink would have to be resolved.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:56 +01:00
Karsten Blees
1f2d3a99b4 Win32: don't call GetFileAttributes twice in mingw_lstat()
GetFileAttributes cannot handle paths with trailing dir separator. The
current [l]stat implementation calls GetFileAttributes twice if the path
has trailing slashes (first with the original path passed to [l]stat, and
and a second time with a path copy with trailing '/' removed).

With Unicode conversion, we get the length of the path for free and also
have a (wide char) buffer that can be modified.

Remove trailing directory separators before calling the Win32 API.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-10-30 11:55:56 +01:00
Karsten Blees
d77efd807e lockfile.c: use is_dir_sep() instead of hardcoded '/' checks
Signed-off-by: Karsten Blees <blees@dcon.de>
2019-10-30 11:55:56 +01:00
Karsten Blees
b6d74403f7 strbuf_readlink: support link targets that exceed PATH_MAX
strbuf_readlink() refuses to read link targets that exceed PATH_MAX (even
if a sufficient size was specified by the caller).

As some platforms support longer paths, remove this restriction (similar
to strbuf_getcwd()).

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-10-30 11:55:56 +01:00
Karsten Blees
3327311b06 strbuf_readlink: don't call readlink twice if hint is the exact link size
strbuf_readlink() calls readlink() twice if the hint argument specifies the
exact size of the link target (e.g. by passing stat.st_size as returned by
lstat()). This is necessary because 'readlink(..., hint) == hint' could
mean that the buffer was too small.

Use hint + 1 as buffer size to prevent this.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-10-30 11:55:56 +01:00
Johannes Schindelin
17a46c5806 Merge branch 'maybe-drop'
These patches are probably no longer necessary. Make sure before
dropping them, though.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:56 +01:00
Johannes Schindelin
db42687c08 mingw: ensure valid CTYPE
A change between versions 2.4.1 and 2.6.0 of the MSYS2 runtime modified
how Cygwin's runtime (and hence Git for Windows' MSYS2 runtime
derivative) handles locales: d16a56306d (Consolidate wctomb/mbtowc calls
for POSIX-1.2008, 2016-07-20).

An unintended side-effect is that "cold-calling" into the POSIX
emulation will start with a locale based on the current code page,
something that Git for Windows is very ill-prepared for, as it expects
to be able to pass a command-line containing non-ASCII characters to the
shell without having those characters munged.

One symptom of this behavior: when `git clone` or `git fetch` shell out
to call `git-upload-pack` with a path that contains non-ASCII
characters, the shell tried to interpret the entire command-line
(including command-line parameters) as executable path, which obviously
must fail.

This fixes https://github.com/git-for-windows/git/issues/1036

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:56 +01:00
Johannes Schindelin
19883456c2 mingw: disable t9020
POSIX-to-Windows path mangling would make it fail. Symptoms:

	++ init_git
	++ rm -fr .git
	++ git init
	Initialized empty Git repository in [...]
	++ git remote add svnsim testsvn::sim:///usr/src/git/wip5/t/t9154/svn.dump
	++ git remote add svnfile testsvn::file:///usr/src/git/wip5/t/t9154/svn.dump
	++ git fetch svnsim
	progress Imported commit 1.
	fatal: Write to frontend failed: Bad file descriptor
	fast-import: dumping crash report to .git/fast_import_crash_23356
	fatal: error while running fast-import
	fatal: unexpected end of fast-import feedback
	error: last command exited with $?=128
	not ok 1 - simple fetch

Since the remote-svn project seems to be dormant at the moment (and not
complete enough to be used, which is a pity), let's just skip this test
on Windows.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:56 +01:00
Johannes Schindelin
72f17a1e61 Unbreak interactive GPG prompt upon signing
With the recent update in efee955 (gpg-interface: check gpg signature
creation status, 2016-06-17), we ask GPG to send all status updates to
stderr, and then catch the stderr in an strbuf.

But GPG might fail, and send error messages to stderr. And we simply
do not show them to the user.

Even worse: this swallows any interactive prompt for a passphrase. And
detaches stderr from the tty so that the passphrase cannot be read.

So while the first problem could be fixed (by printing the captured
stderr upon error), the second problem cannot be easily fixed, and
presents a major regression.

So let's just revert commit efee9553a4.

This fixes https://github.com/git-for-windows/git/issues/871

Cc: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:56 +01:00
Johannes Schindelin
3269c242a3 mingw (git_terminal_prompt): do fall back to CONIN$/CONOUT$ method
To support Git Bash running in a MinTTY, we use a dirty trick to access
the MSYS2 pseudo terminal: we execute a Bash snippet that accesses
/dev/tty.

The idea was to fall back to writing to/reading from CONOUT$/CONIN$ if
that Bash call failed because Bash was not found.

However, we should fall back even in other error conditions, because we
have not successfully read the user input. Let's make it so.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:56 +01:00
Karsten Blees
c89acefbc4 compat/terminal.c: only use the Windows console if bash 'read -r' fails
Accessing the Windows console through the special CONIN$ / CONOUT$ devices
doesn't work properly for non-ASCII usernames an passwords.

It also doesn't work for terminal emulators that hide the native console
window (such as mintty), and 'TERM=xterm*' is not necessarily a reliable
indicator for such terminals.

The new shell_prompt() function, on the other hand, works fine for both
MSys1 and MSys2, in native console windows as well as mintty, and properly
supports Unicode. It just needs bash on the path (for 'read -s', which is
bash-specific).

On Windows, try to use the shell to read from the terminal. If that fails
with ENOENT (i.e. bash was not found), use CONIN/OUT as fallback.

Note: To test this, create a UTF-8 credential file with non-ASCII chars,
e.g. in git-bash: 'echo url=http://täst.com > cred.txt'. Then in git-cmd,
'git credential fill <cred.txt' works (shell version), while calling git
without the git-wrapper (i.e. 'mingw64\bin\git credential fill <cred.txt')
mangles non-ASCII chars in both console output and input.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-10-30 11:55:55 +01:00
Karsten Blees
5d04b9492e mingw: Support git_terminal_prompt with more terminals
The `git_terminal_prompt()` function expects the terminal window to be
attached to a Win32 Console. However, this is not the case with terminal
windows other than `cmd.exe`'s, e.g. with MSys2's own `mintty`.

Non-cmd terminals such as `mintty` still have to have a Win32 Console
to be proper console programs, but have to hide the Win32 Console to
be able to provide more flexibility (such as being resizeable not only
vertically but also horizontally). By writing to that Win32 Console,
`git_terminal_prompt()` manages only to send the prompt to nowhere and
to wait for input from a Console to which the user has no access.

This commit introduces a function specifically to support `mintty` -- or
other terminals that are compatible with MSys2's `/dev/tty` emulation. We
use the `TERM` environment variable as an indicator for that: if the value
starts with "xterm" (such as `mintty`'s "xterm_256color"), we prefer to
let `xterm_prompt()` handle the user interaction.

The most prominent user of `git_terminal_prompt()` is certainly
`git-remote-https.exe`. It is an interesting use case because both
`stdin` and `stdout` are redirected when Git calls said executable, yet
it still wants to access the terminal.

When running inside a `mintty`, the terminal is not accessible to the
`git-remote-https.exe` program, though, because it is a MinGW program
and the `mintty` terminal is not backed by a Win32 console.

To solve that problem, we simply call out to the shell -- which is an
*MSys2* program and can therefore access `/dev/tty`.

Helped-by: nalla <nalla@hamal.uberspace.de>
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:55 +01:00
Johannes Schindelin
80f97a05d4 mingw: ensure that core.longPaths is handled *always*
A ton of Git commands simply do not read (or at least parse) the core.*
settings. This is not good, as Git for Windows relies on the
core.longPaths setting to be read quite early on.

So let's just make sure that all commands read the config and give
platform_core_config() a chance.

This patch teaches tons of Git commands to respect the config setting
`core.longPaths = true`, including `pack-refs`, thereby fixing
https://github.com/git-for-windows/git/issues/1218

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:55 +01:00
Karsten Blees
6ed8defe36 Win32: fix 'lstat("dir/")' with long paths
Use a suffciently large buffer to strip the trailing slash.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-10-30 11:55:55 +01:00
Karsten Blees
cffdc97523 Win32: support long paths
Windows paths are typically limited to MAX_PATH = 260 characters, even
though the underlying NTFS file system supports paths up to 32,767 chars.
This limitation is also evident in Windows Explorer, cmd.exe and many
other applications (including IDEs).

Particularly annoying is that most Windows APIs return bogus error codes
if a relative path only barely exceeds MAX_PATH in conjunction with the
current directory, e.g. ERROR_PATH_NOT_FOUND / ENOENT instead of the
infinitely more helpful ERROR_FILENAME_EXCED_RANGE / ENAMETOOLONG.

Many Windows wide char APIs support longer than MAX_PATH paths through the
file namespace prefix ('\\?\' or '\\?\UNC\') followed by an absolute path.
Notable exceptions include functions dealing with executables and the
current directory (CreateProcess, LoadLibrary, Get/SetCurrentDirectory) as
well as the entire shell API (ShellExecute, SHGetSpecialFolderPath...).

Introduce a handle_long_path function to check the length of a specified
path properly (and fail with ENAMETOOLONG), and to optionally expand long
paths using the '\\?\' file namespace prefix. Short paths will not be
modified, so we don't need to worry about device names (NUL, CON, AUX).

Contrary to MSDN docs, the GetFullPathNameW function doesn't seem to be
limited to MAX_PATH (at least not on Win7), so we can use it to do the
heavy lifting of the conversion (translate '/' to '\', eliminate '.' and
'..', and make an absolute path).

Add long path error checking to xutftowcs_path for APIs with hard MAX_PATH
limit.

Add a new MAX_LONG_PATH constant and xutftowcs_long_path function for APIs
that support long paths.

While improved error checking is always active, long paths support must be
explicitly enabled via 'core.longpaths' option. This is to prevent end
users to shoot themselves in the foot by checking out files that Windows
Explorer, cmd/bash or their favorite IDE cannot handle.

Test suite:
Test the case is when the full pathname length of a dir is close
to 260 (MAX_PATH).
Bug report and an original reproducer by Andrey Rogozhnikov:
https://github.com/msysgit/git/pull/122#issuecomment-43604199

[jes: adjusted test number to avoid conflicts, added support for
chdir(), etc]

Thanks-to: Martin W. Kirst <maki@bitkings.de>
Thanks-to: Doug Kelly <dougk.ff7@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>
Original-test-by: Andrey Rogozhnikov <rogozhnikov.andrey@gmail.com>
Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:55 +01:00
Doug Kelly
1eef10b981 pack-objects (mingw): demonstrate a segmentation fault with large deltas
There is a problem in the way 9ac3f0e5b3 (pack-objects: fix
performance issues on packing large deltas, 2018-07-22) initializes that
mutex in the `packing_data` struct. The problem manifests in a
segmentation fault on Windows, when a mutex (AKA critical section) is
accessed without being initialized. (With pthreads, you apparently do
not really have to initialize them?)

This was reported in https://github.com/git-for-windows/git/issues/1839.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:55 +01:00
Johannes Schindelin
43b0ad42c7 clean: make use of FSCache
The `git clean` command needs to enumerate plenty of files and
directories, and can therefore benefit from the FSCache.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:55 +01:00
Johannes Schindelin
a717f01d8f fscache: implement an FSCache-aware is_mount_point()
When FSCache is active, we can cache the reparse tag and use it directly
to determine whether a path refers to an NTFS junction, without any
additional, costly I/O.

Note: this change only makes a difference with the next commit, which
will make use of the FSCache in `git clean` (contingent on
`core.fscache` set, of course).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:55 +01:00
Johannes Schindelin
2258e26613 fscache: remember the reparse tag for each entry
We will use this in the next commit to implement an FSCache-aware
version of is_mount_point().

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:55 +01:00
Derrick Stolee
4ffbe1e0ec unpack-trees: enable fscache for sparse-checkout
When updating the skip-worktree bits in the index to align with new
values in a sparse-checkout file, Git scans the entire working
directory with lstat() calls. In a sparse-checkout, many of these
lstat() calls are for paths that do not exist.

Enable the fscache feature during this scan. Since enable_fscache()
calls nest, the disable_fscache() method decrements a counter and
would only clear the cache if that counter reaches zero.

In a local test of a repo with ~2.2 million paths, updating the index
with git read-tree -m -u HEAD with a sparse-checkout file containing
only /.gitattributes improved from 2-3 minutes to ~6 seconds.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
2019-10-30 11:55:55 +01:00
Ben Peart
0daa44e376 fscache: teach fscache to use NtQueryDirectoryFile
Using FindFirstFileExW() requires the OS to allocate a 64K buffer for each
directory and then free it when we call FindClose().  Update fscache to call
the underlying kernel API NtQueryDirectoryFile so that we can do the buffer
management ourselves.  That allows us to allocate a single buffer for the
lifetime of the cache and reuse it for each directory.

This change improves performance of 'git status' by 18% in a repo with ~200K
files and 30k folders.

Documentation for NtQueryDirectoryFile can be found at:

https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/content/ntifs/nf-ntifs-ntquerydirectoryfile
https://docs.microsoft.com/en-us/windows/desktop/FileIO/file-attribute-constants
https://docs.microsoft.com/en-us/windows/desktop/fileio/reparse-point-tags

To determine if the specified directory is a symbolic link, inspect the
FileAttributes member to see if the FILE_ATTRIBUTE_REPARSE_POINT flag is
set. If so, EaSize will contain the reparse tag (this is a so far
undocumented feature, but confirmed by the NTFS developers). To
determine if the reparse point is a symbolic link (and not some other
form of reparse point), test whether the tag value equals the value
IO_REPARSE_TAG_SYMLINK.

The NtQueryDirectoryFile() call works best (and on Windows 8.1 and
earlier, it works *only*) with buffer sizes up to 64kB. Which is 32k
wide characters, so let's use that as our buffer size.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-10-30 11:55:55 +01:00
Ben Peart
3a57e13e5d fscache: make fscache_enable() thread safe
The recent change to make fscache thread specific relied on fscache_enable()
being called first from the primary thread before being called in parallel
from worker threads.  Make that more robust and protect it with a critical
section to avoid any issues.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
2019-10-30 11:55:55 +01:00
Ben Peart
7cdc2359a7 fscache: teach fscache to use mempool
Now that the fscache is single threaded, take advantage of the mem_pool as
the allocator to significantly reduce the cost of allocations and frees.

With the reduced cost of free, in future patches, we can start freeing the
fscache at the end of commands instead of just leaking it.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2019-10-30 11:55:55 +01:00
Ben Peart
e09adc94b8 fscache: update fscache to be thread specific instead of global
The threading model for fscache has been to have a single, global cache.
This puts requirements on it to be thread safe so that callers like
preload-index can call it from multiple threads.  This was implemented
with a single mutex and completion events which introduces contention
between the calling threads.

Simplify the threading model by making fscache thread specific.  This allows
us to remove the global mutex and synchronization events entirely and instead
associate a fscache with every thread that requests one. This works well with
the current multi-threading which divides the cache entries into blocks with
a separate thread processing each block.

At the end of each worker thread, if there is a fscache on the primary
thread, merge the cached results from the worker into the primary thread
cache. This enables us to reuse the cache later especially when scanning for
untracked files.

In testing, this reduced the time spent in preload_index() by about 25% and
also reduced the CPU utilization significantly.  On a repo with ~200K files,
it reduced overall status times by ~12%.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2019-10-30 11:55:54 +01:00
Ben Peart
a86ad6d13d fscache: fscache takes an initial size
Update enable_fscache() to take an optional initial size parameter which is
used to initialize the hashmap so that it can avoid having to rehash as
additional entries are added.

Add a separate disable_fscache() macro to make the code clearer and easier
to read.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2019-10-30 11:55:54 +01:00
Ben Peart
50ca764289 mem_pool: add GIT_TRACE_MEMPOOL support
Add tracing around initializing and discarding mempools. In discard report
on the amount of memory unused in the current block to help tune setting
the initial_size.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2019-10-30 11:55:54 +01:00
Ben Peart
20e911d287 fscache: add fscache hit statistics
Track fscache hits and misses for lstat and opendir requests.  Reporting of
statistics is done when the cache is disabled for the last time and freed
and is only reported if GIT_TRACE_FSCACHE is set.

Sample output is:

11:33:11.836428 compat/win32/fscache.c:433 fscache: lstat 3775, opendir 263, total requests/misses 4052/269

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2019-10-30 11:55:54 +01:00
Ben Peart
509715728b At the end of the add command, disable and free the fscache so that we don't
leak the memory and so that we can dump the fscache statistics.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2019-10-30 11:55:54 +01:00
Ben Peart
2027a3d788 fscache: add GIT_TEST_FSCACHE support
Add support to fscache to enable running the entire test suite with the
fscache enabled.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2019-10-30 11:55:54 +01:00