Commit Graph

97382 Commits

Author SHA1 Message Date
Takuto Ikuta
8604a61ba3 fetch-pack.c: enable fscache for stats under .git/objects
When I run git fetch, git stats files under .git/objects for each
ref. This takes time when there are many refs.

With fscache enabled, git obtains the file stats by traversing directories,
which speeds up fetch-pack for repositories with a large number of
refs.

On my Windows workstation, this improves the time of `git fetch` for
the chromium repository as shown below. I took measurements 3 times.

* With this patch
TotalSeconds: 9.9825165
TotalSeconds: 9.1862075
TotalSeconds: 10.1956256
Avg: 9.78811653333333

* Without this patch
TotalSeconds: 15.8406702
TotalSeconds: 15.6248053
TotalSeconds: 15.2085938
Avg: 15.5580231

Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
2019-05-13 22:56:50 +02:00
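The timings quoted in 8604a61ba3 can be turned into a relative saving with a few lines of arithmetic. This is only a sketch over the six numbers in the commit message; the variable names are illustrative. It works out to roughly a 37% reduction in wall-clock time, or about a 1.6x speedup.

```python
from statistics import mean

# Timings (seconds) copied from the commit message of 8604a61ba3.
with_patch = [9.9825165, 9.1862075, 10.1956256]
without_patch = [15.8406702, 15.6248053, 15.2085938]

avg_with = mean(with_patch)
avg_without = mean(without_patch)

# Relative reduction in wall-clock time and the equivalent speedup factor.
reduction = 1 - avg_with / avg_without
speedup = avg_without / avg_with

print(f"avg with patch:    {avg_with:.4f} s")
print(f"avg without patch: {avg_without:.4f} s")
print(f"time saved:        {reduction:.1%}")
print(f"speedup:           {speedup:.2f}x")
```

Three samples per configuration is a small basis, so the figures are best read as an indication rather than a benchmark.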
Jeff Hostetler
577c5a4f4d dir.c: regression fix for add_excludes with fscache
Fix regression described in:
https://github.com/git-for-windows/git/issues/1392

which was introduced in:
b2353379bb

Problem Symptoms
================
When the user has a .gitignore file that is a symlink, the fscache
optimization introduced above caused the stat-data from the symlink,
rather than that of the target file, to be returned.  Later when the ignore
file was read, the buffer length did not match the stat.st_size field
and we called die("cannot use <path> as an exclude file")

Optimization Rationale
======================
The above optimization calls lstat() before open() primarily to ask
fscache if the file exists.  It gets the current stat-data as a side
effect essentially for free (since we already have it in memory).
If the file does not exist, it does not need to call open().  And
since very few directories have .gitignore files, we can greatly
reduce time spent in the filesystem.

Discussion of Fix
=================
The above optimization calls lstat() rather than stat() because the
fscache only intercepts lstat() calls.  Calls to stat() stay directed
to the mingw_stat() completely bypassing fscache.  Furthermore, calls
to mingw_stat() always call {open, fstat, close} so that symlinks are
properly dereferenced, which adds *additional* open/close calls on top
of what the original code in dir.c is doing.

Since the problem only manifests for symlinks, we add code to overwrite
the stat-data when the path is a symlink.  This preserves the effect of
the performance gains provided by the fscache in the normal case.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2019-05-13 22:56:50 +02:00
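The fix described in 577c5a4f4d (keep the cheap lstat(), but overwrite the stat data when the path turns out to be a symlink) can be sketched in a few lines. This is a Python illustration of the idea, not the C code in dir.c; `stat_for_exclude_file` and its `cached_lstat` parameter are hypothetical names, with `cached_lstat` standing in for an fscache-backed lstat().

```python
import os
import stat


def stat_for_exclude_file(path, cached_lstat=os.lstat):
    """Return stat data describing the file that open(path) would read.

    cached_lstat stands in for an fscache-backed lstat(); it is a
    hypothetical parameter, not a name from dir.c.
    """
    try:
        st = cached_lstat(path)
    except FileNotFoundError:
        return None  # no exclude file here, so open() can be skipped
    if stat.S_ISLNK(st.st_mode):
        # lstat() described the link itself; replace it with the
        # target's stat data so st_size matches what a read returns.
        st = os.stat(path)
    return st
```

Only symlinked exclude files pay for the extra stat() call; regular files and missing files still get their answer from the cached lstat() alone.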
Jeff Hostetler
3cf373d2fb fscache: make fscache_enabled() public
Make fscache_enabled() function public rather than static.
Remove unneeded fscache_is_enabled() function.
Change is_fscache_enabled() macro to call fscache_enabled().

is_fscache_enabled() now takes a pathname so that the answer
is more precise and means "is fscache enabled for this pathname".
Since fscache only stores repo-relative paths and not absolute
paths, we can avoid attempting lookups for absolute paths.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2019-05-13 22:56:50 +02:00
Jeff Hostetler
23031711e8 dir.c: make add_excludes aware of fscache during status
Teach read_directory_recursive() and add_excludes() to
be aware of optional fscache and avoid trying to open()
and fstat() non-existent ".gitignore" files in every
directory in the worktree.

The current code in add_excludes() calls open() and then
fstat() for a ".gitignore" file in each directory present
in the worktree.  Change that when fscache is enabled to
call lstat() first and if present, call open().

This seems backwards because lstat needs to do more
work than fstat.  But when fscache is enabled, fscache will
already know if the .gitignore file exists and can completely
avoid the IO calls.  This works because of the lstat diversion
to mingw_lstat when fscache is enabled.

This reduced status times on a 350K file enlistment of the
Windows repo on an NVMe SSD by 0.25 seconds.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2019-05-13 22:56:50 +02:00
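The idea in 23031711e8 (ask an in-memory cache whether ".gitignore" exists before touching the filesystem) can be sketched as follows. This is a Python illustration under stated assumptions: `listing_cache` is a hypothetical dict standing in for fscache, and neither function name comes from dir.c.

```python
import os


def read_gitignore(directory, listing_cache):
    """Read <directory>/.gitignore, skipping open() when the cache says
    the file is absent.

    listing_cache maps a directory path to the set of names it contains;
    it is a hypothetical stand-in for fscache.
    """
    names = listing_cache.get(directory)
    if names is not None and ".gitignore" not in names:
        return None  # answered from memory, no filesystem call
    try:
        with open(os.path.join(directory, ".gitignore")) as f:
            return f.read()
    except FileNotFoundError:
        return None


def build_listing_cache(root):
    """One scan per directory, as a worktree walk would perform anyway."""
    return {dirpath: set(dirnames) | set(filenames)
            for dirpath, dirnames, filenames in os.walk(root)}
```

Because most directories contain no ".gitignore", most calls return from the first branch without any open() attempt, which is where the saving comes from.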
Karsten Blees
942ef0de47 fscache: load directories only once
If multiple threads access a directory that is not yet in the cache, the
directory will be loaded by each thread. Only one of the results is added
to the cache, all others are leaked. This wastes performance and memory.

On cache miss, add a future object to the cache to indicate that the
directory is currently being loaded. Subsequent threads register themselves
with the future object and wait. When the first thread has loaded the
directory, it replaces the future object with the result and notifies
waiting threads.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-05-13 22:56:45 +02:00
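The "future object" scheme from 942ef0de47 can be sketched with Python's standard `concurrent.futures.Future`. This is an illustration of the technique, not the fscache code: the `DirectoryCache` class and its `loader` callback are hypothetical, and unlike the commit, the sketch leaves the completed future in the cache rather than replacing it with the bare result.

```python
import threading
from concurrent.futures import Future


class DirectoryCache:
    """Loads each directory listing at most once, even under contention."""

    def __init__(self, loader):
        self._loader = loader      # callable: path -> listing
        self._entries = {}         # path -> Future
        self._lock = threading.Lock()

    def get(self, path):
        with self._lock:
            future = self._entries.get(path)
            is_owner = future is None
            if is_owner:
                # Cache miss: publish a placeholder before loading.
                future = Future()
                self._entries[path] = future
        if is_owner:
            try:
                future.set_result(self._loader(path))
            except Exception as exc:
                # Forget the failed entry so a later call can retry.
                with self._lock:
                    del self._entries[path]
                future.set_exception(exc)
        # The owner returns at once; other threads block until it is done.
        return future.result()
```

The lock is held only while the placeholder is looked up or published, never during the load itself, so threads asking for different directories do not serialize on each other.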
Karsten Blees
f41d747b80 Win32: add a cache below mingw's lstat and dirent implementations
Checking the work tree status is quite slow on Windows, due to slow lstat
emulation (git calls lstat once for each file in the index). Windows
operating system APIs seem to be much better at scanning the status
of entire directories than checking single files.

Add an lstat implementation that uses a cache for lstat data. Cache misses
read the entire parent directory and add it to the cache. Subsequent lstat
calls for the same directory are served directly from the cache.

Also implement opendir / readdir / closedir so that they create and use
directory listings in the cache.

The cache doesn't track file system changes and doesn't plug into any
modifying file APIs, so it has to be explicitly enabled for git functions
that don't modify the working copy.

Note: in an earlier version of this patch, the cache was always active and
tracked file system changes via ReadDirectoryChangesW. However, this was
much more complex and had negative impact on the performance of modifying
git commands such as 'git checkout'.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-05-13 22:56:45 +02:00
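The caching strategy of f41d747b80 (on a miss, read the whole parent directory, then serve later lstat calls from memory) can be sketched like this. It is a Python illustration using `os.scandir`, not the Win32 implementation; the `LstatCache` class and its `scans` counter are hypothetical. Like the cache in the commit, it never notices filesystem changes, so it only suits read-only phases.

```python
import os


class LstatCache:
    """Serve lstat() from whole-directory scans; never invalidated."""

    def __init__(self):
        self._dirs = {}   # directory path -> {name: os.stat_result}
        self.scans = 0    # number of directory scans, for illustration

    def lstat(self, path):
        parent, name = os.path.split(os.path.abspath(path))
        entries = self._dirs.get(parent)
        if entries is None:
            # Cache miss: read the entire parent directory in one pass.
            with os.scandir(parent) as it:
                entries = {e.name: e.stat(follow_symlinks=False) for e in it}
            self._dirs[parent] = entries
            self.scans += 1
        try:
            return entries[name]
        except KeyError:
            raise FileNotFoundError(path) from None
```

Calling `lstat` for every file in one directory triggers a single scan; a lookup for a name that is absent is also answered from the cached listing.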
Karsten Blees
2919cb6778 add infrastructure for read-only file system level caches
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.

This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.

Enable read-only sections for 'git status' and preload_index.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-05-13 22:56:44 +02:00
Karsten Blees
c9df979aed Win32: make the lstat implementation pluggable
Emulating the POSIX lstat API on Windows via GetFileAttributes[Ex] is quite
slow. Windows operating system APIs seem to be much better at scanning the
status of entire directories than checking single files. A caching
implementation may improve performance by bulk-reading entire directories
or reusing data obtained via opendir / readdir.

Make the lstat implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:44 +02:00
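Making lstat "pluggable", as c9df979aed describes, amounts to routing every call through a replaceable function reference. A minimal Python sketch of that indirection follows; `set_lstat_impl` and `_lstat_impl` are hypothetical names, not identifiers from compat/mingw.c.

```python
import os

# The active implementation; callers go through lstat() below.
_lstat_impl = os.lstat


def lstat(path):
    """Dispatch to whichever implementation is currently plugged in."""
    return _lstat_impl(path)


def set_lstat_impl(impl):
    """Swap the implementation at runtime and return the previous one."""
    global _lstat_impl
    previous, _lstat_impl = _lstat_impl, impl
    return previous
```

Returning the previous implementation lets a caller restore it afterwards, for example when a caching variant should only be active for the duration of one command.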
Karsten Blees
b772d0bd1a Win32: Make the dirent implementation pluggable
Emulating the POSIX dirent API on Windows via FindFirstFile/FindNextFile is
pretty straightforward, however, most of the information provided in the
WIN32_FIND_DATA structure is thrown away in the process. A more
sophisticated implementation may cache this data, e.g. for later reuse in
calls to lstat.

Make the dirent implementation pluggable so that it can be switched at
runtime, e.g. based on a config option.

Define a base DIR structure with pointers to readdir/closedir that match
the opendir implementation (i.e. similar to vtable pointers in OOP).
Define readdir/closedir so that they call the function pointers in the DIR
structure. This allows choosing the opendir implementation on a
call-by-call basis.

Move the fixed sized dirent.d_name buffer to the dirent-specific DIR
structure, as d_name may be implementation specific (e.g. a caching
implementation may just set d_name to point into the cache instead of
copying the entire file name string).

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-05-13 22:56:44 +02:00
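The vtable-like DIR structure described in b772d0bd1a can be sketched as a handle that carries its own readdir/closedir functions. This is a Python illustration of the dispatch pattern only; `Dir`, `opendir_plain` and `opendir_cached` are hypothetical names, and the `cache` dict is an assumption standing in for a real directory cache.

```python
import os


class Dir:
    """Base handle: carries the readdir/closedir chosen by its opendir."""

    def __init__(self, readdir, closedir):
        self._readdir = readdir
        self._closedir = closedir


def readdir(handle):
    return handle._readdir(handle)    # dispatch through the handle


def closedir(handle):
    handle._closedir(handle)


def opendir_plain(path):
    """Variant that lists the directory itself on every open."""
    handle = Dir(_plain_readdir, _plain_closedir)
    handle.names = iter(os.listdir(path))
    return handle


def _plain_readdir(handle):
    return next(handle.names, None)   # None marks end of directory


def _plain_closedir(handle):
    handle.names = iter(())


def opendir_cached(path, cache):
    """Variant that reuses a listing held in `cache` (a plain dict)."""
    if path not in cache:
        cache[path] = os.listdir(path)
    handle = Dir(_cached_readdir, lambda h: None)
    handle.entries, handle.pos = cache[path], 0
    return handle


def _cached_readdir(handle):
    if handle.pos >= len(handle.entries):
        return None
    name = handle.entries[handle.pos]  # refers into the cache, no copy
    handle.pos += 1
    return name
```

Callers use `readdir(handle)` and `closedir(handle)` without knowing which opendir produced the handle, which is what allows the implementation to be chosen per call.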
Karsten Blees
6cce2b2966 Win32: dirent.c: Move opendir down
Move opendir down in preparation for the next patch.

Signed-off-by: Karsten Blees <blees@dcon.de>
2019-05-13 22:56:44 +02:00
Karsten Blees
157887c27b Win32: make FILETIME conversion functions public
We will use them in the upcoming "FSCache" patches (to accelerate
sequential lstat() calls).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:44 +02:00
Johannes Schindelin
d74aea3590 Merge branch 'address-coverity-reports'
Coverity pointed out a couple of bugs, and here are fixes for some of
them.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:42 +02:00
Johannes Schindelin
c9e71d9960 Merge pull request #2185 from dscho/fix-status-with-rebase-ir
Fix `git status`' display of `git rebase -ir`'s `label` commands
2019-05-13 22:56:42 +02:00
Johannes Schindelin
5cdba4600a Merge pull request #2127 from dscho/fix-fsmonitor
Do query the fsmonitor again after the index has been discarded
2019-05-13 22:56:42 +02:00
Johannes Schindelin
b6121bfd61 Merge branch 'difftool-no-index-extra'
This patch addresses the segmentation faults in `git difftool --no-index
--dir-diff`: surprisingly, those two options do make sense
together.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:41 +02:00
Johannes Schindelin
32735d7932 Merge branch 'difftool-no-index'
This fixes https://github.com/git-for-windows/git/issues/2123

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:41 +02:00
Johannes Schindelin
be6f5eb57f Merge pull request #2121 from dscho/fix-rereading-todo-list
rebase -i: fix re-reading the todo list when newly created objects are referenced
2019-05-13 22:56:41 +02:00
Johannes Schindelin
eb62c912ff Merge pull request #2182 from dscho/untracked-cache-off-by-one
Backport "untracked cache: fix off-by-one"
2019-05-13 22:56:41 +02:00
Johannes Schindelin
ebd23acd6e Merge pull request #2170 from dscho/gitk-long-cmdline
Fix gitk (long cmdline)
2019-05-13 22:56:41 +02:00
Johannes Schindelin
8100cee16c Merge pull request #2180 from dscho/t6500-and-msys2-runtime-v3.x
Prepare the gc tests for v3.x of the MSYS2 runtime
2019-05-13 22:56:40 +02:00
Johannes Schindelin
3badbc69dc Merge pull request #2160 from dscho/skip-t9822-on-apfs-gfw
Prepare our git-p4 tests for running on APFS; This is necessary to fix the CI builds since Azure Pipelines' macOS agents were upgraded to Mojave.
2019-05-13 22:56:40 +02:00
Johannes Schindelin
e8a8f2861d Merge pull request #2115 from dscho/gfw/msys2-3.x
mingw: allow building with an MSYS2 runtime v3.x
2019-05-13 22:56:40 +02:00
Johannes Schindelin
1ec6757da2 Merge pull request #2111 from gitgitgadget/jk/no-sigpipe-during-network-transport
Fix t5570 flakiness on macOS
2019-05-13 22:56:40 +02:00
Johannes Schindelin
269ac52c38 Merge pull request #2110 from dscho/avoid-find-in-makefile
Accelerate startup time of `make`
2019-05-13 22:56:40 +02:00
Johannes Schindelin
e7e650ed96 Merge pull request #2112 from dscho/gfw/rebase-am-and-orig-head
built-in rebase: pick up the ORIG_HEAD fix early
2019-05-13 22:56:39 +02:00
Johannes Schindelin
e0e6c1533f Merge pull request #2101 from yashb5/typo-gitattributes
gitattributes.txt: fix typo
2019-05-13 22:56:39 +02:00
Johannes Schindelin
011074d690 Merge branch 'fsync-object-files-always'
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:39 +02:00
Johannes Schindelin
b4d4224333 Merge branch 'spawn-with-spaces'
This topic branch conflicts with the next change that will change the
way we call `CreateProcessW()`. So let's merge it early, to avoid merge
conflicts during a merge (because we would have to resolve this with
every single merging-rebase).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:39 +02:00
Johannes Schindelin
936dedab34 Merge branch 'clean-long-paths'
This addresses https://github.com/git-for-windows/git/issues/521

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:39 +02:00
Johannes Schindelin
bfd75434a2 Merge branch 'mingw-home'
The environment variable `HOME` is not exactly a native concept on
Windows, but Git and its scripts rely heavily on it. Make sure that it
is set (using a default that is sensible in most cases, and can easily
be overridden by setting the user-wide environment variable `HOME`
explicitly, before starting Git).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:38 +02:00
Johannes Schindelin
97d076fbdf Merge branch 'gettext-force-utf-8-on-windows'
The idea of the C runtime on Windows as to what a locale is does not
mesh well with the idea Git has. So let's just ignore the C runtime.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:38 +02:00
Johannes Schindelin
6e18964f7e Merge branch 'mingw-avoid-illegal-filenames'
MSYS2 inherits the trick from Cygwin to pretend that filenames can
contain characters that are illegal on Windows (by mapping them to a
private Unicode page). As long as we stay safely within the MSYS2 realm
(Bash, GNU make, Perl) that is fine, so technically this change is not
needed. But it is a lot more elegant not to rely on this.

Besides, the suffix `.new` is a lot more intuitive than the suffix
`+`...

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:38 +02:00
Johannes Schindelin
2f6a9a6410 Merge branch 'mingw-stack-smashing-protector'
This is GCC's attempt at making things less predictable and thereby
reducing the attack surface for malware.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:38 +02:00
Johannes Schindelin
fe4dab9615 Merge branch 'mingw-manifest'
Windows executables can be configured to make use of certain Windows
features only via a so-called "manifest", i.e. a specific, embedded
resource. This manifest is also necessary to determine the Windows
version reliably.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:38 +02:00
Johannes Schindelin
5821566185 Merge branch 'msys2-htmldir'
Git for Windows ships with a subset of MSYS2, and tries to integrate
smoothly, so we want to install the documentation in the location that
is recommended by MSYS2.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:38 +02:00
Johannes Schindelin
900e2ef83f Merge branch 'munmap-before-ext-diff'
This topic branch fixes the usage pattern where files are still held
open with an exclusive lock when an external program is asked to open
those very same files.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:37 +02:00
Johannes Schindelin
7089566306 Merge pull request #1958 from dscho/ansi-unicode
mingw: safeguard against compiling with `-DUNICODE`
2019-05-13 22:56:37 +02:00
Johannes Schindelin
e60be5146f Merge branch 'program-data-config'
This branch introduces support for reading the "Windows-wide" Git
configuration from `%PROGRAMDATA%\Git\config`. As these settings are
intended to be shared between *all* Git-related software, that config
file takes an even lower precedence than `$(prefix)/etc/gitconfig`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:37 +02:00
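The precedence rule stated in e60be5146f (the `%PROGRAMDATA%\Git\config` file ranks below `$(prefix)/etc/gitconfig`) can be illustrated with a layered merge in which later layers override earlier ones. This is a simplified sketch: the setting values are made up, `effective_config` is a hypothetical helper, and real Git config also has multi-valued keys that a flat dict does not model.

```python
def effective_config(layers):
    """Merge config layers given lowest precedence first; later layers win."""
    merged = {}
    for _name, settings in layers:
        merged.update(settings)
    return merged


# Lowest precedence first. The values below are illustrative only.
layers = [
    ("%PROGRAMDATA%\\Git\\config",
     {"core.fscache": "true", "core.autocrlf": "true"}),
    ("$(prefix)/etc/gitconfig",
     {"core.autocrlf": "input"}),
]

print(effective_config(layers))
```

A setting that appears only in the ProgramData file still takes effect, while one that is also set in `$(prefix)/etc/gitconfig` is overridden by the latter.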
Johannes Schindelin
e9e9d75d5a Merge branch 'no-perl-makemaker'
We no longer use MakeMaker, so let's not state in the MINGW section that
we do not want to use it...

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:37 +02:00
Johannes Schindelin
9d66e0812c Merge pull request #2148 from dscho/azure-pipelines-msvc
Let the MSVC build also be tested in the Azure Pipeline
2019-05-13 22:56:37 +02:00
Jeff Hostetler
b4c0acce65 Merge branch 'visual-studio'
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:36 +02:00
Jeff Hostetler
1169e1e82d Merge branch 'msvc'
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:36 +02:00
Johannes Schindelin
573dbb132b Merge remote-tracking branch 'dscho/add-p' into add-p-g4w
Let's test this for a while.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:36 +02:00
Johannes Schindelin
b279b3a1b2 Merge branch 'stash-p-corner-case'
This topic branch fixes a corner case that is amazingly common in this
developer's workflow: in a `git stash -p`, splitting a hunk and stashing
only part of it runs into a (known) bug where the partial hunk cannot be
applied in reverse.

It is one of those "good enough" fixes, not a full fix, though, as the
full fix would require a 3-way merge between `stash^` and the *worktree*
(not `HEAD`), with `stash` as merge base (i.e. a `git revert`, but on
top of the current worktree).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:28 +02:00
Johannes Schindelin
e5f6204b71 Merge branch 'add-p-in-c-config-settings'
This is the final leg of the journey to a fully built-in `git add`: the
`git add -i` and `git add -p` modes were re-implemented in C, but they
lacked support for a couple of config settings.

The one that sticks out most is the `interactive.singleKey` setting: not
only was it particularly hard to get to work, especially on Windows, it
is also the setting that seems to be incomplete already in the Perl
version: while the name suggests that it applies to the main loop of
`git add --interactive`, or to the file selections in that command, it
does not. Only the `git add --patch` mode respects that setting.

As it is outside the purpose of the conversion of
`git-add--interactive.perl` to C, we will leave that loose end for some
future date.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:28 +02:00
Johannes Schindelin
10b1eb3538 stash -p: (partially) fix bug concerning split hunks
When trying to stash part of the worktree changes by splitting a hunk
and then only partially accepting the split bits and pieces, the user
is presented with a rather cryptic error:

	error: patch failed: <file>:<line>
	error: test: patch does not apply
	Cannot remove worktree changes

and the command would fail to stash the desired parts of the worktree
changes (even if the `stash` ref was actually updated correctly).

We even have a test case demonstrating that failure, carrying it for
four years already.

The explanation: when splitting a hunk, the changed lines are no longer
separated by more than 3 lines (which is the amount of context lines
Git's diffs use by default), but less than that. So when staging only
part of the diff hunk for stashing, the resulting diff that we want to
apply to the worktree in reverse will contain those changes to be
dropped surrounded by three context lines, but since the diff is
relative to HEAD rather than to the worktree, these context lines will
not match.

Example time. Let's assume that the file README contains these lines:

	We
	the
	people

and the worktree added some lines so that it contains these lines
instead:

	We
	are
	the
	kind
	people

and the user tries to stash the line containing "are", then the command
will internally stage this line to a temporary index file and try to
revert the diff between HEAD and that index file. The diff hunk that
`git stash` tries to revert will look somewhat like this:

	@@ -1776,3 +1776,4 @@
	 We
	+are
	 the
	 people

It is obvious, now, that the trailing context lines overlap with the
part of the original diff hunk that the user did *not* want to stash.

Keeping in mind that context lines in diffs serve the primary purpose of
finding the exact location when the diff does not apply precisely (i.e.
when the exact line number in the file to be patched differs from the
line number indicated in the diff), we work around this by reducing the
number of context lines: the diff was just generated, so its line numbers
are exact.

Note: this is not a *full* fix for the issue. Just as demonstrated in
t3701's 'add -p works with pathological context lines' test case, there
are ambiguities in the diff format. It is very rare in practice, of
course, to encounter such repeated lines.

The full solution for such cases would be to replace the approach of
generating a diff from the stash and then applying it in reverse by
emulating `git revert` (i.e. doing a 3-way merge). However, in `git
stash -p` it would not apply to `HEAD` but instead to the worktree,
which makes this non-trivial to implement as long as we also maintain a
scripted version of `add -i`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:28 +02:00
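The README example from 10b1eb3538 can be replayed with a toy reverse-apply routine. This is a deliberately strict sketch, not how `git apply` works: it requires the hunk's post-image to appear verbatim in the worktree and ignores line numbers and fuzz. The function names are hypothetical, and trimming the context to one line is an illustrative choice rather than the value the patch uses.

```python
def find_sublist(haystack, needle):
    """Index of the first contiguous occurrence of needle, or -1."""
    for i in range(len(haystack) - len(needle) + 1):
        if haystack[i:i + len(needle)] == needle:
            return i
    return -1


def apply_in_reverse(worktree, hunk):
    """Undo a hunk: locate its post-image and swap in its pre-image.

    hunk is a list of (tag, text) pairs with tag ' ', '+' or '-'.
    Returns the new line list, or None if the hunk does not apply.
    """
    pre = [text for tag, text in hunk if tag in " -"]
    post = [text for tag, text in hunk if tag in " +"]
    at = find_sublist(worktree, post)
    if at < 0:
        return None
    return worktree[:at] + pre + worktree[at + len(post):]


worktree = ["We", "are", "the", "kind", "people"]

# Hunk relative to HEAD with full trailing context, as in the commit message.
full_context = [(" ", "We"), ("+", "are"), (" ", "the"), (" ", "people")]
# Same change with the context trimmed to one line on each side.
reduced_context = [(" ", "We"), ("+", "are"), (" ", "the")]

print(apply_in_reverse(worktree, full_context))     # None
print(apply_in_reverse(worktree, reduced_context))  # ['We', 'the', 'kind', 'people']
```

With full context the trailing line "people" is expected directly after "the", but the worktree has "kind" in between, so the hunk is rejected. With reduced context the unstashed "kind" line falls outside the hunk and the reverse application succeeds.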
Johannes Schindelin
67bcd45de1 Merge branch 'other-command-p-in-c'
At this stage on the journey to a fully built-in `git add`, we already
have everything we need, including the `--interactive` and `--patch`
options, as long as the `add.interactive.useBuiltin` setting is set to
`true` (kind of a "turned off feature flag", which it will be for a
while, until we get confident enough that the built-in version does the
job, and retire the Perl script).

However, the internal `add--interactive` helper is also used to back the
`--patch` option of `git stash`, `git reset` and `git checkout`.

This patch series brings them "online".

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:28 +02:00
Johannes Schindelin
e8650dcef5 ci: include the built-in git add -i in the linux-gcc job
This job runs the test suite twice, once in regular mode, and once with
a whole slew of `GIT_TEST_*` variables set.

Now that the built-in version of `git add --interactive` is
feature-complete, let's also throw `GIT_TEST_MULTI_PACK_INDEX` into that
fray.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:27 +02:00
Johannes Schindelin
01f25b9d01 t3904: fix incorrect demonstration of a bug
In 7e9e048661 (stash -p: demonstrate failure of split with mixed y/n,
2015-04-16), a regression test for a known breakage was added to
the test script `t3904-stash-patch.sh`, demonstrating that splitting
a hunk and trying to stash only part of that split hunk fails (but
shouldn't).

As expected, it still fails, but for the wrong reason: once the bug is
fixed, we would expect stderr to show nothing, yet the regression test
expects stderr to show something.

Let's fix that by telling that regression test case to expect nothing to
be printed to stderr.

While at it, also drop the obvious left-over from debugging where the
regression test did not mind `git stash -p` returning a non-zero exit
status.

Of course, the regression test still fails, but this time for the
correct reason.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:27 +02:00
Johannes Schindelin
7e033a8853 built-in stash: use the built-in git add -p if so configured
The scripted version of `git stash` called directly into the Perl script
`git-add--interactive.perl`, and this was faithfully converted to C.

However, we have a much better way to do this now: call `git add
--patch=<mode>`, which incidentally also respects the config setting
`add.interactive.useBuiltin`.

Let's do this.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-05-13 22:56:27 +02:00