Commit Graph

16607 Commits

Author SHA1 Message Date
Johannes Schindelin
b1be8ef8da mingw: disable t9020
POSIX-to-Windows path mangling would make it fail. Symptoms:

	++ init_git
	++ rm -fr .git
	++ git init
	Initialized empty Git repository in [...]
	++ git remote add svnsim testsvn::sim:///usr/src/git/wip5/t/t9154/svn.dump
	++ git remote add svnfile testsvn::file:///usr/src/git/wip5/t/t9154/svn.dump
	++ git fetch svnsim
	progress Imported commit 1.
	fatal: Write to frontend failed: Bad file descriptor
	fast-import: dumping crash report to .git/fast_import_crash_23356
	fatal: error while running fast-import
	fatal: unexpected end of fast-import feedback
	error: last command exited with $?=128
	not ok 1 - simple fetch

Since the remote-svn project seems to be dormant at the moment (and not
complete enough to be used, which is a pity), let's just skip this test
on Windows.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:02:52 +02:00
Johannes Schindelin
d679f6419c Unbreak interactive GPG prompt upon signing
With the recent update in efee955 (gpg-interface: check gpg signature
creation status, 2016-06-17), we ask GPG to send all status updates to
stderr, and then catch the stderr in an strbuf.

But GPG might fail, and send error messages to stderr. And we simply
do not show them to the user.

Even worse: this swallows any interactive prompt for a passphrase. And
detaches stderr from the tty so that the passphrase cannot be read.

So while the first problem could be fixed (by printing the captured
stderr upon error), the second problem cannot be easily fixed, and
presents a major regression.

So let's just revert commit efee9553a4.

This fixes https://github.com/git-for-windows/git/issues/871

Cc: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:02:52 +02:00
Karsten Blees
5265c41700 mingw: support long paths
Windows paths are typically limited to MAX_PATH = 260 characters, even
though the underlying NTFS file system supports paths up to 32,767 chars.
This limitation is also evident in Windows Explorer, cmd.exe and many
other applications (including IDEs).

Particularly annoying is that most Windows APIs return bogus error codes
if a relative path only barely exceeds MAX_PATH in conjunction with the
current directory, e.g. ERROR_PATH_NOT_FOUND / ENOENT instead of the
infinitely more helpful ERROR_FILENAME_EXCED_RANGE / ENAMETOOLONG.

Many Windows wide char APIs support longer than MAX_PATH paths through the
file namespace prefix ('\\?\' or '\\?\UNC\') followed by an absolute path.
Notable exceptions include functions dealing with executables and the
current directory (CreateProcess, LoadLibrary, Get/SetCurrentDirectory) as
well as the entire shell API (ShellExecute, SHGetSpecialFolderPath...).

Introduce a handle_long_path function to check the length of a specified
path properly (and fail with ENAMETOOLONG), and to optionally expand long
paths using the '\\?\' file namespace prefix. Short paths will not be
modified, so we don't need to worry about device names (NUL, CON, AUX).

Contrary to MSDN docs, the GetFullPathNameW function doesn't seem to be
limited to MAX_PATH (at least not on Win7), so we can use it to do the
heavy lifting of the conversion (translate '/' to '\', eliminate '.' and
'..', and make an absolute path).

Add long path error checking to xutftowcs_path for APIs with hard MAX_PATH
limit.

Add a new MAX_LONG_PATH constant and xutftowcs_long_path function for APIs
that support long paths.

While improved error checking is always active, long paths support must be
explicitly enabled via 'core.longpaths' option. This is to prevent end
users to shoot themselves in the foot by checking out files that Windows
Explorer, cmd/bash or their favorite IDE cannot handle.

Test suite:
Test the case is when the full pathname length of a dir is close
to 260 (MAX_PATH).
Bug report and an original reproducer by Andrey Rogozhnikov:
https://github.com/msysgit/git/pull/122#issuecomment-43604199

[jes: adjusted test number to avoid conflicts, added support for
chdir(), etc]

Thanks-to: Martin W. Kirst <maki@bitkings.de>
Thanks-to: Doug Kelly <dougk.ff7@gmail.com>
Original-test-by: Andrey Rogozhnikov <rogozhnikov.andrey@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:02:51 +02:00
Doug Kelly
c53164e636 pack-objects (mingw): demonstrate a segmentation fault with large deltas
There is a problem in the way 9ac3f0e5b3 (pack-objects: fix
performance issues on packing large deltas, 2018-07-22) initializes that
mutex in the `packing_data` struct. The problem manifests in a
segmentation fault on Windows, when a mutex (AKA critical section) is
accessed without being initialized. (With pthreads, you apparently do
not really have to initialize them?)

This was reported in https://github.com/git-for-windows/git/issues/1839.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:02:51 +02:00
Ben Peart
a0804b006c fscache: add GIT_TEST_FSCACHE support
Add support to fscache to enable running the entire test suite with the
fscache enabled.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2020-05-20 22:02:48 +02:00
Takuto Ikuta
74e2a4d98e checkout.c: enable fscache for checkout again
This is retry of #1419.

I added flush_fscache macro to flush cached stats after disk writing
with tests for regression reported in #1438 and #1442.

git checkout checks each file path in sorted order, so cache flushing does not
make performance worse unless we have large number of modified files in
a directory containing many files.

Using chromium repository, I tested `git checkout .` performance when I
delete 10 files in different directories.
With this patch:
TotalSeconds: 4.307272
TotalSeconds: 4.4863595
TotalSeconds: 4.2975562
Avg: 4.36372923333333

Without this patch:
TotalSeconds: 20.9705431
TotalSeconds: 22.4867685
TotalSeconds: 18.8968292
Avg: 20.7847136

I confirmed this patch passed all tests in t/ with core_fscache=1.

Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
2020-05-20 22:02:48 +02:00
Johannes Schindelin
8ad16a067f fscache: add a test for the dir-not-found optimization
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:02:46 +02:00
Johannes Schindelin
10c08879d7 Merge pull request #2618 from dscho/avoid-d/f-conflict-in-vs/master
ci: avoid d/f conflict in vs/master
2020-05-20 22:02:39 +02:00
Johannes Schindelin
2b9555bf26 Merge pull request #2506 from dscho/issue-2283
Allow running Git directly from `C:\Program Files\Git\mingw64\bin\git.exe`
2020-05-20 22:02:35 +02:00
Johannes Schindelin
88e9aa317d Merge pull request #2504 from dscho/access-repo-via-junction
Handle `git add <file>` where <file> traverses an NTFS junction
2020-05-20 22:02:34 +02:00
Johannes Schindelin
6c88d22e69 Merge branch 'dont-clean-junctions'
This topic branch teaches `git clean` to respect NTFS junctions and Unix
bind mounts: it will now stop at those boundaries.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:02:26 +02:00
Johannes Schindelin
67b712f320 Merge branch 'drive-prefix'
This topic branch allows us to specify absolute paths without the drive
prefix e.g. when cloning.

Example:

	C:\Users\me> git clone https://github.com/git/git \upstream-git

This will clone into a new directory C:\upstream-git, in line with how
Windows interprets absolute paths.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:02:22 +02:00
Johannes Schindelin
ff1e48222f t5505/t5516: fix white-space around redirectors
The convention in Git project's shell scripts is to have white-space
_before_, but not _after_ the `>` (or `<`).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:01:09 +02:00
Johannes Schindelin
ce822fa3a4 t5505/t5516: allow running without .git/branches/ in the templates
When we commit the template directory as part of `make vcxproj`, the
`branches/` directory is not actually commited, as it is empty.

Two tests were not prepared for that situation.

This developer tried to get rid of the support for `.git/branches/` a
long time ago, but that effort did not bear fruit, so the best we can do
is work around in these here tests.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:01:09 +02:00
Johannes Schindelin
0cf62b4dae tests: exercise the RUNTIME_PREFIX feature
Originally, we refrained from adding a regression test in 7b6c649637
(system_path(): Add prefix computation at runtime if RUNTIME_PREFIX set,
2008-08-10), and in 226c0ddd0d (exec_cmd: RUNTIME_PREFIX on some POSIX
systems, 2018-04-10).

The reason was that it was deemed too tricky to test.

Turns out that it is not tricky to test at all: we simply create a
pseudo-root, copy the `git` executable into the `git/` subdirectory of
that pseudo-root, then copy a script into the `libexec/git-core/`
directory and expect that to be picked up.

As long as the trash directory is in a location where binaries can be
executed, this works.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:01:08 +02:00
Johannes Schindelin
a14ce435d9 mingw: implement a platform-specific strbuf_realpath()
There is a Win32 API function to resolve symbolic links, and we can use
that instead of resolving them manually. Even better, this function also
resolves NTFS junction points (which are somewhat similar to bind
mounts).

This fixes https://github.com/git-for-windows/git/issues/2481.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:01:08 +02:00
Johannes Schindelin
44cc635785 mingw: allow git.exe to be used instead of the "Git wrapper"
Git for Windows wants to add `git.exe` to the users' `PATH`, without
cluttering the latter with unnecessary executables such as `wish.exe`.
To that end, it invented the concept of its "Git wrapper", i.e. a tiny
executable located in `C:\Program Files\Git\cmd\git.exe` (originally a
CMD script) whose sole purpose is to set up a couple of environment
variables and then spawn the _actual_ `git.exe` (which nowadays lives in
`C:\Program Files\Git\mingw64\bin\git.exe` for 64-bit, and the obvious
equivalent for 32-bit installations).

Currently, the following environment variables are set unless already
initialized:

- `MSYSTEM`, to make sure that the MSYS2 Bash and the MSYS2 Perl
  interpreter behave as expected, and

- `PLINK_PROTOCOL`, to force PuTTY's `plink.exe` to use the SSH
  protocol instead of Telnet,

- `PATH`, to make sure that the `bin` folder in the user's home
  directory, as well as the `/mingw64/bin` and the `/usr/bin`
  directories are included. The trick here is that the `/mingw64/bin/`
  and `/usr/bin/` directories are relative to the top-level installation
  directory of Git for Windows (which the included Bash interprets as
  `/`, i.e. as the MSYS pseudo root directory).

Using the absence of `MSYSTEM` as a tell-tale, we can detect in
`git.exe` whether these environment variables have been initialized
properly. Therefore we can call `C:\Program Files\Git\mingw64\bin\git`
in-place after this change, without having to call Git through the Git
wrapper.

Obviously, above-mentioned directories must be _prepended_ to the `PATH`
variable, otherwise we risk picking up executables from unrelated Git
installations. We do that by constructing the new `PATH` value from
scratch, appending `$HOME/bin` (if `HOME` is set), then the MSYS2 system
directories, and then appending the original `PATH`.

Side note: this modification of the `PATH` variable is independent of
the modification necessary to reach the executables and scripts in
`/mingw64/libexec/git-core/`, i.e. the `GIT_EXEC_PATH`. That
modification is still performed by Git, elsewhere, long after making the
changes described above.

While we _still_ cannot simply hard-link `mingw64\bin\git.exe` to `cmd`
(because the former depends on a couple of `.dll` files that are only in
`mingw64\bin`, i.e. calling `...\cmd\git.exe` would fail to load due to
missing dependencies), at least we can now avoid that extra process of
running the Git wrapper (which then has to wait for the spawned
`git.exe` to finish) by calling `...\mingw64\bin\git.exe` directly, via
its absolute path.

Testing this is in Git's test suite tricky: we set up a "new" MSYS
pseudo-root and copy the `git.exe` file into the appropriate location,
then verify that `MSYSTEM` is set properly, and also that the `PATH` is
modified so that scripts can be found in `$HOME/bin`, `/mingw64/bin/`
and `/usr/bin/`.

This addresses https://github.com/git-for-windows/git/issues/2283

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:01:08 +02:00
Johannes Schindelin
afa18f2c1a mingw: demonstrate a git add issue with NTFS junctions
NTFS junctions are somewhat similar in spirit to Unix bind mounts: they
point to a different directory and are resolved by the filesystem
driver. As such, they appear to `lstat()` as if they are directories,
not as if they are symbolic links.

_Any_ user can create junctions, while symbolic links can only be
created by non-administrators in Developer Mode on Windows 10. Hence
NTFS junctions are much more common "in the wild" than NTFS symbolic
links.

It was reported in https://github.com/git-for-windows/git/issues/2481
that adding files via an absolute path that traverses an NTFS junction:
since 1e64d18 (mingw: do resolve symlinks in `getcwd()`), we resolve not
only symbolic links but also NTFS junctions when determining the
absolute path of the current directory. The same is not true for `git
add <file>`, where symbolic links are resolved in `<file>`, but not NTFS
junctions.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:01:08 +02:00
Johannes Schindelin
9ea3e657ba clean: remove mount points when possible
Windows' equivalent to "bind mounts", NTFS junction points, can be
unlinked without affecting the mount target. This is clearly what users
expect to happen when they call `git clean -dfx` in a worktree that
contains NTFS junction points: the junction should be removed, and the
target directory of said junction should be left alone (unless it is
inside the worktree).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:01:06 +02:00
Johannes Schindelin
10d2402b3c clean: do not traverse mount points
It seems to be not exactly rare on Windows to install NTFS junction
points (the equivalent of "bind mounts" on Linux/Unix) in worktrees,
e.g. to map some development tools into a subdirectory.

In such a scenario, it is pretty horrible if `git clean -dfx` traverses
into the mapped directory and starts to "clean up".

Let's just not do that. Let's make sure before we traverse into a
directory that it is not a mount point (or junction).

This addresses https://github.com/git-for-windows/git/issues/607

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:01:06 +02:00
Johannes Schindelin
7385b9b081 mingw: allow absolute paths without drive prefix
When specifying an absolute path without a drive prefix, we convert that
path internally. Let's make sure that we handle that case properly, too
;-)

This fixes the command

	git clone https://github.com/git-for-windows/git \G4W

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:01:05 +02:00
Johannes Schindelin
2cdb5828ed mingw: demonstrate a problem with certain absolute paths
On Windows, there are several categories of absolute paths. One such
category starts with a backslash and is implicitly relative to the
drive associated with the current working directory. Example:

	c:
	git clone https://github.com/git-for-windows/git \G4W

should clone into C:\G4W.

There is currently a problem with that, in that mingw_mktemp() does not
expect the _wmktemp() function to prefix the absolute path with the
drive prefix, and as a consequence, the resulting path does not fit into
the originally-passed string buffer. The symptom is a "Result too large"
error.

Reported by Juan Carlos Arevalo Baeza.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2020-05-20 22:01:04 +02:00
Sverre Rabbelier
92e48d4faa remote-helper: check helper status after import/export
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
2020-05-20 22:01:04 +02:00
Sverre Rabbelier
0980be774f t9350: point out that refs are not updated correctly
This happens only when the corresponding commits are not exported in
the current fast-export run. This can happen either when the relevant
commit is already marked, or when the commit is explicitly marked
as UNINTERESTING with a negative ref by another argument.

This breaks fast-export basec remote helpers.

Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
2020-05-20 22:01:04 +02:00
Junio C Hamano
972ce8561d Merge branch 'jc/fix-tap-output-under-bash'
A recent attempt to make the test output nicer to view on CI
systems broke TAP output under bash.  The effort has been reverted
to be re-attempted in the next cycle.

* jc/fix-tap-output-under-bash:
  Revert "tests: when run in Bash, annotate test failures with file name/line number"
  Revert "ci: add a problem matcher for GitHub Actions"
  Revert "t/test_lib: avoid naked bash arrays in file_lineno"
2020-05-20 08:33:29 -07:00
Junio C Hamano
abbd1d9ebf Merge branch 'en/merge-rename-rename-worktree-fix'
When a binary file gets modified and renamed on both sides of history
to different locations, both files would be written to the working
tree but both would have the contents from "ours".  This has been
corrected so that the path from each side gets their original content.

* en/merge-rename-rename-worktree-fix:
  merge-recursive: fix rename/rename(1to2) for working tree with a binary
2020-05-20 08:33:27 -07:00
Junio C Hamano
74c6cba6d8 Merge branch 'dd/t1509-i18n-fix'
A few tests were not i18n clean.

* dd/t1509-i18n-fix:
  t1509: correct i18n test
2020-05-20 08:33:26 -07:00
Junio C Hamano
e31600b03f Revert "tests: when run in Bash, annotate test failures with file name/line number"
This reverts commit 662f9cf154,
to fix the TAP output broken for bash.
2020-05-15 10:25:58 -07:00
Junio C Hamano
3d7b2b4196 Revert "t/test_lib: avoid naked bash arrays in file_lineno"
This reverts commit 303775a25f0b4ac5d6ad2e96eb4404c24209cad8;
instead of trying to salvage the tap-breaking change, let's
revert the whole thing for now.
2020-05-15 09:47:18 -07:00
Junio C Hamano
d98abce68f Merge branch 'es/trace-log-progress'
Teach codepaths that show progress meter to also use the
start_progress() and the stop_progress() calls as a "region" to be
traced.

* es/trace-log-progress:
  trace2: log progress time and throughput
2020-05-14 14:39:45 -07:00
Junio C Hamano
ac140beebe Merge branch 'jt/t5500-unflake'
Test fix for a topic already in 'master' and meant for 'maint'.

* jt/t5500-unflake:
  t5500: count objects through stderr, not trace
2020-05-14 14:39:45 -07:00
Junio C Hamano
6baba94afc Merge branch 'sn/midx-repack-with-config'
"git multi-pack-index repack" has been taught to honor some
repack.* configuration variables.

* sn/midx-repack-with-config:
  multi-pack-index: respect repack.packKeptObjects=false
  midx: teach "git multi-pack-index repack" honor "git repack" configurations
2020-05-14 14:39:44 -07:00
Junio C Hamano
4b1e5e5d8c Merge branch 'ds/bloom-cleanup'
Code cleanup and typofixes

* ds/bloom-cleanup:
  completion: offer '--(no-)patch' among 'git log' options
  bloom: use num_changes not nr for limit detection
  bloom: de-duplicate directory entries
  Documentation: changed-path Bloom filters use byte words
  bloom: parse commit before computing filters
  test-bloom: fix usage typo
  bloom: fix whitespace around tab length
2020-05-14 14:39:44 -07:00
Junio C Hamano
0498840b35 Merge branch 'rs/fsck-duplicate-names-in-trees'
"git fsck" ensures that the paths recorded in tree objects are
sorted and without duplicates, but it failed to notice a case where
a blob is followed by entries that sort before a tree with the same
name.  This has been corrected.

* rs/fsck-duplicate-names-in-trees:
  fsck: report non-consecutive duplicate names in trees
2020-05-14 14:39:44 -07:00
Junio C Hamano
f4507cea24 Merge branch 'ao/p4-d-f-conflict-recover'
"git p4" learned to recover from a (broken) state where a directory
and a file are recorded at the same path in the Perforce repository
the same way as their clients do.

* ao/p4-d-f-conflict-recover:
  git-p4: recover from inconsistent perforce history
2020-05-14 14:39:43 -07:00
Junio C Hamano
a2a0942a16 Merge branch 'js/rebase-autosquash-double-fixup-fix'
"rebase -i" segfaulted when rearranging a sequence that has a
fix-up that applies another fix-up (which may or may not be a
fix-up of yet another step).

* js/rebase-autosquash-double-fixup-fix:
  rebase --autosquash: fix a potential segfault
2020-05-14 14:39:43 -07:00
Junio C Hamano
f9dbe28d62 Merge branch 'cw/bisect-replay-with-dos'
"git bisect replay" had trouble with input files when they used
CRLF line ending, which has been corrected.

* cw/bisect-replay-with-dos:
  bisect: allow CRLF line endings in "git bisect replay" input
2020-05-14 14:39:41 -07:00
Junio C Hamano
3583730758 Merge branch 'es/bugreport-with-hooks'
"git bugreport" learned to report enabled hooks in the repository.

* es/bugreport-with-hooks:
  bugreport: collect list of populated hooks
2020-05-14 14:39:41 -07:00
Elijah Newren
95983da6b4 merge-recursive: fix rename/rename(1to2) for working tree with a binary
With a rename/rename(1to2) conflict, we attempt to do a three-way merge
of the file contents, so that the correct contents can be placed in the
working tree at both paths.  If the file is a binary, however, no
content merging is possible and we should just use the original version
of the file at each of the paths.

Reported-by: Chunlin Zhang <zhangchunlin@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-14 12:14:19 -07:00
Junio C Hamano
a0125885f5 Merge branch 'cc/upload-pack-v2-fetch-fix'
Serving a "git fetch" client over "git://" and "ssh://" protocols
using the on-wire protocol version 2 was buggy on the server end
when the client needs to make a follow-up request to
e.g. auto-follow tags.

* cc/upload-pack-v2-fetch-fix:
  upload-pack: clear filter_options for each v2 fetch command
2020-05-13 12:19:21 -07:00
Junio C Hamano
2e72299ec6 Merge branch 'dd/bloom-sparse-fix'
Code clean-up.

* dd/bloom-sparse-fix:
  bloom: fix `make sparse` warning
2020-05-13 12:19:20 -07:00
Junio C Hamano
69ae8ffa2a Merge branch 'tb/bitmap-walk-with-tree-zero-filter'
The object walk with object filter "--filter=tree:0" can now take
advantage of the pack bitmap when available.

* tb/bitmap-walk-with-tree-zero-filter:
  pack-bitmap: pass object filter to fill-in traversal
  pack-bitmap.c: support 'tree:0' filtering
  pack-bitmap.c: make object filtering functions generic
  list-objects-filter: treat NULL filter_options as "disabled"
2020-05-13 12:19:18 -07:00
Đoàn Trần Công Danh
27e29f859d t1509: correct i18n test
git-init(1)'s messages is subjected to i18n.
They should be tested by test_i18n* family.

Fix them.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-13 09:59:00 -07:00
Emily Shaffer
98a1364740 trace2: log progress time and throughput
Rather than teaching only one operation, like 'git fetch', how to write
down throughput to traces, we can learn about a wide range of user
operations that may seem slow by adding tooling to the progress library
itself. Operations which display progress are likely to be slow-running
and the kind of thing we want to monitor for performance anyways. By
showing object counts and data transfer size, we should be able to
make some derived measurements to ensure operations are scaling the way
we expect.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-12 15:30:39 -07:00
Derrick Stolee
2f6775f00c bloom: use num_changes not nr for limit detection
As diff_tree_oid() computes a diff, it will terminate early if the
total number of changed paths is strictly larger than max_changes.
This includes the directories that changed, not just the file paths.
However, only the file paths are reflected in the resulting diff
queue's "nr" value.

Use the "num_changes" from diffopt to check if the diff terminated
early. This is incredibly important, as it can result in incorrect
filters! For example, the first commit in the Linux kernel repo
reports only 471 changes, but since these are nested inside several
directories they expand to 513 "real" changes, and in fact the
total list of changes is not reported. Thus, the computed filter
for this commit is incorrect.

Demonstrate the subtle difference by using one fewer file change
in the 'get bloom filter for commit with 513 changes' test. Before,
this edited 513 files inside "bigDir" which hit this inequality.
However, dropping the file count by one demonstrates how the
previous inequality was incorrect but the new one is correct.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-11 09:33:56 -07:00
Derrick Stolee
65c1a28bb6 bloom: de-duplicate directory entries
When computing a changed-path Bloom filter, we need to take the
files that changed from the diff computation and extract the parent
directories. That way, a directory pathspec such as "Documentation"
could match commits that change "Documentation/git.txt".

However, the current code does a poor job of this process. The paths
are added to a hashmap, but we do not check if an entry already
exists with that path. This can create many duplicate entries and
cause the filter to have a much larger length than it should. This
means that the filter is more sparse than intended, which helps the
false positive rate, but wastes a lot of space.

Properly use hashmap_get() before hashmap_add(). Also be sure to
include a comparison function so these can be matched correctly.

This has an effect on a test in t0095-bloom.sh. This makes sense,
there are ten changes inside "smallDir" so the total number of
paths in the filter should be 11. This would result in 11 * 10 bits
required, and with 8 bits per byte, this results in 14 bytes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-11 09:33:56 -07:00
René Scharfe
9068cfb20f fsck: report non-consecutive duplicate names in trees
Tree entries are sorted in path order, meaning that directory names get
a slash ('/') appended implicitly.  Git fsck checks if trees contains
consecutive duplicates, but due to that ordering there can be
non-consecutive duplicates as well if one of them is a directory and the
other one isn't.  Such a tree cannot be fully checked out.

Find these duplicates by recording candidate file names on a stack and
check candidate directory names against that stack to find matches.

Suggested-by: Brandon Williams <bwilliamseng@gmail.com>
Original-test-by: Brandon Williams <bwilliamseng@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Reviewed-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-11 08:40:28 -07:00
Andrew Oakley
82e46d6b83 git-p4: recover from inconsistent perforce history
Perforce allows you commit files and directories with the same name,
so you could have files //depot/foo and //depot/foo/bar both checked
in.  A p4 sync of a repository in this state fails.  Deleting one of
the files recovers the repository.

When this happens we want git-p4 to recover in the same way as
perforce.

Note that Perforce has this change in their 2017.1 version:

     Bugs fixed in 2017.1
     #1489051 (Job #2170) **
        Submitting a file with the same name as an existing depot
        directory path (or vice versa) will now be rejected.

so people hopefully will not creating damaged Perforce repos
anymore, but "git p4" needs to be able to interact with already
corrupt ones.

Signed-off-by: Andrew Oakley <andrew@adoakley.name>
Reviewed-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-10 09:58:50 -07:00
Derrick Stolee
3ce4ca0a56 multi-pack-index: respect repack.packKeptObjects=false
When selecting a batch of pack-files to repack in the "git
multi-pack-index repack" command, Git should respect the
repack.packKeptObjects config option. When false, this option says that
the pack-files with an associated ".keep" file should not be repacked.
This config value is "false" by default.

There are two cases for selecting a batch of objects. The first is the
case where the input batch-size is zero, which specifies "repack
everything". The second is with a non-zero batch size, which selects
pack-files using a greedy selection criteria. Both of these cases are
updated and tested.

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-10 09:50:55 -07:00
Johannes Schindelin
02471e7e20 rebase --autosquash: fix a potential segfault
When rearranging the todo list so that the fixups/squashes are reordered
just after the commits they intend to fix up, we use two arrays to
maintain that list: `next` and `tail`.

The idea is that `next[i]`, if set to a non-negative value, contains the
index of the item that should be rearranged just after the `i`th item.

To avoid having to walk the entire `next` chain when appending another
fixup/squash, we also store the end of the `next` chain in `tail[i]`.

The logic we currently use to update these array items is based on the
assumption that given a fixup/squash item at index `i`, we just found
the index `i2` indicating the first item in that fixup chain.

However, as reported by Paul Ganssle, that need not be true: the special
form `fixup! <commit-hash>` is allowed to point to _another_ fixup
commit in the middle of the fixup chain.

Example:

	* 0192a To fixup
	* 02f12 fixup! To fixup
	* 03763 fixup! To fixup
	* 04ecb fixup! 02f12

Note how the fourth commit targets the second commit, which is already a
fixup that targets the first commit.

Previously, we would update `next` and `tail` under our assumption that
every `fixup!` commit would find the start of the `fixup!`/`squash!`
chain. This would lead to a segmentation fault because we would actually
end up with a `next[i]` pointing to a `fixup!` but the corresponding
`tail[i]` pointing nowhere, which would the lead to a segmentation
fault.

Let's fix this by _inserting_, rather than _appending_, the item. In
other words, if we make a given line successor of another line, we do
not simply forget any previously set successor of the latter, but make
it a successor of the former.

In the above example, at the point when we insert 04ecb just after
02f12, 03763 would already be recorded as a successor of 04ecb, and we
now "squeeze in" 04ecb.

To complete the idea, we now no longer assume that `next[i]` pointing to
a line means that `last[i]` points to a line, too. Instead, we extend
the concept of `last` to cover also partial `fixup!`/`squash!` chains,
i.e. chains starting in the middle of a larger such chain.

In the above example, after processing all lines, `last[0]`
(corresponding to 0192a) would point to 03763, which indeed is the end
of the overall `fixup!` chain, and `last[1]` (corresponding to 02f12)
would point to 04ecb (which is the last `fixup!` targeting 02f12, but it
has 03763 as successor, i.e. it is not the end of overall `fixup!`
chain).

Reported-by: Paul Ganssle <paul@ganssle.io>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-09 13:59:55 -07:00