This is retry of #1419.
I added flush_fscache macro to flush cached stats after disk writing
with tests for regression reported in #1438 and #1442.
git checkout checks each file path in sorted order, so cache flushing does not
make performance worse unless we have large number of modified files in
a directory containing many files.
Using chromium repository, I tested `git checkout .` performance when I
delete 10 files in different directories.
With this patch:
TotalSeconds: 4.307272
TotalSeconds: 4.4863595
TotalSeconds: 4.2975562
Avg: 4.36372923333333
Without this patch:
TotalSeconds: 20.9705431
TotalSeconds: 22.4867685
TotalSeconds: 18.8968292
Avg: 20.7847136
I confirmed this patch passed all tests in t/ with core_fscache=1.
Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
This topic branch allows `add -p` and `add -i` with a large number of
files. It is kind of a hack that was never really meant to be
upstreamed. Let's see if we can do better in the built-in `add -p`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This topic branch teaches `git clean` to respect NTFS junctions and Unix
bind mounts: it will now stop at those boundaries.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This topic branch allows us to specify absolute paths without the drive
prefix e.g. when cloning.
Example:
C:\Users\me> git clone https://github.com/git/git \upstream-git
This will clone into a new directory C:\upstream-git, in line with how
Windows interprets absolute paths.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
To verify that the `clean` side of the `clean`/`smudge` filter code is
correct with regards to LLP64 (read: to ensure that `size_t` is used
instead of `unsigned long`), here is a test case using a trivial filter,
specifically _not_ writing anything to the object store to limit the
scope of the test case.
As in previous commits, the `big` file from previous test cases is
reused if available, to save setup time, otherwise re-generated.
Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
To complement the `--stdin` and `--literally` test cases that verify
that we can hash files larger than 4GB on 64-bit platforms using the
LLP64 data model, here is a test case that exercises `hash-object`
_without_ any options.
Just as before, we use the `big` file from the previous test case if it
exists to save on setup time, otherwise generate it.
Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Just like the `hash-object --literally` code path, the `--stdin` code
path also needs to use `size_t` instead of `unsigned long` to represent
memory sizes, otherwise it would cause problems on platforms using the
LLP64 data model (such as Windows).
To limit the scope of the test case, the object is explicitly not
written to the object store, nor are any filters applied.
The `big` file from the previous test case is reused to save setup time;
To avoid relying on that side effect, it is generated if it does not
exist (e.g. when running via `sh t1007-*.sh --long --run=1,41`).
Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue walking the code path for the >4GB `hash-object --literally`
test to the hash algorithm step for LLP64 systems.
This patch lets the SHA1DC code use `size_t`, making it compatible with
LLP64 data models (as used e.g. by Windows).
The interested reader of this patch will note that we adjust the
signature of the `git_SHA1DCUpdate()` function without updating _any_
call site. This certainly puzzled at least one reviewer already, so here
is an explanation:
This function is never called directly, but always via the macro
`platform_SHA1_Update`, which is usually called via the macro
`git_SHA1_Update`. However, we never call `git_SHA1_Update()` directly
in `struct git_hash_algo`. Instead, we call `git_hash_sha1_update()`,
which is defined thusly:
static void git_hash_sha1_update(git_hash_ctx *ctx,
const void *data, size_t len)
{
git_SHA1_Update(&ctx->sha1, data, len);
}
i.e. it contains an implicit downcast from `size_t` to `unsigned long`
(before this here patch). With this patch, there is no downcast anymore.
With this patch, finally, the t1007-hash-object.sh "files over 4GB hash
literally" test case is fixed.
Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On LLP64 systems, such as Windows, the size of `long`, `int`, etc. is
only 32 bits (for backward compatibility). Git's use of `unsigned long`
for file memory sizes in many places, rather than size_t, limits the
handling of large files on LLP64 systems (commonly given as `>4GB`).
Provide a minimum test for handling a >4GB file. The `hash-object`
command, with the `--literally` and without `-w` option avoids
writing the object, either loose or packed. This avoids the code paths
hitting the `bigFileThreshold` config test code, the zlib code, and the
pack code.
Subsequent patches will walk the test's call chain, converting types to
`size_t` (which is larger in LLP64 data models) where appropriate.
Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
For some reason, this test case was indented with 4 spaces instead of 1
horizontal tab. The other test cases in the same test script are fine.
Signed-off-by: Jens Glathe <jens.glathe@oldschoolsolutions.biz>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This change enhances `git commit --cleanup=scissors` by detecting
scissors lines ending in either LF (UNIX-style) or CR/LF (DOS-style).
Regression tests are included to specifically test for trailing
comments after a CR/LF-terminated scissors line.
Signed-off-by: Luke Bonanomi <lbonanomi@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The convention in Git project's shell scripts is to have white-space
_before_, but not _after_ the `>` (or `<`).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When we commit the template directory as part of `make vcxproj`, the
`branches/` directory is not actually commited, as it is empty.
Two tests were not prepared for that situation.
This developer tried to get rid of the support for `.git/branches/` a
long time ago, but that effort did not bear fruit, so the best we can do
is work around in these here tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git for Windows wants to add `git.exe` to the users' `PATH`, without
cluttering the latter with unnecessary executables such as `wish.exe`.
To that end, it invented the concept of its "Git wrapper", i.e. a tiny
executable located in `C:\Program Files\Git\cmd\git.exe` (originally a
CMD script) whose sole purpose is to set up a couple of environment
variables and then spawn the _actual_ `git.exe` (which nowadays lives in
`C:\Program Files\Git\mingw64\bin\git.exe` for 64-bit, and the obvious
equivalent for 32-bit installations).
Currently, the following environment variables are set unless already
initialized:
- `MSYSTEM`, to make sure that the MSYS2 Bash and the MSYS2 Perl
interpreter behave as expected, and
- `PLINK_PROTOCOL`, to force PuTTY's `plink.exe` to use the SSH
protocol instead of Telnet,
- `PATH`, to make sure that the `bin` folder in the user's home
directory, as well as the `/mingw64/bin` and the `/usr/bin`
directories are included. The trick here is that the `/mingw64/bin/`
and `/usr/bin/` directories are relative to the top-level installation
directory of Git for Windows (which the included Bash interprets as
`/`, i.e. as the MSYS pseudo root directory).
Using the absence of `MSYSTEM` as a tell-tale, we can detect in
`git.exe` whether these environment variables have been initialized
properly. Therefore we can call `C:\Program Files\Git\mingw64\bin\git`
in-place after this change, without having to call Git through the Git
wrapper.
Obviously, above-mentioned directories must be _prepended_ to the `PATH`
variable, otherwise we risk picking up executables from unrelated Git
installations. We do that by constructing the new `PATH` value from
scratch, appending `$HOME/bin` (if `HOME` is set), then the MSYS2 system
directories, and then appending the original `PATH`.
Side note: this modification of the `PATH` variable is independent of
the modification necessary to reach the executables and scripts in
`/mingw64/libexec/git-core/`, i.e. the `GIT_EXEC_PATH`. That
modification is still performed by Git, elsewhere, long after making the
changes described above.
While we _still_ cannot simply hard-link `mingw64\bin\git.exe` to `cmd`
(because the former depends on a couple of `.dll` files that are only in
`mingw64\bin`, i.e. calling `...\cmd\git.exe` would fail to load due to
missing dependencies), at least we can now avoid that extra process of
running the Git wrapper (which then has to wait for the spawned
`git.exe` to finish) by calling `...\mingw64\bin\git.exe` directly, via
its absolute path.
Testing this is in Git's test suite tricky: we set up a "new" MSYS
pseudo-root and copy the `git.exe` file into the appropriate location,
then verify that `MSYSTEM` is set properly, and also that the `PATH` is
modified so that scripts can be found in `$HOME/bin`, `/mingw64/bin/`
and `/usr/bin/`.
This addresses https://github.com/git-for-windows/git/issues/2283
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
There is a Win32 API function to resolve symbolic links, and we can use
that instead of resolving them manually. Even better, this function also
resolves NTFS junction points (which are somewhat similar to bind
mounts).
This fixes https://github.com/git-for-windows/git/issues/2481.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
NTFS junctions are somewhat similar in spirit to Unix bind mounts: they
point to a different directory and are resolved by the filesystem
driver. As such, they appear to `lstat()` as if they are directories,
not as if they are symbolic links.
_Any_ user can create junctions, while symbolic links can only be
created by non-administrators in Developer Mode on Windows 10. Hence
NTFS junctions are much more common "in the wild" than NTFS symbolic
links.
It was reported in https://github.com/git-for-windows/git/issues/2481
that adding files via an absolute path that traverses an NTFS junction:
since 1e64d18 (mingw: do resolve symlinks in `getcwd()`), we resolve not
only symbolic links but also NTFS junctions when determining the
absolute path of the current directory. The same is not true for `git
add <file>`, where symbolic links are resolved in `<file>`, but not NTFS
junctions.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Windows' equivalent to "bind mounts", NTFS junction points, can be
unlinked without affecting the mount target. This is clearly what users
expect to happen when they call `git clean -dfx` in a worktree that
contains NTFS junction points: the junction should be removed, and the
target directory of said junction should be left alone (unless it is
inside the worktree).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When specifying an absolute path without a drive prefix, we convert that
path internally. Let's make sure that we handle that case properly, too
;-)
This fixes the command
git clone https://github.com/git-for-windows/git \G4W
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
It seems to be not exactly rare on Windows to install NTFS junction
points (the equivalent of "bind mounts" on Linux/Unix) in worktrees,
e.g. to map some development tools into a subdirectory.
In such a scenario, it is pretty horrible if `git clean -dfx` traverses
into the mapped directory and starts to "clean up".
Let's just not do that. Let's make sure before we traverse into a
directory that it is not a mount point (or junction).
This addresses https://github.com/git-for-windows/git/issues/607
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Windows, there are several categories of absolute paths. One such
category starts with a backslash and is implicitly relative to the
drive associated with the current working directory. Example:
c:
git clone https://github.com/git-for-windows/git \G4W
should clone into C:\G4W.
There is currently a problem with that, in that mingw_mktemp() does not
expect the _wmktemp() function to prefix the absolute path with the
drive prefix, and as a consequence, the resulting path does not fit into
the originally-passed string buffer. The symptom is a "Result too large"
error.
Reported by Juan Carlos Arevalo Baeza.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This happens only when the corresponding commits are not exported in
the current fast-export run. This can happen either when the relevant
commit is already marked, or when the commit is explicitly marked
as UNINTERESTING with a negative ref by another argument.
This breaks fast-export basec remote helpers.
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
The "fetch.credentialsInUrl" configuration variable controls what
happens when a URL with embedded login credential is used.
* ds/credentials-in-url:
remote: create fetch.credentialsInUrl config
Updating the graft information invalidates the list of parents of
in-core commit objects that used to be in the graft file.
* jt/unparse-commit-upon-graft-change:
commit,shallow: unparse commits if grafts changed
In Git 2.36 we revamped the way how hooks are invoked. One change
that is end-user visible is that the output of a hook is no longer
directly connected to the standard output of "git" that spawns the
hook, which was noticed post release. This is getting corrected.
* ab/hooks-regression-fix:
hook API: fix v2.36.0 regression: hooks should be connected to a TTY
run-command: add an "ungroup" option to run_process_parallel()
"git -c diff.submodule=log range-diff" did not show anything for
submodules that changed in the ranges being compared, and
"git -c diff.submodule=diff range-diff" did not work correctly.
Fix this by including the "--submodule=short" output
unconditionally to be compared.
* pb/range-diff-with-submodule:
range-diff: show submodule changes irrespective of diff.submodule
A new bug() and BUG_if_bug() API is introduced to make it easier to
uniformly log "detect multiple bugs and abort in the end" pattern.
* ab/bug-if-bug:
cache-tree.c: use bug() and BUG_if_bug()
receive-pack: use bug() and BUG_if_bug()
parse-options.c: use optbug() instead of BUG() "opts" check
parse-options.c: use new bug() API for optbug()
usage.c: add a non-fatal bug() function to go with BUG()
common-main.c: move non-trace2 exit() behavior out of trace2.c
More fsmonitor--daemon.
* jh/builtin-fsmonitor-part3: (30 commits)
t7527: improve implicit shutdown testing in fsmonitor--daemon
fsmonitor--daemon: allow --super-prefix argument
t7527: test Unicode NFC/NFD handling on MacOS
t/lib-unicode-nfc-nfd: helper prereqs for testing unicode nfc/nfd
t/helper/hexdump: add helper to print hexdump of stdin
fsmonitor: on macOS also emit NFC spelling for NFD pathname
t7527: test FSMonitor on case insensitive+preserving file system
fsmonitor: never set CE_FSMONITOR_VALID on submodules
t/perf/p7527: add perf test for builtin FSMonitor
t7527: FSMonitor tests for directory moves
fsmonitor: optimize processing of directory events
fsm-listen-darwin: shutdown daemon if worktree root is moved/renamed
fsm-health-win32: force shutdown daemon if worktree root moves
fsm-health-win32: add polling framework to monitor daemon health
fsmonitor--daemon: stub in health thread
fsmonitor--daemon: rename listener thread related variables
fsmonitor--daemon: prepare for adding health thread
fsmonitor--daemon: cd out of worktree root
fsm-listen-darwin: ignore FSEvents caused by xattr changes on macOS
unpack-trees: initialize fsmonitor_has_run_once in o->result
...
A misconfigured 'branch..remote' led to a bug in configuration
parsing.
* gc/zero-length-branch-config-fix:
remote.c: reject 0-length branch names
remote.c: don't BUG() on 0-length branch names
Rename .env_array member to .env in the child_process structure.
* ab/env-array:
run-command API users: use "env" not "env_array" in comments & names
run-command API: rename "env_array" to "env"
Test that "git config --show-scope" shows the "worktree" scope, and add
it to the list of scopes in Documentation/git-config.txt.
"git config --help" does not need to be updated because it already
mentions "worktree".
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A git subcommand like "git add -p" spawns a separate git process
while relaying its command line arguments. A pathspec with only
negative elements was mistakenly passed with an empty string, which
has been corrected.
* jc/all-negative-pathspec:
pathspec: correct an empty string used as a pathspec element
Implementation of "scalar diagnose" subcommand.
* js/scalar-diagnose:
scalar: teach `diagnose` to gather loose objects information
scalar: teach `diagnose` to gather packfile info
scalar diagnose: include disk space information
scalar: implement `scalar diagnose`
scalar: validate the optional enlistment argument
archive --add-virtual-file: allow paths containing colons
archive: optionally add "virtual" files
Update the GitHub workflow support to make it quicker to get to the
failing test.
* js/ci-github-workflow-markup:
ci: call `finalize_test_case_output` a little later
ci(github): mention where the full logs can be found
ci: use `--github-workflow-markup` in the GitHub workflow
ci(github): avoid printing test case preamble twice
ci(github): skip the logs of the successful test cases
ci: optionally mark up output in the GitHub workflow
ci/run-build-and-tests: add some structure to the GitHub workflow output
ci: make it easier to find failed tests' logs in the GitHub workflow
ci/run-build-and-tests: take a more high-level view
test(junit): avoid line feeds in XML attributes
tests: refactor --write-junit-xml code
ci: fix code style
Plug the memory leaks from the trickiest API of all, the revision
walker.
* ab/plug-leak-in-revisions: (27 commits)
revisions API: add a TODO for diff_free(&revs->diffopt)
revisions API: have release_revisions() release "topo_walk_info"
revisions API: have release_revisions() release "date_mode"
revisions API: call diff_free(&revs->pruning) in revisions_release()
revisions API: release "reflog_info" in release revisions()
revisions API: clear "boundary_commits" in release_revisions()
revisions API: have release_revisions() release "prune_data"
revisions API: have release_revisions() release "grep_filter"
revisions API: have release_revisions() release "filter"
revisions API: have release_revisions() release "cmdline"
revisions API: have release_revisions() release "mailmap"
revisions API: have release_revisions() release "commits"
revisions API users: use release_revisions() for "prune_data" users
revisions API users: use release_revisions() with UNLEAK()
revisions API users: use release_revisions() in builtin/log.c
revisions API users: use release_revisions() in http-push.c
revisions API users: add "goto cleanup" for release_revisions()
stash: always have the owner of "stash_info" free it
revisions API users: use release_revisions() needing REV_INFO_INIT
revision.[ch]: document and move code declared around "init"
...
Fix a regression reported[1] against f443246b9f (commit: convert
{pre-commit,prepare-commit-msg} hook to hook.h, 2021-12-22): Due to
using the run_process_parallel() API in the earlier 96e7225b31 (hook:
add 'run' subcommand, 2021-12-22) we'd capture the hook's stderr and
stdout, and thus lose the connection to the TTY in the case of
e.g. the "pre-commit" hook.
As a preceding commit notes GNU parallel's similar --ungroup option
also has it emit output faster. While we're unlikely to have hooks
that emit truly massive amounts of output (or where the performance
thereof matters) it's still informative to measure the overhead. In a
similar "seq" test we're now ~30% faster:
$ cat .git/hooks/seq-hook; git hyperfine -L rev origin/master,HEAD~0 -s 'make CFLAGS=-O3' './git hook run seq-hook'
#!/bin/sh
seq 100000000
Benchmark 1: ./git hook run seq-hook' in 'origin/master
Time (mean ± σ): 787.1 ms ± 13.6 ms [User: 701.6 ms, System: 534.4 ms]
Range (min … max): 773.2 ms … 806.3 ms 10 runs
Benchmark 2: ./git hook run seq-hook' in 'HEAD~0
Time (mean ± σ): 603.4 ms ± 1.6 ms [User: 573.1 ms, System: 30.3 ms]
Range (min … max): 601.0 ms … 606.2 ms 10 runs
Summary
'./git hook run seq-hook' in 'HEAD~0' ran
1.30 ± 0.02 times faster than './git hook run seq-hook' in 'origin/master'
1. https://lore.kernel.org/git/CA+dzEBn108QoMA28f0nC8K21XT+Afua0V2Qv8XkR8rAeqUCCZw@mail.gmail.com/
Reported-by: Anthony Sottile <asottile@umich.edu>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
[jc: minor fix-up to tests for consistency]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extend the parallel execution API added in c553c72eed (run-command:
add an asynchronous parallel child processor, 2015-12-15) to support a
mode where the stdout and stderr of the processes isn't captured and
output in a deterministic order, instead we'll leave it to the kernel
and stdio to sort it out.
This gives the API same functionality as GNU parallel's --ungroup
option. As we'll see in a subsequent commit the main reason to want
this is to support stdout and stderr being connected to the TTY in the
case of jobs=1, demonstrated here with GNU parallel:
$ parallel --ungroup 'test -t {} && echo TTY || echo NTTY' ::: 1 2
TTY
TTY
$ parallel 'test -t {} && echo TTY || echo NTTY' ::: 1 2
NTTY
NTTY
Another is as GNU parallel's documentation notes a potential for
optimization. As demonstrated in next commit our results with "git
hook run" will be similar, but generally speaking this shows that if
you want to run processes in parallel where the exact order isn't
important this can be a lot faster:
$ hyperfine -r 3 -L o ,--ungroup 'parallel {o} seq ::: 10000000 >/dev/null '
Benchmark 1: parallel seq ::: 10000000 >/dev/null
Time (mean ± σ): 220.2 ms ± 9.3 ms [User: 124.9 ms, System: 96.1 ms]
Range (min … max): 212.3 ms … 230.5 ms 3 runs
Benchmark 2: parallel --ungroup seq ::: 10000000 >/dev/null
Time (mean ± σ): 154.7 ms ± 0.9 ms [User: 136.2 ms, System: 25.1 ms]
Range (min … max): 153.9 ms … 155.7 ms 3 runs
Summary
'parallel --ungroup seq ::: 10000000 >/dev/null ' ran
1.42 ± 0.06 times faster than 'parallel seq ::: 10000000 >/dev/null '
A large part of the juggling in the API is to make the API safer for
its maintenance and consumers alike.
For the maintenance of the API we e.g. avoid malloc()-ing the
"pp->pfd", ensuring that SANITIZE=address and other similar tools will
catch any unexpected misuse.
For API consumers we take pains to never pass the non-NULL "out"
buffer to an API user that provided the "ungroup" option. The
resulting code in t/helper/test-run-command.c isn't typical of such a
user, i.e. they'd typically use one mode or the other, and would know
whether they'd provided "ungroup" or not.
We could also avoid the strbuf_init() for "buffered_output" by having
"struct parallel_processes" use a static PARALLEL_PROCESSES_INIT
initializer, but let's leave that cleanup for later.
Using a global "run_processes_parallel_ungroup" variable to enable
this option is rather nasty, but is being done here to produce as
minimal of a change as possible for a subsequent regression fix. This
change is extracted from a larger initial version[1] which ends up
with a better end-state for the API, but in doing so needed to modify
all existing callers of the API. Let's defer that for now, and
narrowly focus on what we need for fixing the regression in the
subsequent commit.
It's safe to do this with a global variable because:
A) hook.c is the only user of it that sets it to non-zero, and before
we'll get any other API users we'll refactor away this method of
passing in the option, i.e. re-roll [1].
B) Even if hook.c wasn't the only user we don't have callers of this
API that concurrently invoke this parallel process starting API
itself in parallel.
As noted above "A" && "B" are rather nasty, and we don't want to live
with those caveats long-term, but for now they should be an acceptable
compromise.
1. https://lore.kernel.org/git/cover-v2-0.8-00000000000-20220518T195858Z-avarab@gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After generating diffs for each range to be compared using a 'git log'
invocation, range-diff.c::read_patches looks for the "diff --git" header
in those diffs to recognize the beginning of a new change.
In a project with submodules, and with 'diff.submodule=log' set in the
config, this header is missing for the diff of a changed submodule, so
any submodule changes are quietly ignored in the range-diff.
When 'diff.submodule=diff' is set in the config, the "diff --git" header
is also missing for the submodule itself, but is shown for submodule
content changes, which can easily confuse 'git range-diff' and lead to
errors such as:
error: git apply: bad git-diff - inconsistent old filename on line 1
error: could not parse git header 'diff --git path/to/submodule/and/some/file/within
'
error: could not parse log for '@{u}..@{1}'
Force the submodule diff format to its default ("short") when invoking
'git log' to generate the patches for each range, such that submodule
changes are always detected.
Add a test, including an invocation with '--creation-factor=100' to
force the second commit in the range not to be considered a complete
rewrite, in order to verify we do indeed get the "short" format.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a commit is parsed, it pretends to have a different (possibly
empty) list of parents if there is graft information for that commit.
But there is a bug that could occur when a commit is parsed, the graft
information is updated (for example, when a shallow file is rewritten),
and the same commit is subsequently used: the parents of the commit do
not conform to the updated graft information, but the information at the
time of parsing.
This is usually not an issue, as a commit is usually introduced into the
repository at the same time as its graft information. That means that
when we try to parse that commit, we already have its graft information.
But it is an issue when fetching a shallow point directly into a
repository with submodules. The function
assign_shallow_commits_to_refs() parses all sought objects (including
the shallow point, which we are directly fetching). In update_shallow()
in fetch-pack.c, assign_shallow_commits_to_refs() is called before
commit_shallow_file(), which means that the shallow point would have
been parsed before graft information is updated. Once a commit is
parsed, it is no longer sensitive to any graft information updates. This
parsed commit is subsequently used when we do a revision walk to search
for submodules to fetch, meaning that the commit is considered to have
parents even though it is a shallow point (and therefore should be
treated as having no parents).
Therefore, whenever graft information is updated, mark the commits that
were previously grafts and the commits that are newly grafts as
unparsed.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>