This topic branch re-adds the deprecated --stdin/-z options to `git
reset`. Those patches were overridden by a different set of options in
the upstream Git project before we could propose `--stdin`.
We offered this in MinGit to applications that wanted a safer way to
pass lots of pathspecs to Git, and these applications will need to be
adjusted.
Instead of `--stdin`, `--pathspec-from-file=-` should be used, and
instead of `-z`, `--pathspec-file-nul`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The `--stdin` option was a well-established paradigm in other commands,
therefore we implemented it in `git reset` for use by Visual Studio.
Unfortunately, upstream Git decided that it is time to introduce
`--pathspec-from-file` instead.
To keep backwards-compatibility for some grace period, we therefore
reinstate the `--stdin` option on top of the `--pathspec-from-file`
option, but mark it firmly as deprecated.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Windows, symbolic links actually have a type depending on the target:
it can be a file or a directory.
In certain circumstances, this poses problems, e.g. when a symbolic link
is supposed to point into a submodule that is not checked out, so there
is no way for Git to auto-detect the type.
To help with that, we will add support over the course of the next
commits to specify that symlink type via the Git attributes. This
requires an index_state, though, something that Git for Windows'
`symlink()` replacement cannot know about because the function signature
is defined by the POSIX standard and not ours to change.
So let's introduce a helper function to create symbolic links that
*does* know about the index_state.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Windows, git repositories may have extra files which need cleaned
(e.g., a build directory) that may be arbitrarily deep. Suggest using
`core.longPaths` if such situations are encountered.
Fixes: #2715
Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
The `git clean` command needs to enumerate plenty of files and
directories, and can therefore benefit from the FSCache.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Update enable_fscache() to take an optional initial size parameter which is
used to initialize the hashmap so that it can avoid having to rehash as
additional entries are added.
Add a separate disable_fscache() macro to make the code clearer and easier
to read.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
At the end of the status command, disable and free the fscache so that we
don't leak the memory and so that we can dump the fscache statistics.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
This is retry of #1419.
I added flush_fscache macro to flush cached stats after disk writing
with tests for regression reported in #1438 and #1442.
git checkout checks each file path in sorted order, so cache flushing does not
make performance worse unless we have large number of modified files in
a directory containing many files.
Using chromium repository, I tested `git checkout .` performance when I
delete 10 files in different directories.
With this patch:
TotalSeconds: 4.307272
TotalSeconds: 4.4863595
TotalSeconds: 4.2975562
Avg: 4.36372923333333
Without this patch:
TotalSeconds: 20.9705431
TotalSeconds: 22.4867685
TotalSeconds: 18.8968292
Avg: 20.7847136
I confirmed this patch passed all tests in t/ with core_fscache=1.
Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
Teach "add" to use preload-index and fscache features
to improve performance on very large repositories.
During an "add", a call is made to run_diff_files()
which calls check_remove() for each index-entry. This
calls lstat(). On Windows, the fscache code intercepts
the lstat() calls and builds a private cache using the
FindFirst/FindNext routines, which are much faster.
Somewhat independent of this, is the preload-index code
which distributes some of the start-up costs across
multiple threads.
We need to keep the call to read_cache() before parsing the
pathspecs (and hence cannot use the pathspecs to limit any preload)
because parse_pathspec() is using the index to determine whether a
pathspec is, in fact, in a submodule. If we would not read the index
first, parse_pathspec() would not error out on a path that is inside
a submodule, and t7400-submodule-basic.sh would fail with
not ok 47 - do not add files from a submodule
We still want the nice preload performance boost, though, so we simply
call read_cache_preload(&pathspecs) after parsing the pathspecs.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.
This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.
Enable read-only sections for 'git status' and preload_index.
Signed-off-by: Karsten Blees <blees@dcon.de>
In Git for Windows, `has_symlinks` is set to 0 by default. Therefore, we
need to parse the config setting `core.symlinks` to know if it has been
set to `true`. In `git init`, we must do that before copying the
templates because they might contain symbolic links.
Even if the support for symbolic links on Windows has not made it to
upstream Git yet, we really should make sure that all the `core.*`
settings are parsed before proceeding, as they might very well change
the behavior of `git init` in a way the user intended.
This fixes https://github.com/git-for-windows/git/issues/3414
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We just introduced a helper to avoid showing a console window when the
scheduled task runs `git.exe`. Let's actually use it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Windows' equivalent to "bind mounts", NTFS junction points, can be
unlinked without affecting the mount target. This is clearly what users
expect to happen when they call `git clean -dfx` in a worktree that
contains NTFS junction points: the junction should be removed, and the
target directory of said junction should be left alone (unless it is
inside the worktree).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
It seems to be not exactly rare on Windows to install NTFS junction
points (the equivalent of "bind mounts" on Linux/Unix) in worktrees,
e.g. to map some development tools into a subdirectory.
In such a scenario, it is pretty horrible if `git clean -dfx` traverses
into the mapped directory and starts to "clean up".
Let's just not do that. Let's make sure before we traverse into a
directory that it is not a mount point (or junction).
This addresses https://github.com/git-for-windows/git/issues/607
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
* maint-2.35:
Git 2.35.7
Git 2.34.7
http: support CURLOPT_PROTOCOLS_STR
http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
Git 2.33.7
Git 2.32.6
Git 2.31.7
Git 2.30.8
apply: fix writing behind newly created symbolic links
dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
clone: delay picking a transport until after get_repo_path()
t5619: demonstrate clone_local() with ambiguous transport
* maint-2.34:
Git 2.34.7
http: support CURLOPT_PROTOCOLS_STR
http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
Git 2.33.7
Git 2.32.6
Git 2.31.7
Git 2.30.8
apply: fix writing behind newly created symbolic links
dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
clone: delay picking a transport until after get_repo_path()
t5619: demonstrate clone_local() with ambiguous transport
* maint-2.33:
Git 2.33.7
Git 2.32.6
Git 2.31.7
Git 2.30.8
apply: fix writing behind newly created symbolic links
dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
clone: delay picking a transport until after get_repo_path()
t5619: demonstrate clone_local() with ambiguous transport
* maint-2.32:
Git 2.32.6
Git 2.31.7
Git 2.30.8
apply: fix writing behind newly created symbolic links
dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
clone: delay picking a transport until after get_repo_path()
t5619: demonstrate clone_local() with ambiguous transport
* maint-2.31:
Git 2.31.7
Git 2.30.8
apply: fix writing behind newly created symbolic links
dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
clone: delay picking a transport until after get_repo_path()
t5619: demonstrate clone_local() with ambiguous transport
* maint-2.30:
Git 2.30.8
apply: fix writing behind newly created symbolic links
dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
clone: delay picking a transport until after get_repo_path()
t5619: demonstrate clone_local() with ambiguous transport
In the previous commit, t5619 demonstrates an issue where two calls to
`get_repo_path()` could trick Git into using its local clone mechanism
in conjunction with a non-local transport.
That sequence is:
- the starting state is that the local path https:/example.com/foo is a
symlink that points to ../../../.git/modules/foo. So it's dangling.
- get_repo_path() sees that no such path exists (because it's
dangling), and thus we do not canonicalize it into an absolute path
- because we're using --separate-git-dir, we create .git/modules/foo.
Now our symlink is no longer dangling!
- we pass the url to transport_get(), which sees it as an https URL.
- we call get_repo_path() again, on the url. This second call was
introduced by f38aa83f9a (use local cloning if insteadOf makes a
local URL, 2014-07-17). The idea is that we want to pull the url
fresh from the remote.c API, because it will apply any aliases.
And of course now it sees that there is a local file, which is a
mismatch with the transport we already selected.
The issue in the above sequence is calling `transport_get()` before
deciding whether or not the repository is indeed local, and not passing
in an absolute path if it is local.
This is reminiscent of a similar bug report in [1], where it was
suggested to perform the `insteadOf` lookup earlier. Taking that
approach may not be as straightforward, since the intent is to store the
original URL in the config, but to actually fetch from the insteadOf
one, so conflating the two early on is a non-starter.
Note: we pass the path returned by `get_repo_path(remote->url[0])`,
which should be the same as `repo_name` (aside from any `insteadOf`
rewrites).
We *could* pass `absolute_pathdup()` of the same argument, which
86521acaca (Bring local clone's origin URL in line with that of a remote
clone, 2008-09-01) indicates may differ depending on the presence of
".git/" for a non-bare repo. That matters for forming relative submodule
paths, but doesn't matter for the second call, since we're just feeding
it to the transport code, which is fine either way.
[1]: https://lore.kernel.org/git/CAMoD=Bi41mB3QRn3JdZL-FGHs4w3C2jGpnJB-CqSndO7FMtfzA@mail.gmail.com/
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git prune" may try to iterate over .git/objects/pack for trash
files to remove in it, and loudly fail when the directory is
missing, which is not necessary. The command has been taught to
ignore such a failure.
* ew/prune-with-missing-objects-pack:
prune: quiet ENOENT on missing directories
"git receive-pack" used to use all the local refs as the boundary for
checking connectivity of the data "git push" sent, but now it uses
only the refs that it advertised to the pusher. In a repository with
the .hideRefs configuration, this reduces the resources needed to
perform the check.
cf. <221028.86bkpw805n.gmgdl@evledraar.gmail.com>
cf. <xmqqr0yrizqm.fsf@gitster.g>
* ps/receive-use-only-advertised:
receive-pack: only use visible refs for connectivity check
rev-parse: add `--exclude-hidden=` option
revision: add new parameter to exclude hidden refs
revision: introduce struct to handle exclusions
revision: move together exclusion-related functions
refs: get rid of global list of hidden refs
refs: fix memory leak when parsing hideRefs config
'git maintenance register' is taught to write configuration to an
arbitrary path, and 'git for-each-repo' is taught to expand tilde
characters in paths.
* rp/maintenance-qol:
builtin/gc.c: fix use-after-free in maintenance_unregister()
maintenance --unregister: fix uninit'd data use & -Wdeclaration-after-statement
maintenance: add option to register in a specific config
for-each-repo: interpolate repo path arguments
Fix a regression in the bisect-helper which mistakenly treats
arguments to the command given to 'git bisect run' as arguments to
the helper.
* dd/bisect-helper-subcommand:
bisect--helper: parse subcommand with OPT_SUBCOMMAND
bisect--helper: move all subcommands into their own functions
bisect--helper: remove unused options
Preparation to remove git-submodule.sh and replace it with a builtin.
* ab/submodule-helper-prep-only:
submodule--helper: use OPT_SUBCOMMAND() API
submodule--helper: drop "update --prefix <pfx>" for "-C <pfx> update"
submodule--helper: remove --prefix from "absorbgitdirs"
submodule API & "absorbgitdirs": remove "----recursive" option
submodule.c: refactor recursive block out of absorb function
submodule tests: test for a "foreach" blind-spot
submodule--helper: fix a memory leak in "status"
submodule tests: add tests for top-level flag output
submodule--helper: move "config" to a test-tool
$GIT_DIR/objects/pack may be removed to save inodes in shared
repositories. Quiet down prune in cases where either
$GIT_DIR/objects or $GIT_DIR/objects/pack is non-existent,
but emit the system error in other cases to help users diagnose
permissions problems or resource constraints.
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply "index-compatibility.pending.cocci" rule to "builtin/*", but
exclude those where we conflict with in-flight changes.
As a result some of them end up using only "the_index", so let's have
them use the more narrow "USE_THE_INDEX_VARIABLE" rather than
"USE_THE_INDEX_COMPATIBILITY_MACROS".
Manual changes not made by coccinelle, that were squashed in:
* Whitespace-wrap argument lists for repo_hold_locked_index(),
repo_read_index_preload() and repo_refresh_and_write_index(), in cases
where the line became too long after the transformation.
* Change "refresh_cache()" to "refresh_index()" in a comment in
"builtin/update-index.c".
* For those whose call was followed by perror("<macro-name>"), change
it to perror("<function-name>"), referring to the new function.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split up the "USE_THE_INDEX_COMPATIBILITY_MACROS" into that setting
and a more narrow "USE_THE_INDEX_VARIABLE". In the case of these
built-ins we only need "the_index" variable, but not the compatibility
wrapper for functions we're not using.
Let's then have some users of "USE_THE_INDEX_COMPATIBILITY_MACROS" use
this more narrow and descriptive define.
For context: The USE_THE_INDEX_COMPATIBILITY_MACROS macro was added to
test-tool.h in f8adbec9fe (cache.h: flip
NO_THE_INDEX_COMPATIBILITY_MACROS switch, 2019-01-24).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mostly apply the part of "index-compatibility.pending.cocci" that
renames the global variables like "active_nr", which are a shorthand
to referencing (in that case) a struct member as "the_index.cache_nr".
In doing so move more of "index-compatibility.pending.cocci" to
"index-compatibility.cocci".
In the case of "active_nr" we'd have a textual conflict with
"ab/various-leak-fixes" in "next"[1]. Let's exclude that specific case
while moving the rule over from "pending".
1. 407b94280f8 (commit: discard partial cache before (re-)reading it,
2022-11-08)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply a selection of rules in "index-compatibility.pending.cocci"
tree-wide, and in doing so migrate them to
"index-compatibility.cocci".
As in preceding commits the only manual changes here are the macro
removals in "cache.h", and the update to the '*.cocci" rules. The rest
of the C code changes are the result of applying those updated rules.
Move rules for some rarely used cache compatibility macros from
"index-compatibility.pending.cocci" to "index-compatibility.cocci" and
apply them.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The discard_index() function has not returned non-zero since
7a51ed66f6 (Make on-disk index representation separate from in-core
one, 2008-01-14), but we've had various code in-tree still acting as
though that might be the case.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 4aab5b46f4 (Make read-cache.c "the_index" free., 2007-04-01)
we've been undergoing a slow migration away from these macros, but
haven't made much progress since f8adbec9fe (cache.h: flip
NO_THE_INDEX_COMPATIBILITY_MACROS switch, 2019-01-24).
Let's move forward a bit by changing the users of those macros that
are rare enough that we can convert them in one go, and then remove
the compatibility shim.
The only manual change to the C code here is to "cache.h", the rest is
all the result of applying the new "index-compatibility.cocci".
Even though it's a one-off, let's keep the coccinelle rules for
now. We'll extend them in subsequent commits, and this will help
anything that's in-flight or out-of-tree to migrate.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adding "USE_THE_INDEX_COMPATIBILITY_MACROS" to these two appears to
have been unnecessary from the start, as going back and compiling
f8adbec9fe (cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch,
2019-01-24) without that addition works.
Let's not have these ask for the compatibility macros from cache.h
that they don't need.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid calling 'cache_tree_update()' when doing so would be redundant.
* vd/skip-cache-tree-update:
rebase: use 'skip_cache_tree_update' option
read-tree: use 'skip_cache_tree_update' option
reset: use 'skip_cache_tree_update' option
unpack-trees: add 'skip_cache_tree_update' option
cache-tree: add perf test comparing update and prime
"git repack" learns to send cruft objects out of the way into
packfiles outside the repository.
* tb/repack-expire-to:
builtin/repack.c: implement `--expire-to` for storing pruned objects
builtin/repack.c: write cruft packs to arbitrary locations
builtin/repack.c: pass "cruft_expiration" to `write_cruft_pack`
builtin/repack.c: pass "out" to `prepare_pack_objects`
When serving a push, git-receive-pack(1) needs to verify that the
packfile sent by the client contains all objects that are required by
the updated references. This connectivity check works by marking all
preexisting references as uninteresting and using the new reference tips
as starting point for a graph walk.
Marking all preexisting references as uninteresting can be a problem
when it comes to performance. Git forges tend to do internal bookkeeping
to keep alive sets of objects for internal use or make them easy to find
via certain references. These references are typically hidden away from
the user so that they are neither advertised nor writeable. At GitLab,
we have one particular repository that contains a total of 7 million
references, of which 6.8 million are indeed internal references. With
the current connectivity check we are forced to load all these
references in order to mark them as uninteresting, and this alone takes
around 15 seconds to compute.
We can optimize this by only taking into account the set of visible refs
when marking objects as uninteresting. This means that we may now walk
more objects until we hit any object that is marked as uninteresting.
But it is rather unlikely that clients send objects that make large
parts of objects reachable that have previously only ever been hidden,
whereas the common case is to push incremental changes that build on top
of the visible object graph.
This provides a huge boost to performance in the mentioned repository,
where the vast majority of its refs hidden. Pushing a new commit into
this repo with `transfer.hideRefs` set up to hide 6.8 million of 7 refs
as it is configured in Gitaly leads to a 4.5-fold speedup:
Benchmark 1: main
Time (mean ± σ): 30.977 s ± 0.157 s [User: 30.226 s, System: 1.083 s]
Range (min … max): 30.796 s … 31.071 s 3 runs
Benchmark 2: pks-connectivity-check-hide-refs
Time (mean ± σ): 6.799 s ± 0.063 s [User: 6.803 s, System: 0.354 s]
Range (min … max): 6.729 s … 6.850 s 3 runs
Summary
'pks-connectivity-check-hide-refs' ran
4.56 ± 0.05 times faster than 'main'
As we mostly go through the same codepaths even in the case where there
are no hidden refs at all compared to the code before there is no change
in performance when no refs are hidden:
Benchmark 1: main
Time (mean ± σ): 48.188 s ± 0.432 s [User: 49.326 s, System: 5.009 s]
Range (min … max): 47.706 s … 48.539 s 3 runs
Benchmark 2: pks-connectivity-check-hide-refs
Time (mean ± σ): 48.027 s ± 0.500 s [User: 48.934 s, System: 5.025 s]
Range (min … max): 47.504 s … 48.500 s 3 runs
Summary
'pks-connectivity-check-hide-refs' ran
1.00 ± 0.01 times faster than 'main'
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>