This is retry of #1419.
I added flush_fscache macro to flush cached stats after disk writing
with tests for regression reported in #1438 and #1442.
git checkout checks each file path in sorted order, so cache flushing does not
make performance worse unless we have large number of modified files in
a directory containing many files.
Using chromium repository, I tested `git checkout .` performance when I
delete 10 files in different directories.
With this patch:
TotalSeconds: 4.307272
TotalSeconds: 4.4863595
TotalSeconds: 4.2975562
Avg: 4.36372923333333
Without this patch:
TotalSeconds: 20.9705431
TotalSeconds: 22.4867685
TotalSeconds: 18.8968292
Avg: 20.7847136
I confirmed this patch passed all tests in t/ with core_fscache=1.
Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
Teach "add" to use preload-index and fscache features
to improve performance on very large repositories.
During an "add", a call is made to run_diff_files()
which calls check_remove() for each index-entry. This
calls lstat(). On Windows, the fscache code intercepts
the lstat() calls and builds a private cache using the
FindFirst/FindNext routines, which are much faster.
Somewhat independent of this, is the preload-index code
which distributes some of the start-up costs across
multiple threads.
We need to keep the call to read_cache() before parsing the
pathspecs (and hence cannot use the pathspecs to limit any preload)
because parse_pathspec() is using the index to determine whether a
pathspec is, in fact, in a submodule. If we would not read the index
first, parse_pathspec() would not error out on a path that is inside
a submodule, and t7400-submodule-basic.sh would fail with
not ok 47 - do not add files from a submodule
We still want the nice preload performance boost, though, so we simply
call read_cache_preload(&pathspecs) after parsing the pathspecs.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.
This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.
Enable read-only sections for 'git status' and preload_index.
Signed-off-by: Karsten Blees <blees@dcon.de>
Teach fsmonitor--daemon client threads to create a cookie file
inside the .git directory and then wait until FS events for the
cookie are observed by the FS listener thread.
This helps address the racy nature of file system events by
blocking the client response until the kernel has drained any
event backlog.
This is especially important on MacOS where kernel events are
only issued with a limited frequency. See the `latency` argument
of `FSeventStreamCreate()`. The kernel only signals every `latency`
seconds, but does not guarantee that the kernel queue is completely
drained, so we may have to wait more than one interval. If we
increase the frequency, the system is more likely to drop events.
We avoid these issues by having each client thread create a unique
cookie file and then wait until it is seen in the event stream.
Co-authored-by: Kevin Willford <Kevin.Willford@microsoft.com>
Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach fsmonitor--daemon to periodically truncate the list of
modified files to save some memory.
Clients will ask for the set of changes relative to a token that they
found in the FSMN index extension in the index. (This token is like a
point in time, but different). Clients will then update the index to
contain the response token (so that subsequent commands will be
relative to this new token).
Therefore, the daemon can gradually truncate the in-memory list of
changed paths as they become obsolete (older than the previous token).
Since we may have multiple clients making concurrent requests with a
skew of tokens and clients may be racing to the talk to the daemon,
we lazily truncate the list.
We introduce a 5 minute delay and truncate batches 5 minutes after
they are considered obsolete.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach fsmonitor--daemon to respond to IPC requests from client
Git processes and respond with a list of modified pathnames
relative to the provided token.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach fsmonitor--daemon to build a list of changed paths and associate
them with a token-id. This will be used by the platform-specific
backends to accumulate changed paths in response to filesystem events.
The platform-specific file system listener thread receives file system
events containing one or more changed pathnames (with whatever bucketing
or grouping that is convenient for the file system). These paths are
accumulated (without locking) by the file system layer into a `fsmonitor_batch`.
When the file system layer has drained the kernel event queue, it will
"publish" them to our token queue and make them visible to concurrent
client worker threads. The token layer is free to combine and/or de-dup
paths within these batches for efficient presentation to clients.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Teach fsmonitor--daemon to classify relative and absolute
pathnames and decide how they should be handled. This will
be used by the platform-specific backend to respond to each
filesystem event.
When we register for filesystem notifications on a directory,
we get events for everything (recursively) in the directory.
We want to report to clients changes to tracked and untracked
paths within the working directory. We do not want to report
changes within the .git directory, for example.
This classification will be used in a later commit by the
different backends to classify paths as events are received.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Implement `run` and `start` commands to try to
begin listening for file system events.
This version defines the thread structure with a single
fsmonitor_fs_listen thread to watch for file system events
and a simple IPC thread pool to wait for connections from
Git clients over a well-known named pipe or Unix domain
socket.
This version does not actually do anything yet because the
backends are still just stubs.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Implement `stop` and `status` client commands to control and query the
status of a `fsmonitor--daemon` server process (and implicitly start a
server process if necessary).
Later commits will implement the actual server and monitor the file
system.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Create a built-in file system monitoring daemon that can be used by
the existing `fsmonitor` feature (protocol API and index extension)
to improve the performance of various Git commands, such as `status`.
The `fsmonitor--daemon` feature builds upon the `Simple IPC` API and
provides an alternative to hook access to existing fsmonitors such
as `watchman`.
This commit merely adds the new command without any functionality.
Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
This commit refactors `git_config_get_fsmonitor()` into the `repo_*()`
form that takes a parameter `struct repository *r`.
That change prepares for the upcoming `core.useBuiltinFSMonitor` flag which
will be stored in the `repo_settings` struct.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
We just introduced a helper to avoid showing a console window when the
scheduled task runs `git.exe`. Let's actually use it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Windows' equivalent to "bind mounts", NTFS junction points, can be
unlinked without affecting the mount target. This is clearly what users
expect to happen when they call `git clean -dfx` in a worktree that
contains NTFS junction points: the junction should be removed, and the
target directory of said junction should be left alone (unless it is
inside the worktree).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
It seems to be not exactly rare on Windows to install NTFS junction
points (the equivalent of "bind mounts" on Linux/Unix) in worktrees,
e.g. to map some development tools into a subdirectory.
In such a scenario, it is pretty horrible if `git clean -dfx` traverses
into the mapped directory and starts to "clean up".
Let's just not do that. Let's make sure before we traverse into a
directory that it is not a mount point (or junction).
This addresses https://github.com/git-for-windows/git/issues/607
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Fix a warning on AIX's xlc compiler that's been emitted since my
a1aad71601 (fsck.h: use "enum object_type" instead of "int",
2021-03-28):
"builtin/fsck.c", line 805.32: 1506-068 (W) Operation between
types "int(*)(struct object*,enum object_type,void*,struct
fsck_options*)" and "int(*)(struct object*,int,void*,struct
fsck_options*)" is not allowed.
I.e. it complains about us assigning a function with a prototype "int"
where we're expecting "enum object_type".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"ld" on Solaris fails to link some test helpers, which has been
worked around by reshuffling the inline function definitions from a
header file to a source file that is the only user of them.
* ab/pack-linkage-fix:
pack-objects: move static inline from a header to the sole consumer
Move the code that is only used in builtin/pack-objects.c out of
pack-objects.h.
This fixes an issue where Solaris's SunCC hasn't been able to compile
git since 483fa7f42d (t/helper/test-bitmap.c: initial commit,
2021-03-31).
The real origin of that issue is that in 898eba5e63 (pack-objects:
refer to delta objects by index instead of pointer, 2018-04-14)
utility functions only needed by builtin/pack-objects.c were added to
pack-objects.h. Since then the header has been used in a few other
places, but 483fa7f42d was the first time it was used by test helper.
Since Solaris is stricter about linking and the oe_get_size_slow()
function lives in builtin/pack-objects.c the build started failing
with:
Undefined first referenced
symbol in file
oe_get_size_slow t/helper/test-bitmap.o
ld: fatal: symbol referencing errors. No output written to t/helper/test-tool
On other platforms this is presumably OK because the compiler and/or
linker detects that the "static inline" functions that reference
oe_get_size_slow() aren't used.
Let's solve this by moving the relevant code from pack-objects.h to
builtin/pack-objects.c. This is almost entirely a code-only move, but
because of the early macro definitions in that file referencing some
of these inline functions we need to move the definition of "static
struct packing_data to_pack" earlier, and declare these inline
functions above the macros.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We used to read the init.templateDir setting at builtin/init-db.c using
a git_config() callback that, in turn, called git_config_pathname(). To
simplify the config reading logic at this file and plug a memory leak,
this was replaced by a direct call to git_config_get_value() at
e4de4502e6 ("init: remove git_init_db_config() while fixing leaks",
2021-03-14). However, this function doesn't provide path expanding
semantics, like git_config_pathname() does, so paths with '~/' and
'~user/' are treated literally. This makes 'git init' fail to handle
init.templateDir paths using these constructs:
$ git config init.templateDir '~/templates_dir'
$ git init
'warning: templates not found in ~/templates_dir'
Replace the git_config_get_value() call by git_config_get_pathname(),
which does the '~/' and '~user/' expansions. Also add a regression test.
Note that unlike git_config_get_value(), the config cache does not own
the memory for the path returned by git_config_get_pathname(), so we
must free() it.
Reported on IRC by rkta.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Another brown paper bag inconsistency fix for a new feature
introduced during this cycle.
* dl/stash-show-untracked-fixup:
stash show: use stash.showIncludeUntracked even when diff options given
The "rev-parse" command did not diagnose the lack of argument to
"--path-format" option, which was introduced in v2.31 era, which
has been corrected.
* wm/rev-parse-path-format-wo-arg:
rev-parse: fix segfault with missing --path-format argument
If options pertaining to how the diff is displayed is provided to
`git stash show`, the command will ignore the stash.showIncludeUntracked
configuration variable, defaulting to not showing any untracked files.
This is unintuitive behaviour since the format of the diff output and
whether or not to display untracked files are orthogonal.
Use stash.showIncludeUntracked even when diff options are given. Of
course, this is still overridable via the command-line options.
Update the documentation to explicitly say which configuration variables
will be overridden when a diff options are given.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git clean" and "git ls-files -i" had confusion around working on
or showing ignored paths inside an ignored directory, which has
been corrected.
* en/dir-traversal:
dir: introduce readdir_skip_dot_and_dotdot() helper
dir: update stale description of treat_directory()
dir: traverse into untracked directories if they may have ignored subfiles
dir: avoid unnecessary traversal into ignored directory
t3001, t7300: add testcase showcasing missed directory traversal
t7300: add testcase showing unnecessary traversal into ignored directory
ls-files: error out on -i unless -o or -c are specified
dir: report number of visited directories and paths with trace2
dir: convert trace calls to trace2 equivalents
Calling "git rev-parse --path-format" without an argument segfaults
instead of giving an error message. Commit fac60b8925 (rev-parse: add
option for absolute or relative path formatting, 2020-12-13) added the
argument parsing code but forgot to handle NULL.
Returning an error makes sense here because there is no default value we
could use. Add a test case to verify.
Signed-off-by: Wolfgang Müller <wolf@oriole.systems>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to handle options recently added to "git stash show"
around untracked part of the stash segfaulted when these options
were used on a stash entry that does not record untracked part.
* dl/stash-show-untracked-fixup:
stash show: fix segfault with --{include,only}-untracked
t3905: correct test title
"git mailinfo" (hence "git am") learned the "--quoted-cr" option to
control how lines ending with CRLF wrapped in base64 or qp are
handled.
* dd/mailinfo-quoted-cr:
am: learn to process quoted lines that ends with CRLF
mailinfo: allow stripping quoted CR without warning
mailinfo: allow squelching quoted CRLF warning
mailinfo: warn if CRLF found in decoded base64/QP email
mailinfo: stop parsing options manually
mailinfo: load default metainfo_charset lazily
The final part of "parallel checkout".
* mt/parallel-checkout-part-3:
ci: run test round with parallel-checkout enabled
parallel-checkout: add tests related to .gitattributes
t0028: extract encoding helpers to lib-encoding.sh
parallel-checkout: add tests related to path collisions
parallel-checkout: add tests for basic operations
checkout-index: add parallel checkout support
builtin/checkout.c: complete parallel checkout support
make_transient_cache_entry(): optionally alloc from mem_pool
"git push" learns to discover common ancestor with the receiving
end over protocol v2.
* jt/push-negotiation:
send-pack: support push negotiation
fetch: teach independent negotiation (no packfile)
fetch-pack: refactor command and capability write
fetch-pack: refactor add_haves()
fetch-pack: refactor process_acks()
"git add -i --dry-run" does not dry-run, which was surprising. The
combination of options has taught to error out.
* ow/no-dryrun-in-add-i:
add: die if both --dry-run and --interactive are given
When `git stash show --include-untracked` or
`git stash show --only-untracked` is run on a stash that doesn't include
an untracked entry, a segfault occurs. This happens because we do not
check whether the untracked entry is actually present and just attempt
to blindly dereference it.
Ensure that the untracked entry is present before actually attempting to
dereference it.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many places in the code were doing
while ((d = readdir(dir)) != NULL) {
if (is_dot_or_dotdot(d->d_name))
continue;
...process d...
}
Introduce a readdir_skip_dot_and_dotdot() helper to make that a one-liner:
while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
...process d...
}
This helper particularly simplifies checks for empty directories.
Also use this helper in read_cached_dir() so that our statistics are
consistent across platforms. (In other words, read_cached_dir() should
have been using is_dot_or_dotdot() and skipping such entries, but did
not and left it to treat_path() to detect and mark such entries as
path_none.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ls-files --ignored can be used together with either --others or
--cached. After being perplexed for a bit and digging in to the code, I
assumed that ls-files -i was just broken and not printing anything and
I had a nice patch ready to submit when I finally realized that -i can be
used with --cached to find tracked ignores.
While that was a mistake on my part, and a careful reading of the
documentation could have made this more clear, I suspect this is an
error others are likely to make as well. In fact, of two uses in our
testsuite, I believe one of the two did make this error. In t1306.13,
there are NO tracked files, and all the excludes built up and used in
that test and in previous tests thus have to be about untracked files.
However, since they were looking for an empty result, the mistake went
unnoticed as their erroneous command also just happened to give an empty
answer.
-i will most the time be used with -o, which would suggest we could just
make -i imply -o in the absence of either a -o or -c, but that would be
a backward incompatible break. Instead, let's just flag -i without
either a -o or -c as an error, and update the two relevant testcases to
specify their intent.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fixes two memory leaks when running `git maintenance start` or `git
maintenance stop` in `update_background_schedule`:
$ valgrind --leak-check=full ~/git/bin/git maintenance start
==76584== Memcheck, a memory error detector
==76584== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==76584== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==76584== Command: /home/lenaic/git/bin/git maintenance start
==76584==
==76584==
==76584== HEAP SUMMARY:
==76584== in use at exit: 34,880 bytes in 252 blocks
==76584== total heap usage: 820 allocs, 568 frees, 146,414 bytes allocated
==76584==
==76584== 65 bytes in 1 blocks are definitely lost in loss record 17 of 39
==76584== at 0x483E6AF: malloc (vg_replace_malloc.c:306)
==76584== by 0x3DC39C: xrealloc (wrapper.c:126)
==76584== by 0x3992CC: strbuf_grow (strbuf.c:98)
==76584== by 0x39A473: strbuf_vaddf (strbuf.c:392)
==76584== by 0x39BC54: xstrvfmt (strbuf.c:979)
==76584== by 0x39BD2C: xstrfmt (strbuf.c:989)
==76584== by 0x18451B: update_background_schedule (gc.c:1977)
==76584== by 0x1846F6: maintenance_start (gc.c:2011)
==76584== by 0x1847B4: cmd_maintenance (gc.c:2030)
==76584== by 0x127A2E: run_builtin (git.c:453)
==76584== by 0x127E81: handle_builtin (git.c:704)
==76584== by 0x128142: run_argv (git.c:771)
==76584==
==76584== 240 bytes in 1 blocks are definitely lost in loss record 29 of 39
==76584== at 0x4840D7B: realloc (vg_replace_malloc.c:834)
==76584== by 0x491CE5D: getdelim (in /usr/lib/libc-2.33.so)
==76584== by 0x39ADD7: strbuf_getwholeline (strbuf.c:635)
==76584== by 0x39AF31: strbuf_getdelim (strbuf.c:706)
==76584== by 0x39B064: strbuf_getline_lf (strbuf.c:727)
==76584== by 0x184273: crontab_update_schedule (gc.c:1919)
==76584== by 0x184678: update_background_schedule (gc.c:1997)
==76584== by 0x1846F6: maintenance_start (gc.c:2011)
==76584== by 0x1847B4: cmd_maintenance (gc.c:2030)
==76584== by 0x127A2E: run_builtin (git.c:453)
==76584== by 0x127E81: handle_builtin (git.c:704)
==76584== by 0x128142: run_argv (git.c:771)
==76584==
==76584== LEAK SUMMARY:
==76584== definitely lost: 305 bytes in 2 blocks
==76584== indirectly lost: 0 bytes in 0 blocks
==76584== possibly lost: 0 bytes in 0 blocks
==76584== still reachable: 34,575 bytes in 250 blocks
==76584== suppressed: 0 bytes in 0 blocks
==76584== Reachable blocks (those to which a pointer was found) are not shown.
==76584== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==76584==
==76584== For lists of detected and suppressed errors, rerun with: -s
==76584== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Signed-off-by: Lénaïc Huard <lenaic@lhuard.fr>
Acked-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Options to "git pack-objects" that take numeric values like
--window and --depth should not accept negative values; the input
validation has been tightened.
* jk/pack-objects-negative-options-fix:
pack-objects: clamp negative depth to 0
t5316: check behavior of pack-objects --depth=0
pack-objects: clamp negative window size to 0
t5300: check that we produced expected number of deltas
t5300: modernize basic tests
A few variants of informational message "Already up-to-date" has
been rephrased.
* js/merge-already-up-to-date-message-reword:
merge: fix swapped "up to date" message components
merge(s): apply consistent punctuation to "up to date" messages
"git bisect skip" when custom words are used for new/old did not
work, which has been corrected.
* rj/bisect-skip-honor-terms:
bisect--helper: use BISECT_TERMS in 'bisect skip' command
"git repack -A -d" in a partial clone unnecessarily loosened
objects in promisor pack.
* rs/repack-without-loosening-promised-objects:
repack: avoid loosening promisor objects in partial clones
SHA-256 transition.
* bc/hash-transition-interop-part-1:
hex: print objects using the hash algorithm member
hex: default to the_hash_algo on zero algorithm value
builtin/pack-objects: avoid using struct object_id for pack hash
commit-graph: don't store file hashes as struct object_id
builtin/show-index: set the algorithm for object IDs
hash: provide per-algorithm null OIDs
hash: set, copy, and use algo field in struct object_id
builtin/pack-redundant: avoid casting buffers to struct object_id
Use the final_oid_fn to finalize hashing of object IDs
hash: add a function to finalize object IDs
http-push: set algorithm when reading object ID
Always use oidread to read into struct object_id
hash: add an algo member to struct object_id
In previous changes, mailinfo has learnt to process lines that decoded
from base64 or quoted-printable, and ends with CRLF.
Let's teach "am" that new trick, too.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In previous change, Git starts to warn for quoted CRLF in decoded
base64/QP email. Despite those warnings are usually helpful,
quoted CRLF could be part of some users' workflow.
Let's give them an option to turn off the warning completely.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Plug various leans reported by LSAN.
* ah/plugleaks:
builtin/rm: avoid leaking pathspec and seen
builtin/rebase: release git_format_patch_opt too
builtin/for-each-ref: free filter and UNLEAK sorting.
mailinfo: also free strbuf lists when clearing mailinfo
builtin/checkout: clear pending objects after diffing
builtin/check-ignore: clear_pathspec before returning
builtin/bugreport: don't leak prefixed filename
branch: FREE_AND_NULL instead of NULL'ing real_ref
bloom: clear each bloom_key after use
ls-files: free max_prefix when done
wt-status: fix multiple small leaks
revision: free remainder of old commit list in limit_list
"git rev-list" learns the "--filter=object:type=<type>" option,
which can be used to exclude objects of the given kind from the
packfile generated by pack-objects.
* ps/rev-list-object-type-filter:
rev-list: allow filtering of provided items
pack-bitmap: implement combined filter
pack-bitmap: implement object type filter
list-objects: implement object type filter
list-objects: support filtering by tag and commit
list-objects: move tag processing into its own function
revision: mark commit parents as NOT_USER_GIVEN
uploadpack.txt: document implication of `uploadpackfilter.allow`
"git add" and "git rm" learned not to touch those paths that are
outside of sparse checkout.
* mt/add-rm-in-sparse-checkout:
rm: honor sparse checkout patterns
add: warn when asked to update SKIP_WORKTREE entries
refresh_index(): add flag to ignore SKIP_WORKTREE entries
pathspec: allow to ignore SKIP_WORKTREE entries on index matching
add: make --chmod and --renormalize honor sparse checkouts
t3705: add tests for `git add` in sparse checkouts
add: include magic part of pathspec on --refresh error