git-for-windows/git - git - Gitea: Self-hosted GitHub

mirror of https://github.com/git-for-windows/git.git synced 2026-03-26 03:51:40 -05:00

Author	SHA1	Message	Date
Ben Peart	9bf0ef5ff4	fscache: add fscache hit statistics Track fscache hits and misses for lstat and opendir requests. Reporting of statistics is done when the cache is disabled for the last time and freed and is only reported if GIT_TRACE_FSCACHE is set. Sample output is: 11:33:11.836428 compat/win32/fscache.c:433 fscache: lstat 3775, opendir 263, total requests/misses 4052/269 Signed-off-by: Ben Peart <benpeart@microsoft.com>	2022-04-05 08:14:36 -07:00
Ben Peart	41edf95d76	fscache: add GIT_TEST_FSCACHE support Add support to fscache to enable running the entire test suite with the fscache enabled. Signed-off-by: Ben Peart <benpeart@microsoft.com>	2022-04-05 08:14:36 -07:00
Ben Peart	05a361e171	status: disable and free fscache at the end of the status command At the end of the status command, disable and free the fscache so that we don't leak the memory and so that we can dump the fscache statistics. Signed-off-by: Ben Peart <benpeart@microsoft.com>	2022-04-05 08:14:36 -07:00
Ben Peart	d511c0775f	fscache: use FindFirstFileExW to avoid retrieving the short name Use FindFirstFileExW with FindExInfoBasic to avoid forcing NTFS to look up the short name. Also switch to a larger (64K vs 4K) buffer using FIND_FIRST_EX_LARGE_FETCH to minimize round trips to the kernel. In a repo with ~200K files, this drops warm cache status times from 3.19 seconds to 2.67 seconds for a 16% savings. Signed-off-by: Ben Peart <benpeart@microsoft.com>	2022-04-05 08:14:36 -07:00
Ben Peart	7158228227	Enable the filesystem cache (fscache) in refresh_index(). On file systems that support it, this can dramatically speed up operations like add, commit, describe, rebase, reset, rm that would otherwise have to lstat() every file to "re-match" the stat information in the index to that of the file system. On a synthetic repo with 1M files, "git reset" dropped from 52.02 seconds to 14.42 seconds for a savings of 72%. Signed-off-by: Ben Peart <benpeart@microsoft.com>	2022-04-05 08:14:36 -07:00
Takuto Ikuta	69d83a29bc	checkout.c: enable fscache for checkout again This is retry of #1419. I added flush_fscache macro to flush cached stats after disk writing with tests for regression reported in #1438 and #1442. git checkout checks each file path in sorted order, so cache flushing does not make performance worse unless we have large number of modified files in a directory containing many files. Using chromium repository, I tested `git checkout .` performance when I delete 10 files in different directories. With this patch: TotalSeconds: 4.307272 TotalSeconds: 4.4863595 TotalSeconds: 4.2975562 Avg: 4.36372923333333 Without this patch: TotalSeconds: 20.9705431 TotalSeconds: 22.4867685 TotalSeconds: 18.8968292 Avg: 20.7847136 I confirmed this patch passed all tests in t/ with core_fscache=1. Signed-off-by: Takuto Ikuta <tikuta@chromium.org>	2022-04-05 08:14:35 -07:00
Takuto Ikuta	3efac54264	fetch-pack.c: enable fscache for stats under .git/objects When I do git fetch, git call file stats under .git/objects for each refs. This takes time when there are many refs. By enabling fscache, git takes file stats by directory traversing and that improved the speed of fetch-pack for repository having large number of refs. In my windows workstation, this improves the time of `git fetch` for chromium repository like below. I took stats 3 times. * With this patch TotalSeconds: 9.9825165 TotalSeconds: 9.1862075 TotalSeconds: 10.1956256 Avg: 9.78811653333333 * Without this patch TotalSeconds: 15.8406702 TotalSeconds: 15.6248053 TotalSeconds: 15.2085938 Avg: 15.5580231 Signed-off-by: Takuto Ikuta <tikuta@chromium.org> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2022-04-05 08:14:35 -07:00
Jeff Hostetler	b19fa9283e	dir.c: regression fix for add_excludes with fscache Fix regression described in: https://github.com/git-for-windows/git/issues/1392 which was introduced in: `b2353379bb` Problem Symptoms ================ When the user has a .gitignore file that is a symlink, the fscache optimization introduced above caused the stat-data from the symlink, rather that of the target file, to be returned. Later when the ignore file was read, the buffer length did not match the stat.st_size field and we called die("cannot use <path> as an exclude file") Optimization Rationale ====================== The above optimization calls lstat() before open() primarily to ask fscache if the file exists. It gets the current stat-data as a side effect essentially for free (since we already have it in memory). If the file does not exist, it does not need to call open(). And since very few directories have .gitignore files, we can greatly reduce time spent in the filesystem. Discussion of Fix ================= The above optimization calls lstat() rather than stat() because the fscache only intercepts lstat() calls. Calls to stat() stay directed to the mingw_stat() completly bypassing fscache. Furthermore, calls to mingw_stat() always call {open, fstat, close} so that symlinks are properly dereferenced, which adds additional open/close calls on top of what the original code in dir.c is doing. Since the problem only manifests for symlinks, we add code to overwrite the stat-data when the path is a symlink. This preserves the effect of the performance gains provided by the fscache in the normal case. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>	2022-04-05 08:14:35 -07:00
Jeff Hostetler	b4c7a73a3a	fscache: make fscache_enabled() public Make fscache_enabled() function public rather than static. Remove unneeded fscache_is_enabled() function. Change is_fscache_enabled() macro to call fscache_enabled(). is_fscache_enabled() now takes a pathname so that the answer is more precise and mean "is fscache enabled for this pathname", since fscache only stores repo-relative paths and not absolute paths, we can avoid attempting lookups for absolute paths. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>	2022-04-05 08:14:35 -07:00
Jeff Hostetler	d7682ceab4	dir.c: make add_excludes aware of fscache during status Teach read_directory_recursive() and add_excludes() to be aware of optional fscache and avoid trying to open() and fstat() non-existant ".gitignore" files in every directory in the worktree. The current code in add_excludes() calls open() and then fstat() for a ".gitignore" file in each directory present in the worktree. Change that when fscache is enabled to call lstat() first and if present, call open(). This seems backwards because both lstat needs to do more work than fstat. But when fscache is enabled, fscache will already know if the .gitignore file exists and can completely avoid the IO calls. This works because of the lstat diversion to mingw_lstat when fscache is enabled. This reduced status times on a 350K file enlistment of the Windows repo on a NVMe SSD by 0.25 seconds. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>	2022-04-05 08:14:35 -07:00
Jeff Hostetler	7d4ef798e2	add: use preload-index and fscache for performance Teach "add" to use preload-index and fscache features to improve performance on very large repositories. During an "add", a call is made to run_diff_files() which calls check_remove() for each index-entry. This calls lstat(). On Windows, the fscache code intercepts the lstat() calls and builds a private cache using the FindFirst/FindNext routines, which are much faster. Somewhat independent of this, is the preload-index code which distributes some of the start-up costs across multiple threads. We need to keep the call to read_cache() before parsing the pathspecs (and hence cannot use the pathspecs to limit any preload) because parse_pathspec() is using the index to determine whether a pathspec is, in fact, in a submodule. If we would not read the index first, parse_pathspec() would not error out on a path that is inside a submodule, and t7400-submodule-basic.sh would fail with not ok 47 - do not add files from a submodule We still want the nice preload performance boost, though, so we simply call read_cache_preload(&pathspecs) after parsing the pathspecs. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2022-04-05 08:14:35 -07:00
Johannes Schindelin	067cc833aa	fscache: add a test for the dir-not-found optimization Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2022-04-05 08:14:35 -07:00
Jeff Hostetler	86c2a9d137	fscache: remember not-found directories Teach FSCACHE to remember "not found" directories. This is a performance optimization. FSCACHE is a performance optimization available for Windows. It intercepts Posix-style lstat() calls into an in-memory directory using FindFirst/FindNext. It improves performance on Windows by catching the first lstat() call in a directory, using FindFirst/ FindNext to read the list of files (and attribute data) for the entire directory into the cache, and short-cut subsequent lstat() calls in the same directory. This gives a major performance boost on Windows. However, it does not remember "not found" directories. When STATUS runs and there are missing directories, the lstat() interception fails to find the parent directory and simply return ENOENT for the file -- it does not remember that the FindFirst on the directory failed. Thus subsequent lstat() calls in the same directory, each re-attempt the FindFirst. This completely defeats any performance gains. This can be seen by doing a sparse-checkout on a large repo and then doing a read-tree to reset the skip-worktree bits and then running status. This change reduced status times for my very large repo by 60%. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2022-04-05 08:14:35 -07:00
Jeff Hostetler	4af11a603b	fscache: add key for GIT_TRACE_FSCACHE Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2022-04-05 08:14:35 -07:00
Karsten Blees	2d12221a42	fscache: load directories only once If multiple threads access a directory that is not yet in the cache, the directory will be loaded by each thread. Only one of the results is added to the cache, all others are leaked. This wastes performance and memory. On cache miss, add a future object to the cache to indicate that the directory is currently being loaded. Subsequent threads register themselves with the future object and wait. When the first thread has loaded the directory, it replaces the future object with the result and notifies waiting threads. Signed-off-by: Karsten Blees <blees@dcon.de>	2022-04-05 08:14:35 -07:00
Karsten Blees	be1eab22ee	mingw: add a cache below mingw's lstat and dirent implementations Checking the work tree status is quite slow on Windows, due to slow `lstat()` emulation (git calls `lstat()` once for each file in the index). Windows operating system APIs seem to be much better at scanning the status of entire directories than checking single files. Add an `lstat()` implementation that uses a cache for lstat data. Cache misses read the entire parent directory and add it to the cache. Subsequent `lstat()` calls for the same directory are served directly from the cache. Also implement `opendir()`/`readdir()`/`closedir()` so that they create and use directory listings in the cache. The cache doesn't track file system changes and doesn't plug into any modifying file APIs, so it has to be explicitly enabled for git functions that don't modify the working copy. Note: in an earlier version of this patch, the cache was always active and tracked file system changes via ReadDirectoryChangesW. However, this was much more complex and had negative impact on the performance of modifying git commands such as 'git checkout'. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2022-04-05 08:14:35 -07:00
Karsten Blees	f10e3ffd9f	add infrastructure for read-only file system level caches Add a macro to mark code sections that only read from the file system, along with a config option and documentation. This facilitates implementation of relatively simple file system level caches without the need to synchronize with the file system. Enable read-only sections for 'git status' and preload_index. Signed-off-by: Karsten Blees <blees@dcon.de>	2022-04-05 08:14:35 -07:00
Karsten Blees	152d9e0d46	Win32: make the lstat implementation pluggable Emulating the POSIX lstat API on Windows via GetFileAttributes[Ex] is quite slow. Windows operating system APIs seem to be much better at scanning the status of entire directories than checking single files. A caching implementation may improve performance by bulk-reading entire directories or reusing data obtained via opendir / readdir. Make the lstat implementation pluggable so that it can be switched at runtime, e.g. based on a config option. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2022-04-05 08:14:35 -07:00
Karsten Blees	6b6c9ae78c	mingw: make the dirent implementation pluggable Emulating the POSIX `dirent` API on Windows via `FindFirstFile()`/`FindNextFile()` is pretty staightforward, however, most of the information provided in the `WIN32_FIND_DATA` structure is thrown away in the process. A more sophisticated implementation may cache this data, e.g. for later reuse in calls to `lstat()`. Make the `dirent` implementation pluggable so that it can be switched at runtime, e.g. based on a config option. Define a base DIR structure with pointers to `readdir()`/`closedir()` that match the `opendir()` implementation (similar to vtable pointers in Object-Oriented Programming). Define `readdir()`/`closedir()` so that they call the function pointers in the `DIR` structure. This allows to choose the `opendir()` implementation on a call-by-call basis. Make the fixed-size `dirent.d_name` buffer a flex array, as `d_name` may be implementation specific (e.g. a caching implementation may allocate a `struct dirent` with _just_ the size needed to hold the `d_name` in question). Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2022-04-05 08:14:35 -07:00
Karsten Blees	238138a91c	Win32: dirent.c: Move opendir down Move opendir down in preparation for the next patch. Signed-off-by: Karsten Blees <blees@dcon.de>	2022-04-05 08:14:35 -07:00
Karsten Blees	8f6a80a6b2	Win32: make FILETIME conversion functions public We will use them in the upcoming "FSCache" patches (to accelerate sequential lstat() calls). Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2022-04-05 08:14:35 -07:00
Johannes Schindelin	5d3b520414	Merge branch 'ready-for-upstream' This is the branch thicket of patches in Git for Windows that are considered ready for upstream. To keep them in a ready-to-submit shape, they are kept as close to the beginning of the branch thicket as possible.	2022-04-05 08:14:35 -07:00
Victoria Dye	47c70b836f	Merge branch 'ns-batch-fsync' into HEAD	2022-04-05 08:14:34 -07:00
Johannes Schindelin	467afe6621	Merge pull request #3533 from PhilipOakley/hashliteral_t Begin `unsigned long`->`size_t` conversion to support large files on Windows	2022-04-05 08:14:34 -07:00
Jeff Hostetler	ecee229955	Merge branch 'mark-v4-fsmonitor-experimental' into try-v4-fsmonitor	2022-04-05 08:14:34 -07:00
Johannes Schindelin	3dd2f446b5	Enable the built-in FSMonitor as an experimental feature If `feature.experimental` and `feature.manyFiles` are set and the user has not explicitly turned off the builtin FSMonitor, we now start the built-in FSMonitor by default. Only forcing it when UNSET matches the behavior of UPDATE_DEFAULT_BOOL() used for other repo settings. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>	2022-04-05 08:14:33 -07:00
Victoria Dye	a28a3c3326	fsmonitor: reintroduce core.useBuiltinFSMonitor Reintroduce the 'core.useBuiltinFSMonitor' config setting (originally added in `0a756b2a25` (fsmonitor: config settings are repository-specific, 2021-03-05)) after its removal from the upstream version of FSMonitor. Upstream, the 'core.useBuiltinFSMonitor' setting was rendered obsolete by "overloading" the 'core.fsmonitor' setting to take a boolean value. However, several applications (e.g., 'scalar') utilize the original config setting, so it should be preserved for a deprecation period before complete removal: * if 'core.fsmonitor' is a boolean, the user is correctly using the new config syntax; do not use 'core.useBuiltinFSMonitor'. * if 'core.fsmonitor' is unspecified, use 'core.useBuiltinFSMonitor'. * if 'core.fsmonitor' is a path, override and use the builtin FSMonitor if 'core.useBuiltinFSMonitor' is 'true'; otherwise, use the FSMonitor hook indicated by the path. Additionally, for this deprecation period, advise users to switch to using 'core.fsmonitor' to specify their use of the builtin FSMonitor. Signed-off-by: Victoria Dye <vdye@github.com>	2022-04-04 16:22:55 -07:00
Neeraj Singh	88c95ab186	core.fsyncmethod: performance tests for batch mode Add basic performance tests for git commands that can add data to the object database. We cover: * git add * git stash * git update-index (via git stash) * git unpack-objects * git commit --all We cover all currently available fsync methods as well. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-04 13:40:10 -07:00
Neeraj Singh	406dc747a9	t/perf: add iteration setup mechanism to perf-lib Tests that affect the repo in stateful ways are easier to write if we can run setup steps outside of the measured portion of perf iteration. This change adds a "--setup 'setup-script'" parameter to test_perf. To make invocations easier to understand, I also moved the prerequisites to a new --prereq parameter. The setup facility will be used in the upcoming perf tests for batch mode, but it already helps in some existing tests, like t5302 and t7820. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-04 13:40:10 -07:00
Neeraj Singh	256e857d6a	core.fsyncmethod: tests for batch mode Add test cases to exercise batch mode for: * 'git add' * 'git stash' * 'git update-index' * 'git unpack-objects' These tests ensure that the added data winds up in the object database. In this change we introduce a new test helper lib-unique-files.sh. The goal of this library is to create a tree of files that have different oids from any other files that may have been created in the current test repo. This helps us avoid missing validation of an object being added due to it already being in the repo. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-04 13:40:10 -07:00
Neeraj Singh	0d1301dd99	test-lib-functions: add parsing helpers for ls-files and ls-tree Several tests use awk to parse OIDs from the output of 'git ls-files --stage' and 'git ls-tree'. Introduce helpers to centralize these uses of awk. Update t5317-pack-objects-filter-objects.sh to use the new ls-files helper so that it has some usages to review. Other updates are left for the future. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-04 13:40:10 -07:00
Neeraj Singh	ac60c1da84	core.fsync: use batch mode and sync loose objects by default on Windows Git for Windows has defaulted to core.fsyncObjectFiles=true since September 2017. We turn on syncing of loose object files with batch mode in upstream Git so that we can get broad coverage of the new code upstream. We don't actually do fsyncs in the most of the test suite, since GIT_TEST_FSYNC is set to 0. However, we do exercise all of the surrounding batch mode code since GIT_TEST_FSYNC merely makes the maybe_fsync wrapper always appear to succeed. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-04 13:40:10 -07:00
Neeraj Singh	238b1cf128	unpack-objects: use the bulk-checkin infrastructure The unpack-objects functionality is used by fetch, push, and fast-import to turn the transfered data into object database entries when there are fewer objects than the 'unpacklimit' setting. By enabling an odb-transaction when unpacking objects, we can take advantage of batched fsyncs. Here are some performance numbers to justify batch mode for unpack-objects, collected on a WSL2 Ubuntu VM. Fsync Mode \| Time for 90 objects (ms) ------------------------------------- Off \| 170 On,fsync \| 760 On,batch \| 230 Note that the default unpackLimit is 100 objects, so there's a 3x benefit in the worst case. The non-batch mode fsync scales linearly with the number of objects, so there are significant benefits even with smaller numbers of objects. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-04 13:40:10 -07:00
Neeraj Singh	34ec98ee7a	update-index: use the bulk-checkin infrastructure The update-index functionality is used internally by 'git stash push' to setup the internal stashed commit. This change enables odb-transactions for update-index infrastructure to speed up adding new objects to the object database by leveraging the batch fsync functionality. There is some risk with this change, since under batch fsync, the object files will be in a tmp-objdir until update-index is complete, so callers using the --stdin option will not see them until update-index is done. This risk is mitigated by not keeping an ODB transaction open around --stdin processing if in --verbose mode. Without --verbose mode, a caller feeding update-index via --stdin wouldn't know when update-index adds an object, event without an ODB transaction. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-04 13:40:10 -07:00
Neeraj Singh	744f9b005a	builtin/add: add ODB transaction around add_files_to_cache The add_files_to_cache function is invoked internally by builtin/commit.c and builtin/checkout.c for their flags that stage modified files before doing the larger operation. These commands can benefit from batched fsyncing. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-04 13:40:10 -07:00
Neeraj Singh	baaec243bf	cache-tree: use ODB transaction around writing a tree Take advantage of the odb transaction infrastructure around writing the cached tree to the object database. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-04 13:40:10 -07:00
Neeraj Singh	adf5285ad7	core.fsyncmethod: batched disk flushes for loose-objects When adding many objects to a repo with `core.fsync=loose-object`, the cost of fsync'ing each object file can become prohibitive. One major source of the cost of fsync is the implied flush of the hardware writeback cache within the disk drive. This commit introduces a new `core.fsyncMethod=batch` option that batches up hardware flushes. It hooks into the bulk-checkin odb-transaction functionality, takes advantage of tmp-objdir, and uses the writeout-only support code. When the new mode is enabled, we do the following for each new object: 1a. Create the object in a tmp-objdir. 2a. Issue a pagecache writeback request and wait for it to complete. At the end of the entire transaction when unplugging bulk checkin: 1b. Issue an fsync against a dummy file to flush the log and hardware writeback cache, which should by now have seen the tmp-objdir writes. 2b. Rename all of the tmp-objdir files to their final names. 3b. When updating the index and/or refs, we assume that Git will issue another fsync internal to that operation. This is not the default today, but the user now has the option of syncing the index and there is a separate patch series to implement syncing of refs. On a filesystem with a singular journal that is updated during name operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS we would expect the fsync to trigger a journal writeout so that this sequence is enough to ensure that the user's data is durable by the time the git command returns. This sequence also ensures that no object files appear in the main object store unless they are fsync-durable. Batch mode is only enabled if core.fsync includes loose-objects. If the legacy core.fsyncObjectFiles setting is enabled, but core.fsync does not include loose-objects, we will use file-by-file fsyncing. In step (1a) of the sequence, the tmp-objdir is created lazily to avoid work if no loose objects are ever added to the ODB. We use a tmp-objdir to maintain the invariant that no loose-objects are visible in the main ODB unless they are properly fsync-durable. This is important since future ODB operations that try to create an object with specific contents will silently drop the new data if an object with the target hash exists without checking that the loose-object contents match the hash. Only a full git-fsck would restore the ODB to a functional state where dataloss doesn't occur. In step (1b) of the sequence, we issue a fsync against a dummy file created specifically for the purpose. This method has a little higher cost than using one of the input object files, but makes adding new callers of this mechanism easier, since we don't need to figure out which object file is "last" or risk sharing violations by caching the fd of the last object file. _Performance numbers_: Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD. Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD. Windows - Same host as Linux, a preview version of Windows 11. Adding 500 files to the repo with 'git add' Times reported in seconds. object file syncing \| Linux \| Mac \| Windows --------------------\|-------\|-------\|-------- disabled \| 0.06 \| 0.35 \| 0.61 fsync \| 1.88 \| 11.18 \| 2.47 batch \| 0.15 \| 0.41 \| 1.53 Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-04 13:40:10 -07:00
Neeraj Singh	e9bbf03409	bulk-checkin: rebrand plug/unplug APIs as 'odb transactions' Make it clearer in the naming and documentation of the plug_bulk_checkin and unplug_bulk_checkin APIs that they can be thought of as a "transaction" to optimize operations on the object database. These transactions may be nested so that subsystems like the cache-tree writing code can optimize their operations without caring whether the top-level code has a transaction active. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-04 13:40:10 -07:00
Neeraj Singh	6840959be6	bulk-checkin: rename 'state' variable and separate 'plugged' boolean This commit prepares for adding batch-fsync to the bulk-checkin infrastructure. The bulk-checkin infrastructure is currently used to batch up addition of large blobs to a packfile. When a blob is larger than big_file_threshold, we unconditionally add it to a pack. If bulk checkins are 'plugged', we allow multiple large blobs to be added to a single pack until we reach the packfile size limit; otherwise, we simply make a new packfile for each large blob. The 'unplug' call tells us when the series of blob additions is done so that we can finish the packfiles and make their objects available to subsequent operations. Stated another way, bulk-checkin allows callers to define a transaction that adds multiple objects to the object database, where the object database can optimize its internal operations within the transaction boundary. Batched fsync will fit into bulk-checkin by taking advantage of the plug/unplug functionality to determine the appropriate time to fsync and make newly-added objects available in the primary object database. * Rename 'state' variable to 'bulk_checkin_state', since we will later be adding 'bulk_fsync_objdir'. This also makes the variable easier to find in the debugger, since the name is more unique. * Move the 'plugged' data member of 'bulk_checkin_state' into a separate static variable. Doing this avoids resetting the variable in finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we seem to unintentionally disable the plugging functionality the first time a new packfile must be created due to packfile size limits. While disabling the plugging state only results in suboptimal behavior for the current code, it would be fatal for the bulk-fsync functionality later in this patch series. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-04 13:40:10 -07:00
Jeff Hostetler	3a5235120b	Merge branch 'trial-fsmonitor-march-2022-part3' into trial-fsmonitor-march-2022	2022-04-04 12:28:52 -07:00
Jeff Hostetler	047b7f42de	t7527: test Unicode NFC/NFD handling on MacOS Confirm that the daemon reports events using the on-disk spelling for Unicode NFC/NFD characters. On APFS we still have Unicode aliasing, so we cannot create two files that only differ by NFC/NFD, but the on-disk format preserves the spelling used to create the file. On HFS+ we also have aliasing, but the path is always stored on disk in NFD. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>	2022-04-04 12:28:51 -07:00
Jeff Hostetler	006595a88e	t/lib-unicode-nfc-nfd: helper prereqs for testing unicode nfc/nfd Create a set of prereqs to help understand how file names are handled by the filesystem when they contain NFC and NFD Unicode characters. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>	2022-04-04 12:28:51 -07:00
Jeff Hostetler	08b925fdf6	fsmonitor: on macOS also emit NFC spelling for NFD pathname Emit NFC or NFC and NFD spellings of pathnames on macOS. MacOS is Unicode composition insensitive, so NFC and NFD spellings are treated as aliases and collide. While the spelling of pathnames in filesystem events depends upon the underlying filesystem, such as APFS, HFS+ or FAT32, the OS enforces such collisions regardless of filesystem. Teach the daemon to always report the NFC spelling and to report the NFD spelling when stored in that format on the disk. This is slightly more general than "core.precomposeUnicode". Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>	2022-04-04 12:28:51 -07:00
Jeff Hostetler	1d23328829	t7527: test FSMonitor on case insensitive+preserving file system Test that FS events from the OS are received using the preserved, on-disk spelling of files/directories rather than spelling used to make the change. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>	2022-04-04 12:28:51 -07:00
Jeff Hostetler	90f9857709	fsmonitor: never set CE_FSMONITOR_VALID on submodules Never set CE_FSMONITOR_VALID on the cache-entry of submodule directories. During a client command like 'git status', we may need to recurse into each submodule to compute a status summary for the submodule. Since the purpose of the ce_flag is to let Git avoid scanning a cache-entry, setting the flag causes the recursive call to be avoided and we report incorrect (no status) for the submodule. We created an OS watch on the root directory of our working directory and we receive events for everything in the cone under it. When submodules are present inside our working directory, we receive events for both our repo (the super) and any subs within it. Since our index doesn't have any information for items within the submodules, we can't use those events. We could try to truncate the paths of those events back to the submodule boundary and mark the GITLINK as dirty, but that feels expensive since we would have to prefix compare every FS event that we receive against a list of submodule roots. And it still wouldn't be sufficient to correctly report status on the submodule, since we don't have any space in the cache-entry to cache the submodule's status (the 'SCMU' bits in porcelain V2 speak). That is, the CE_FSMONITOR_VALID bit just says that we don't need to scan/inspect it because we already know the answer -- it doesn't say that the item is clean -- and we don't have space in the cache-entry to store those answers. So we should always do the recursive scan. Therefore, we should never set the flag on GITLINK cache-entries. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>	2022-04-04 12:28:51 -07:00
Jeff Hostetler	9473b67517	t/perf/p7527: add perf test for builtin FSMonitor Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>	2022-04-04 12:28:51 -07:00
Jeff Hostetler	1c0b74e4e5	t7527: FSMonitor tests for directory moves Create unit tests to move a directory. Verify that `git status` gives the same result with and without FSMonitor enabled. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>	2022-04-04 12:28:51 -07:00
Jeff Hostetler	d612def36c	fsmonitor: optimize processing of directory events Teach Git to perform binary search over the cache-entries for a directory notification and then linearly scan forward to find the immediate children. Previously, when the FSMonitor reported a modified directory Git would perform a linear search on the entire cache-entry array for all entries matching that directory prefix and invalidate them. Since the cache-entry array is already sorted, we can use a binary search to find the first matching entry and then only linearly walk forward and invalidate entries until the prefix changes. Also, the original code would invalidate anything having the same directory prefix. Since a directory event should only be received for items that are immediately within the directory (and not within sub-directories of it), only invalidate those entries and not the whole subtree. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>	2022-04-04 12:28:51 -07:00
Jeff Hostetler	f8563129ce	fsm-listen-darwin: shutdown daemon if worktree root is moved/renamed Teach the listener thread to shutdown the daemon if the spelling of the worktree root directory changes. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>	2022-04-04 12:28:51 -07:00
Jeff Hostetler	1ebe1cb298	fsm-health-win32: force shutdown daemon if worktree root moves Force shutdown fsmonitor daemon if the worktree root directory is moved, renamed, or deleted. Use Windows low-level GetFileInformationByHandle() to get and compare the Windows system unique ID for the directory with a cached version when we started up. This lets us detect the case where someone renames the directory that we are watching and then creates a new directory with the original pathname. This is important because we are listening to a named pipe for requests and they are stored in the Named Pipe File System (NPFS) which a kernel-resident pseudo filesystem not associated with the actual NTFS directory. For example, if the daemon was watching "~/foo/", it would have a directory-watch handle on that directory and a named-pipe handle for "//./pipe/...foo". Moving the directory to "~/bar/" does not invalidate the directory handle. (So the daemon would actually be watching "~/bar" but listening on "//./pipe/...foo". If the user then does "git init ~/foo" and causes another daemon to start, the first daemon will still have ownership of the pipe and the second daemon instance will fail to start. "git status" clients in "~/foo" will ask "//./pipe/...foo" about changes and the first daemon instance will tell them about "~/bar". This commit causes the first daemon to shutdown if the system unique ID for "~/foo" changes (changes from what it was when the daemon started). Shutdown occurs after a periodic poll. After the first daemon exits and releases the lock on the named pipe, subsequent Git commands may cause another daemon to be started on "~/foo". Similarly, a subsequent Git command may cause another daemon to be started on "~/bar". Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>	2022-04-04 12:28:51 -07:00

1 2 3 4 5 ...

130623 Commits