To avoid running into command line limitations, some of Git's commands
support the `--stdin` option.
Let's use exactly this option in the three rev-list/log invocations in
gitk that would otherwise possibly run the danger of trying to invoke a
too-long command line.
While it is easy to redirect either stdin or stdout in Tcl/Tk scripts,
what we need here is both. We need to capture the output, yet we also
need to pipe in the revs/files arguments via stdin (because stdin does
not have any limit, unlike the command line). To help this, we use the
neat Tcl feature where you can capture stdout and at the same time feed
a fixed string as stdin to the spawned process.
One non-obvious aspect about this change is that the `--stdin` option
allows to specify revs, the double-dash, and files, but *no* other
options such as `--not`. This is addressed by prefixing the "negative"
revs with `^` explicitly rather than relying on the `--not` option
(thanks for coming up with that idea, Max!).
This fixes https://github.com/git-for-windows/git/issues/1987
Analysis-and-initial-patch-by: Max Kirillov <max@max630.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
"git pull" with any strategy when the other side is behind us
should succeed as it is a no-op, but doesn't.
* ev/pull-already-up-to-date-is-noop:
pull: should be noop when already-up-to-date
"git grep" looking in a blob that has non-UTF8 payload was
completely broken when linked with versions of PCREv2 library older
than 10.34 in the latest release.
* hm/paint-hits-in-log-grep:
Revert "grep/pcre2: fix an edge case concerning ascii patterns and UTF-8 data"
This reverts commit f6526728f9.
The change in f652672 (dir: select directories correctly, 2021-09-24)
caused a regression in directory-based matches with non-cone-mode
patterns, especially for .gitignore patterns. A test is included to
prevent this regression in the future.
The commit ed495847 (dir: fix pattern matching on dirs, 2021-09-24) was
reverted in 5ceb663 (dir: fix directory-matching bug, 2021-11-02) for
similar reasons. Neither commit changed tests, and tests added later in
the series continue to pass when these commits are reverted.
Reported-by: Danial Alihosseini <danial.alihosseini@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit ae39ba431a, as it
breaks "grep" when looking for a string in non UTF-8 haystack, when
linked with certain versions of PCREv2 library.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The already-up-to-date pull bug was fixed for --ff-only but it did not
include the case where --ff or --ff-only are not specified. This updates
the --ff-only fix to include the case where --ff or --ff-only are not
specified in command line flags or config.
Signed-off-by: Erwin Villejo <erwin.villejo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The filter system allows for alterations to file contents when they're
moved between the database and the worktree. We already made sure that
it is possible for smudge filters to produce contents that are larger
than `unsigned long` can represent (which matters on systems where
`unsigned long` is narrower than `size_t`, most notably 64-bit Windows).
Now we make sure that clean filters can _consume_ contents that are
larger than that.
Note that this commit only allows clean filters' _input_ to be larger
than can be represented by `unsigned long`.
This change makes only a very minute dent into the much larger project
to teach Git to use `size_t` instead of `unsigned long` wherever
appropriate.
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Matt Cooper <vtbassmatt@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This introduces an additional guard for platforms where `unsigned long`
and `size_t` are not of the same size. If the size of an object in the
database would overflow `unsigned long`, instead we now exit with an
error.
A complete fix will have to update _many_ other functions throughout the
codebase to use `size_t` instead of `unsigned long`. It will have to be
implemented at some stage.
This commit puts in a stop-gap for the time being.
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Matt Cooper <vtbassmatt@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Now that we have a `batch` mode, let's be explicit.
This is a follow-up to ce4786fc77 (mingw: change core.fsyncObjectFiles
= 1 by default, 2017-09-04) and will most likely have to be squashed
into it before upstreaming that patch (after the `batch` fsync mode was
upstreamed).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
There is mixed use of size_t and unsigned long to deal with sizes in the
codebase. Recall that Windows defines unsigned long as 32 bits even on
64-bit platforms, meaning that converting size_t to unsigned long narrows
the range. This mostly doesn't cause a problem since Git rarely deals
with files larger than 2^32 bytes.
But adjunct systems such as Git LFS, which use smudge/clean filters to
keep huge files out of the repository, may have huge file contents passed
through some of the functions in entry.c and convert.c. On Windows, this
results in a truncated file being written to the workdir. I traced this to
one specific use of unsigned long in write_entry (and a similar instance
in write_pc_item_to_fd for parallel checkout). That appeared to be for
the call to read_blob_entry, which expects a pointer to unsigned long.
By altering the signature of read_blob_entry to expect a size_t,
write_entry can be switched to use size_t internally (which all of its
callers and most of its callees already used). To avoid touching dozens of
additional files, read_blob_entry uses a local unsigned long to call a
chain of functions which aren't prepared to accept size_t.
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Matt Cooper <vtbassmatt@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This merges the topic branch (specifically backported onto v2.33.1 to
allow for integrating into Git for Windows' `main` branch) that strikes
a better balance between safety and speed: rather than `fsync()`ing each
and every loose object file, we now offer to do it in a batch.
This will become the new default in Git for Windows.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The filter system allows for alterations to file contents when they're
added to the database or workdir. ("Smudge" when moving to the workdir;
"clean" when moving to the database.) This is used natively to handle CRLF
to LF conversions. It's also employed by Git-LFS to replace large files
from the workdir with small tracking files in the repo and vice versa.
Git pulls the entire smudged file into memory. While this is inefficient,
there's a more insidious problem on some platforms due to inconsistency
between using unsigned long and size_t for the same type of data (size of
a file in bytes). On most 64-bit platforms, unsigned long is 64 bits, and
size_t is typedef'd to unsigned long. On Windows, however, unsigned long is
only 32 bits (and therefore on 64-bit Windows, size_t is typedef'd to
unsigned long long in order to be 64 bits).
Practically speaking, this means 64-bit Windows users of Git-LFS can't
handle files larger than 2^32 bytes. Other 64-bit platforms don't suffer
this limitation.
This commit introduces a test exposing the issue; future commits make it
pass. The test simulates the way Git-LFS works by having a tiny file
checked into the repository and expanding it to a huge file on checkout.
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Matt Cooper <vtbassmatt@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The `xutftowcs_path` function canonicalizes absolute paths using GetFullPathNameW.
This canonicalization may change the length of the string (e.g. getting rid of \.\),
which breaks callers that pass the template string in a strbuf and expect the
length of the string to remain the same.
In my particular case, the tmp-objdir code is passing a strbuf to mkdtemp and is
breaking since the strbuf.len is no longer synchronized with strlen(strbuf.buf).
Signed-off-by: Neeraj K. Singh <neerajsi@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Add a basic performance test for "git add" and "git stash" of a lot of
new objects with various fsync settings.
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow tests that assume a 64-bit `size_t` to be skipped in 32-bit
platforms and regardless of the size of `long`.
This imitates the `LONG_IS_64BIT` prerequisite.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Add test cases to exercise batch mode for:
* 'git add'
* 'git stash'
* 'git update-index'
* 'git unpack-objects'
These tests ensure that the added data winds up in the object database.
In this change we introduce a new test helper lib-unique-files.sh. The
goal of this library is to create a tree of files that have different
oids from any other files that may have been created in the current test
repo. This helps us avoid missing validation of an object being added due
to it already being in the repo.
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In this developer's tests, producing one gigabyte worth of NULs in a
busy loop that writes out individual bytes, unbuffered, took ~27sec.
Writing chunked 256kB buffers instead only took ~0.6sec
This matters because we are about to introduce a pair of test cases that
want to be able to produce 5GB of NULs, and we cannot use `/dev/zero`
because of the HP NonStop platform's lack of support for that device.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The unpack-objects functionality is used by fetch, push, and fast-import
to turn the transfered data into object database entries when there are
fewer objects than the 'unpacklimit' setting.
By enabling bulk-checkin when unpacking objects, we can take advantage
of batched fsyncs.
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
d5cfd142ec (tests: teach the test-tool to generate NUL bytes and
use it, 2019-02-14), add a way to generate zeroes in a portable
way without using /dev/zero (needed by HP NonStop), but uses a
long variable that is limited to 2^31 in Windows.
Use instead a (POSIX/C99) intmax_t that is at least 64bit wide
in 64-bit Windows to use in a future test.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The update-index functionality is used internally by 'git stash push' to
setup the internal stashed commit.
This change enables bulk-checkin for update-index infrastructure to
speed up adding new objects to the object database by leveraging the
pack functionality and the new bulk-fsync functionality.
There is some risk with this change, since under batch fsync, the object
files will not be available until the update-index is entirely complete.
This usage is unlikely, since any tool invoking update-index and
expecting to see objects would have to synchronize with the update-index
process after passing it a file path.
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit adds a win32 implementation for fsync_no_flush that is
called git_fsync. The 'NtFlushBuffersFileEx' function being called is
available since Windows 8. If the function is not available, we
return -1 and Git falls back to doing a full fsync.
The operating system is told to flush data only without a hardware
flush primitive. A later full fsync will cause the metadata log
to be flushed and then the disk cache to be flushed on NTFS and
ReFS. Other filesystems will treat this as a full flush operation.
I added a new file here for this system call so as not to conflict with
downstream changes in the git-for-windows repository related to fscache.
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When adding many objects to a repo with core.fsyncObjectFiles set to
true, the cost of fsync'ing each object file can become prohibitive.
One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. Fortunately, Windows,
and macOS offer mechanisms to write data from the filesystem page cache
without initiating a hardware flush. Linux has the sync_file_range API,
which issues a pagecache writeback request reliably after version 5.2.
This patch introduces a new 'core.fsyncObjectFiles = batch' option that
batches up hardware flushes. It hooks into the bulk-checkin plugging and
unplugging functionality and takes advantage of tmp-objdir.
When the new mode is enabled do the following for each new object:
1. Create the object in a tmp-objdir.
2. Issue a pagecache writeback request and wait for it to complete.
At the end of the entire transaction when unplugging bulk checkin:
1. Issue an fsync against a dummy file to flush the hardware writeback
cache, which should by now have processed the tmp-objdir writes.
2. Rename all of the tmp-objdir files to their final names.
3. When updating the index and/or refs, we assume that Git will issue
another fsync internal to that operation. This is not the case today,
but may be a good extension to those components.
On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS
we would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns.
This change also updates the macOS code to trigger a real hardware flush
via fnctl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
macOS there was no guarantee of durability since a simple fsync(2) call
does not flush any hardware caches.
_Performance numbers_:
Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.
This number is from a patch later in the series.
Adding 500 files to the repo with 'git add' Times reported in seconds.
core.fsyncObjectFiles | Linux | Mac | Windows
----------------------|-------|-------|--------
false | 0.06 | 0.35 | 0.61
true | 1.88 | 11.18 | 2.47
batch | 0.15 | 0.41 | 1.53
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.
* Rename 'state' variable to 'bulk_checkin_state', since we will later
be adding 'bulk_fsync_state'. This also makes the variable easier to
find in the debugger, since the name is more unique.
* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
static variable. Doing this avoids resetting the variable in
finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
seem to unintentionally disable the plugging functionality the first
time a new packfile must be created due to packfile size limits. While
disabling the plugging state only results in suboptimal behavior for
the current code, it would be fatal for the bulk-fsync functionality
later in this patch series.
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Windows, an absolute POSIX path needs to be turned into a Windows
one. We used to interpret paths starting with a single `/` as relative
to the runtime-prefix, but now these need to be prefixed with
`%(prefix)/`. Let's warn for now, but still handle it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Update wchar_t buffers to use MAX_LONG_PATH instead of MAX_PATH in
the Win32 backend source files.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
When creating a subprocess with a temporary ODB, we set the
GIT_QUARANTINE_ENVIRONMENT env var to tell child Git processes not
to update refs, since the tmp-objdir may go away.
Introduce a similar mechanism for in-process temporary ODBs when
we call tmp_objdir_replace_primary_odb. Now both mechanisms set
the disable_ref_updates flag on the odb, which is queried by
the ref_transaction_prepare function.
Note: This change adds an assumption that the state of
the_repository is relevant for any ref transaction that might
be initiated. Unwinding this assumption should be straightforward
by saving the relevant repository to query in the transaction or
the ref_store.
Peff's test case was invoking ref updates via the cachetextconv
setting. That particular code silently does nothing when a ref
update is forbidden. See the call to notes_cache_put in
fill_textconv where errors are ignored.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tmp_objdir API provides the ability to create temporary object
directories, but was designed with the goal of having subprocesses
access these object stores, followed by the main process migrating
objects from it to the main object store or just deleting it. The
subprocesses would view it as their primary datastore and write to it.
Here we add the tmp_objdir_replace_primary_odb function that replaces
the current process's writable "main" object directory with the
specified one. The previous main object directory is restored in either
tmp_objdir_migrate or tmp_objdir_destroy.
For the --remerge-diff usecase, add a new `will_destroy` flag in `struct
object_database` to mark ephemeral object databases that do not require
fsync durability.
Add 'git prune' support for removing temporary object databases, and
make sure that they have a name starting with tmp_ and containing an
operation-specific name.
Based-on-patch-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This topic branch re-adds the deprecated --stdin/-z options to `git
reset`. Those patches were overridden by a different set of options in
the upstream Git project before we could propose `--stdin`.
We offered this in MinGit to applications that wanted a safer way to
pass lots of pathspecs to Git, and these applications will need to be
adjusted.
Instead of `--stdin`, `--pathspec-from-file=-` should be used, and
instead of `-z`, `--pathspec-file-nul`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This branch allows third-party tools to call `git status
--no-lock-index` to avoid lock contention with the interactive Git usage
of the actual human user.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>