When adding many objects to a repo with core.fsyncObjectFiles set to
true, the cost of fsync'ing each object file can become prohibitive.
One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. Fortunately, Windows,
and macOS offer mechanisms to write data from the filesystem page cache
without initiating a hardware flush. Linux has the sync_file_range API,
which issues a pagecache writeback request reliably after version 5.2.
This patch introduces a new 'core.fsyncObjectFiles = batch' option that
batches up hardware flushes. It hooks into the bulk-checkin plugging and
unplugging functionality and takes advantage of tmp-objdir.
When the new mode is enabled do the following for each new object:
1. Create the object in a tmp-objdir.
2. Issue a pagecache writeback request and wait for it to complete.
At the end of the entire transaction when unplugging bulk checkin:
1. Issue an fsync against a dummy file to flush the hardware writeback
cache, which should by now have processed the tmp-objdir writes.
2. Rename all of the tmp-objdir files to their final names.
3. When updating the index and/or refs, we assume that Git will issue
another fsync internal to that operation. This is not the case today,
but may be a good extension to those components.
On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS
we would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns.
This change also updates the macOS code to trigger a real hardware flush
via fnctl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
macOS there was no guarantee of durability since a simple fsync(2) call
does not flush any hardware caches.
_Performance numbers_:
Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.
This number is from a patch later in the series.
Adding 500 files to the repo with 'git add' Times reported in seconds.
core.fsyncObjectFiles | Linux | Mac | Windows
----------------------|-------|-------|--------
false | 0.06 | 0.35 | 0.61
true | 1.88 | 11.18 | 2.47
batch | 0.15 | 0.41 | 1.53
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.
* Rename 'state' variable to 'bulk_checkin_state', since we will later
be adding 'bulk_fsync_state'. This also makes the variable easier to
find in the debugger, since the name is more unique.
* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
static variable. Doing this avoids resetting the variable in
finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
seem to unintentionally disable the plugging functionality the first
time a new packfile must be created due to packfile size limits. While
disabling the plugging state only results in suboptimal behavior for
the current code, it would be fatal for the bulk-fsync functionality
later in this patch series.
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint-2.34:
Git 2.34.2
Git 2.33.2
Git 2.32.1
Git 2.31.2
GIT-VERSION-GEN: bump to v2.33.1
Git 2.30.3
setup_git_directory(): add an owner check for the top-level directory
Add a function to determine whether a path is owned by the current user
* maint-2.33:
Git 2.33.2
Git 2.32.1
Git 2.31.2
GIT-VERSION-GEN: bump to v2.33.1
Git 2.30.3
setup_git_directory(): add an owner check for the top-level directory
Add a function to determine whether a path is owned by the current user
* maint-2.32:
Git 2.32.1
Git 2.31.2
Git 2.30.3
setup_git_directory(): add an owner check for the top-level directory
Add a function to determine whether a path is owned by the current user
* maint-2.31:
Git 2.31.2
Git 2.30.3
setup_git_directory(): add an owner check for the top-level directory
Add a function to determine whether a path is owned by the current user
* maint-2.30:
Git 2.30.3
setup_git_directory(): add an owner check for the top-level directory
Add a function to determine whether a path is owned by the current user
When determining the length of the longest ancestor of a given path with
respect to to e.g. `GIT_CEILING_DIRECTORIES`, we special-case the root
directory by returning 0 (i.e. we pretend that the path `/` does not end
in a slash by virtually stripping it).
That is the correct behavior because when normalizing paths, the root
directory is special: all other directory paths have their trailing
slash stripped, but not the root directory's path (because it would
become the empty string, which is not a legal path).
However, this special-casing of the root directory in
`longest_ancestor_length()` completely forgets about Windows-style root
directories, e.g. `C:\`. These _also_ get normalized with a trailing
slash (because `C:` would actually refer to the current directory on
that drive, not necessarily to its root directory).
In fc56c7b34b (mingw: accomodate t0060-path-utils for MSYS2,
2016-01-27), we almost got it right. We noticed that
`longest_ancestor_length()` expects a slash _after_ the matched prefix,
and if the prefix already ends in a slash, the normalized path won't
ever match and -1 is returned.
But then that commit went astray: The correct fix is not to adjust the
_tests_ to expect an incorrect -1 when that function is fed a prefix
that ends in a slash, but instead to treat such a prefix as if the
trailing slash had been removed.
Likewise, that function needs to handle the case where it is fed a path
that ends in a slash (not only a prefix that ends in a slash): if it
matches the prefix (plus trailing slash), we still need to verify that
the path does not end there, otherwise the prefix is not actually an
ancestor of the path but identical to it (and we need to return -1 in
that case).
With these two adjustments, we no longer need to play games in t0060
where we only add `$rootoff` if the passed prefix is different from the
MSYS2 pseudo root, instead we also add it for the MSYS2 pseudo root
itself. We do have to be careful to skip that logic entirely for Windows
paths, though, because they do are not subject to that MSYS2 pseudo root
treatment.
This patch fixes the scenario where a user has set
`GIT_CEILING_DIRECTORIES=C:\`, which would be ignored otherwise.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
It poses a security risk to search for a git directory outside of the
directories owned by the current user.
For example, it is common e.g. in computer pools of educational
institutes to have a "scratch" space: a mounted disk with plenty of
space that is regularly swiped where any authenticated user can create
a directory to do their work. Merely navigating to such a space with a
Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/`
can lead to a compromised account.
The same holds true in multi-user setups running Windows, as `C:\` is
writable to every authenticated user by default.
To plug this vulnerability, we stop Git from accepting top-level
directories owned by someone other than the current user. We avoid
looking at the ownership of each and every directories between the
current and the top-level one (if there are any between) to avoid
introducing a performance bottleneck.
This new default behavior is obviously incompatible with the concept of
shared repositories, where we expect the top-level directory to be owned
by only one of its legitimate users. To re-enable that use case, we add
support for adding exceptions from the new default behavior via the
config setting `safe.directory`.
The `safe.directory` config setting is only respected in the system and
global configs, not from repository configs or via the command-line, and
can have multiple values to allow for multiple shared repositories.
We are particularly careful to provide a helpful message to any user
trying to use a shared repository.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This function will be used in the next commit to prevent
`setup_git_directory()` from discovering a repository in a directory
that is owned by someone other than the current user.
Note: We cannot simply use `st.st_uid` on Windows just like we do on
Linux and other Unix-like platforms: according to
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/stat-functions
this field is always zero on Windows (because Windows' idea of a user ID
does not fit into a single numerical value). Therefore, we have to do
something a little involved to replicate the same functionality there.
Also note: On Windows, a user's home directory is not actually owned by
said user, but by the administrator. For all practical purposes, it is
under the user's control, though, therefore we pretend that it is owned
by the user.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Build fix on Windows.
* cb/mingw-gmtime-r:
mingw: avoid fallback for {local,gm}time_r()
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
mingw-w64's pthread_unistd.h had a bug that mistakenly (because there is
no support for the *lockfile() functions required[1]) defined
_POSIX_THREAD_SAFE_FUNCTIONS and that was being worked around since
3ecd153a3b (compat/mingw: support MSys2-based MinGW build, 2016-01-14).
The bug was fixed in winphtreads, but as a side effect, leaves the
reentrant functions from time.h no longer visible and therefore breaks
the build.
Since the intention all along was to avoid using the fallback functions,
formalize the use of POSIX by setting the corresponding feature flag and
compile out the implementation for the fallback functions.
[1] https://unix.org/whitepapers/reentrant.html
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since df3458e957 (refs API: make parse_loose_ref_contents() not set
errno, 2021-10-16), `files_read_raw_ref()` assumes that `errno` is
always set to a non-zero value upon failure. It even goes so far as to
hard-code that assumption in a `BUG()` when that assumption is not met,
rather than fail gracefully.
According to
https://pubs.opengroup.org/onlinepubs/9699919799/functions/lstat.html,
`lstat()` shall indeed set `errno` upon failure, and
https://pubs.opengroup.org/onlinepubs/9699919799/functions/errno.html
indicates that indeed, the code may rely on `errno` being set to a
non-zero value upon failure.
In `fscache_lstat()`, we did not set `errno` upon a cache miss (which
indicates that the item did not exist at the time the `lstat()` values
were cached), and therefore we now trigger this problem all the time.
Let's set `errno=ENOENT` when no entry was found.
This fixes https://github.com/git-for-windows/git/issues/3674
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Previously, we interpolated paths in config variables that start with a
forward-slash as relative to the runtime prefix. This was not portable
and has been replaced with `%(prefix)/`.
Let's warn users when they use the now-deprecated form.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This topic branch re-adds the deprecated --stdin/-z options to `git
reset`. Those patches were overridden by a different set of options in
the upstream Git project before we could propose `--stdin`.
We offered this in MinGit to applications that wanted a safer way to
pass lots of pathspecs to Git, and these applications will need to be
adjusted.
Instead of `--stdin`, `--pathspec-from-file=-` should be used, and
instead of `-z`, `--pathspec-file-nul`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This branch allows third-party tools to call `git status
--no-lock-index` to avoid lock contention with the interactive Git usage
of the actual human user.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
A fix for calling `vim` in Windows Terminal caused a regression and was
reverted. We partially un-revert this, to get the fix again.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
These are Git for Windows' Git GUI and gitk patches. We will have to
decide at some point what to do about them, but that's a little lower
priority (as Git GUI seems to be unmaintained for the time being, and
the gitk maintainer keeps a very low profile on the Git mailing list,
too).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This is the recommended way on GitHub to describe policies revolving around
security issues and about supported versions.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Rather than using private IFTTT Applets that send mails to this
maintainer whenever a new version of a Git for Windows component was
released, let's use the power of GitHub workflows to make this process
publicly visible.
This workflow monitors the Atom/RSS feeds, and opens a ticket whenever a
new version was released.
Note: Bash sometimes releases multiple patched versions within a few
minutes of each other (i.e. 5.1p1 through 5.1p4, 5.0p15 and 5.0p16). The
MSYS2 runtime also has a similar system. We can address those patches as
a group, so we shouldn't get multiple issues about them.
Note further: We're not acting on newlib releases, OpenSSL alphas, Perl
release candidates or non-stable Perl releases. There's no need to open
issues about them.
Co-authored-by: Matthias Aßhauer <mha1993@live.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Windows, an absolute POSIX path needs to be turned into a Windows
one. We used to interpret paths starting with a single `/` as relative
to the runtime-prefix, but now these need to be prefixed with
`%(prefix)/`. Let's warn for now, but still handle it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git documentation refers to $HOME and $XDG_CONFIG_HOME often, but does not specify how or where these values come from on Windows where neither is set by default. The new documentation reflects the behavior of setup_windows_environment() in compat/mingw.c.
Signed-off-by: Alejandro Barreto <alejandro.barreto@ni.com>
Git for Windows accepts pull requests; Core Git does not. Therefore we
need to adjust the template (because it only matches core Git's
project management style, not ours).
Also: direct Git for Windows enhancements to their contributions page,
space out the text for easy reading, and clarify that the mailing list
is plain text, not HTML.
Signed-off-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Getting started contributing to Git can be difficult on a Windows
machine. CONTRIBUTING.md contains a guide to getting started, including
detailed steps for setting up build tools, running tests, and
submitting patches to upstream.
[includes an example by Pratik Karki how to submit v2, v3, v4, etc.]
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The Git project followed suite and added their Code of Conduct, based on
the Contributors' Covenant v1.4.
We edit it slightly to reflect Git for Windows' particulars.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The `--stdin` option was a well-established paradigm in other commands,
therefore we implemented it in `git reset` for use by Visual Studio.
Unfortunately, upstream Git decided that it is time to introduce
`--pathspec-from-file` instead.
To keep backwards-compatibility for some grace period, we therefore
reinstate the `--stdin` option on top of the `--pathspec-from-file`
option, but mark it firmly as deprecated.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>