Commit Graph

660 Commits

Author SHA1 Message Date
Johannes Schindelin
4ef3298ce5 Introduce helper to create symlinks that knows about index_state
On Windows, symbolic links actually have a type depending on the target:
it can be a file or a directory.

In certain circumstances, this poses problems, e.g. when a symbolic link
is supposed to point into a submodule that is not checked out, so there
is no way for Git to auto-detect the type.

To help with that, we will add support over the course of the next
commits to specify that symlink type via the Git attributes. This
requires an index_state, though, something that Git for Windows'
`symlink()` replacement cannot know about because the function signature
is defined by the POSIX standard and not ours to change.

So let's introduce a helper function to create symbolic links that
*does* know about the index_state.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-12-12 15:25:27 +01:00
Ben Peart
b9b9d50585 fscache: update fscache to be thread specific instead of global
The threading model for fscache has been to have a single, global cache.
This puts requirements on it to be thread safe so that callers like
preload-index can call it from multiple threads.  This was implemented
with a single mutex and completion events which introduces contention
between the calling threads.

Simplify the threading model by making fscache thread specific.  This allows
us to remove the global mutex and synchronization events entirely and instead
associate a fscache with every thread that requests one. This works well with
the current multi-threading which divides the cache entries into blocks with
a separate thread processing each block.

At the end of each worker thread, if there is a fscache on the primary
thread, merge the cached results from the worker into the primary thread
cache. This enables us to reuse the cache later especially when scanning for
untracked files.

In testing, this reduced the time spent in preload_index() by about 25% and
also reduced the CPU utilization significantly.  On a repo with ~200K files,
it reduced overall status times by ~12%.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2022-12-12 15:25:22 +01:00
Ben Peart
65cab33aff fscache: fscache takes an initial size
Update enable_fscache() to take an optional initial size parameter which is
used to initialize the hashmap so that it can avoid having to rehash as
additional entries are added.

Add a separate disable_fscache() macro to make the code clearer and easier
to read.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-12-12 15:25:22 +01:00
Takuto Ikuta
77434e733a checkout.c: enable fscache for checkout again
This is retry of #1419.

I added flush_fscache macro to flush cached stats after disk writing
with tests for regression reported in #1438 and #1442.

git checkout checks each file path in sorted order, so cache flushing does not
make performance worse unless we have large number of modified files in
a directory containing many files.

Using chromium repository, I tested `git checkout .` performance when I
delete 10 files in different directories.
With this patch:
TotalSeconds: 4.307272
TotalSeconds: 4.4863595
TotalSeconds: 4.2975562
Avg: 4.36372923333333

Without this patch:
TotalSeconds: 20.9705431
TotalSeconds: 22.4867685
TotalSeconds: 18.8968292
Avg: 20.7847136

I confirmed this patch passed all tests in t/ with core_fscache=1.

Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
2022-12-12 15:25:22 +01:00
Jeff Hostetler
dfdb1e3708 fscache: make fscache_enabled() public
Make fscache_enabled() function public rather than static.
Remove unneeded fscache_is_enabled() function.
Change is_fscache_enabled() macro to call fscache_enabled().

is_fscache_enabled() now takes a pathname so that the answer
is more precise and mean "is fscache enabled for this pathname",
since fscache only stores repo-relative paths and not absolute
paths, we can avoid attempting lookups for absolute paths.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2022-12-12 15:25:22 +01:00
Jeff Hostetler
ae56627775 dir.c: make add_excludes aware of fscache during status
Teach read_directory_recursive() and add_excludes() to
be aware of optional fscache and avoid trying to open()
and fstat() non-existant ".gitignore" files in every
directory in the worktree.

The current code in add_excludes() calls open() and then
fstat() for a ".gitignore" file in each directory present
in the worktree.  Change that when fscache is enabled to
call lstat() first and if present, call open().

This seems backwards because both lstat needs to do more
work than fstat.  But when fscache is enabled, fscache will
already know if the .gitignore file exists and can completely
avoid the IO calls.  This works because of the lstat diversion
to mingw_lstat when fscache is enabled.

This reduced status times on a 350K file enlistment of the
Windows repo on a NVMe SSD by 0.25 seconds.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2022-12-12 15:25:22 +01:00
Karsten Blees
d258fd0ecc mingw: add a cache below mingw's lstat and dirent implementations
Checking the work tree status is quite slow on Windows, due to slow
`lstat()` emulation (git calls `lstat()` once for each file in the
index). Windows operating system APIs seem to be much better at scanning
the status of entire directories than checking single files.

Add an `lstat()` implementation that uses a cache for lstat data. Cache
misses read the entire parent directory and add it to the cache.
Subsequent `lstat()` calls for the same directory are served directly
from the cache.

Also implement `opendir()`/`readdir()`/`closedir()` so that they create
and use directory listings in the cache.

The cache doesn't track file system changes and doesn't plug into any
modifying file APIs, so it has to be explicitly enabled for git functions
that don't modify the working copy.

Note: in an earlier version of this patch, the cache was always active and
tracked file system changes via ReadDirectoryChangesW. However, this was
much more complex and had negative impact on the performance of modifying
git commands such as 'git checkout'.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-12-12 15:25:22 +01:00
Karsten Blees
9530b397f9 add infrastructure for read-only file system level caches
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.

This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.

Enable read-only sections for 'git status' and preload_index.

Signed-off-by: Karsten Blees <blees@dcon.de>
2022-12-12 15:25:22 +01:00
Johannes Schindelin
8f9a61f639 Merge pull request #4013 from dscho/mimalloc
Switch Git for Windows to using mimalloc instead of nedmalloc
2022-12-12 15:25:21 +01:00
Johannes Schindelin
51fa14702f Merge pull request #2504 from dscho/access-repo-via-junction
Handle `git add <file>` where <file> traverses an NTFS junction
2022-12-12 15:25:18 +01:00
Johannes Schindelin
534cebd7e7 mimalloc: offer a build-time option to enable it
By defining `USE_MIMALLOC`, Git can now be compiled with that
nicely-fast and small allocator.

Note that we have to disable a couple `DEVELOPER` options to build
mimalloc's source code, as it makes heavy use of declarations after
statements, among other things that disagree with Git's conventions.

For example, the `-Wno-array-bounds` flag is needed because in `-O2`
builds, trying to call `NtCurrentTeb()` (which `_mi_thread_id()` does on
Windows) causes the bogus warning about a system header, likely related
to https://sourceforge.net/p/mingw-w64/mailman/message/37674519/ and to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578:

C:/git-sdk-64-minimal/mingw64/include/psdk_inc/intrin-impl.h:838:1:
        error: array subscript 0 is outside array bounds of 'long long unsigned int[0]' [-Werror=array-bounds]
  838 | __buildreadseg(__readgsqword, unsigned __int64, "gs", "q")
      | ^~~~~~~~~~~~~~

Also: The `mimalloc` library uses C11-style atomics, therefore we must
require that standard when compiling with GCC if we want to use
`mimalloc` (instead of requiring "only" C99). This is what we do in the
CMake definition already, therefore this commit does not need to touch
`contrib/buildsystems/`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-12-12 15:25:16 +01:00
Johannes Schindelin
394f50e78b git-compat-util: avoid redeclaring _DEFAULT_SOURCE
We are about to vendor in `mimalloc`'s source code which we will want to
include `git-compat-util.h` after defining that constant.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-12-12 15:25:16 +01:00
Johannes Schindelin
5a93880d74 strbuf_realpath(): use platform-dependent API if available
Some platforms (e.g. Windows) provide API functions to resolve paths
much quicker. Let's offer a way to short-cut `strbuf_realpath()` on
those platforms.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-12-12 15:25:12 +01:00
Johannes Schindelin
d2fd8bb249 clean: do not traverse mount points
It seems to be not exactly rare on Windows to install NTFS junction
points (the equivalent of "bind mounts" on Linux/Unix) in worktrees,
e.g. to map some development tools into a subdirectory.

In such a scenario, it is pretty horrible if `git clean -dfx` traverses
into the mapped directory and starts to "clean up".

Let's just not do that. Let's make sure before we traverse into a
directory that it is not a mount point (or junction).

This addresses https://github.com/git-for-windows/git/issues/607

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-12-12 15:25:11 +01:00
Junio C Hamano
220604042c Merge branch 'jk/unused-anno-more'
More UNUSED annotation to help using -Wunused option with the
compiler.

* jk/unused-anno-more:
  ll-merge: mark unused parameters in callbacks
  diffcore-pickaxe: mark unused parameters in pickaxe functions
  convert: mark unused parameter in null stream filter
  apply: mark unused parameters in noop error/warning routine
  apply: mark unused parameters in handlers
  date: mark unused parameters in handler functions
  string-list: mark unused callback parameters
  object-file: mark unused parameters in hash_unknown functions
  mark unused parameters in trivial compat functions
  update-index: drop unused argc from do_reupdate()
  submodule--helper: drop unused argc from module_list_compute()
  diffstat_consume(): assert non-zero length
2022-10-27 14:51:52 -07:00
Jeff King
808e91956d mark unused parameters in trivial compat functions
When a platform feature isn't available or in use, we sometimes
conditionally compile empty or trivial functions to turn these into
noops. We need to annotate their parameters so that -Wunused-parameters
won't complain about them.

Note that there are many more of these in compat/mingw.h, but we'll
leave them for now, as there's some trickery required to get the UNUSED
macro available there.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-17 21:24:03 -07:00
Junio C Hamano
44ec91ba4f Merge branch 'ab/unused-annotation'
Compilation fix for ancient compilers.

* ab/unused-annotation:
  git-compat-util.h: GCC deprecated message arg only in GCC 4.5+
2022-10-17 14:56:34 -07:00
Junio C Hamano
410a0e520d Merge branch 'ds/use-platform-regex-on-macos'
With a bit of header twiddling, use the native regexp library on
macOS instead of the compat/ one.

* ds/use-platform-regex-on-macos:
  grep: fix multibyte regex handling under macOS
2022-10-07 17:19:59 -07:00
Alejandro R. Sedeño
7c07f36ad2 git-compat-util.h: GCC deprecated message arg only in GCC 4.5+
https://gcc.gnu.org/gcc-4.5/changes.html says

  The deprecated attribute now takes an optional string argument, for
  example, __attribute__((deprecated("text string"))), that will be
  printed together with the deprecation warning.

While GCC 4.5 is already 12 years old, git checks for even older
versions in places. Let's not needlessly break older compilers when
a small and simple fix is readily available.

Signed-off-by: Alejandro R. Sedeño <asedeno@mit.edu>
Signed-off-by: Alejandro R Sedeño <asedeno@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-05 19:09:59 -07:00
Junio C Hamano
dd407f1c7c Merge branch 'ab/unused-annotation'
Undoes 'jk/unused-annotation' topic and redoes it to work around
Coccinelle rules misfiring false positives in unrelated codepaths.

* ab/unused-annotation:
  git-compat-util.h: use "deprecated" for UNUSED variables
  git-compat-util.h: use "UNUSED", not "UNUSED(var)"
2022-09-14 12:56:39 -07:00
Junio C Hamano
a6b42ec0c6 Merge branch 'jk/unused-annotation'
Annotate function parameters that are not used (but cannot be
removed for structural reasons), to prepare us to later compile
with -Wunused warning turned on.

* jk/unused-annotation:
  is_path_owned_by_current_uid(): mark "report" parameter as unused
  run-command: mark unused async callback parameters
  mark unused read_tree_recursive() callback parameters
  hashmap: mark unused callback parameters
  config: mark unused callback parameters
  streaming: mark unused virtual method parameters
  transport: mark bundle transport_options as unused
  refs: mark unused virtual method parameters
  refs: mark unused reflog callback parameters
  refs: mark unused each_ref_fn parameters
  git-compat-util: add UNUSED macro
2022-09-14 12:56:39 -07:00
Junio C Hamano
f322e9f51b Merge branch 'ab/submodule-helper-prep'
Code clean-up of "git submodule--helper".

* ab/submodule-helper-prep: (33 commits)
  submodule--helper: fix bad config API usage
  submodule--helper: libify even more "die" paths for module_update()
  submodule--helper: libify more "die" paths for module_update()
  submodule--helper: check repo{_submodule,}_init() return values
  submodule--helper: libify "must_die_on_failure" code paths (for die)
  submodule--helper update: don't override 'checkout' exit code
  submodule--helper: libify "must_die_on_failure" code paths
  submodule--helper: libify determine_submodule_update_strategy()
  submodule--helper: don't exit() on failure, return
  submodule--helper: use "code" in run_update_command()
  submodule API: don't handle SM_..{UNSPECIFIED,COMMAND} in to_string()
  submodule--helper: don't call submodule_strategy_to_string() in BUG()
  submodule--helper: add missing braces to "else" arm
  submodule--helper: return "ret", not "1" from update_submodule()
  submodule--helper: rename "int res" to "int ret"
  submodule--helper: don't redundantly check "else if (res)"
  submodule--helper: refactor "errmsg_str" to be a "struct strbuf"
  submodule--helper: add "const" to passed "struct update_data"
  submodule--helper: add "const" to copy of "update_data"
  submodule--helper: add "const" to passed "module_clone_data"
  ...
2022-09-13 11:38:23 -07:00
Ævar Arnfjörð Bjarmason
1e8697b5c4 submodule--helper: check repo{_submodule,}_init() return values
Fix code added in ce125d431a (submodule: extract path to submodule
gitdir func, 2021-09-15) and a77c3fcb5e (submodule--helper: get
remote names from any repository, 2022-03-04) which failed to check
the return values of repo_init() and repo_submodule_init(). If we
failed to initialize the repository or submodule we could segfault
when trying to access the invalid repository structs.

Let's also check that these were the only such logic errors in the
codebase by making use of the "warn_unused_result" attribute. This is
valid as of GCC 3.4.0 (and clang will catch it via its faking of
__GNUC__ ).

As the comment being added to git-compat-util.h we're piggy-backing on
the LAST_ARG_MUST_BE_NULL version check out of lazyness. See
9fe3edc47f (Add the LAST_ARG_MUST_BE_NULL macro, 2013-07-18) for its
addition. The marginal benefit of covering gcc 3.4.0..4.0.0 is
near-zero (or zero) at this point. It mostly matters that we catch
this somewhere.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-02 09:16:24 -07:00
Ævar Arnfjörð Bjarmason
9ff7eb8c88 git-compat-util.h: use "deprecated" for UNUSED variables
As noted in the preceding commit our "UNUSED" macro was no longer
protecting against actual use of the "unused" variables, which it was
previously doing by renaming the variable.

Let's instead use the "deprecated" attribute to accomplish that
goal. As [1] rightly notes this has the drawback that compiling with
"-Wno-deprecated-declarations" will silence any such uses. I think the
trade-off is worth it as:

 * We can consider that a feature, as e.g. backporting certain patches
   might use a now "unused" parameter, and the person doing that might
   want to silence it with DEVOPTS=no-error.

 * This way we play nicely with coccinelle, and any other dumb(er)
   parser of C (such as syntax highlighters).

 * Not every single compilation of git needs to catch "used but
   declared unused" parameters. It's sufficient that the default "make
   DEVELOPER=1" will do so, and that the "static-analysis" CI job will
   catch it.

1. https://lore.kernel.org/git/YwCtkwjWdJVHHZV0@coredump.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:49:49 -07:00
Ævar Arnfjörð Bjarmason
5cf88fd8b0 git-compat-util.h: use "UNUSED", not "UNUSED(var)"
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75de (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.

Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.

This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b24034754 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.

1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:49:48 -07:00
Diomidis Spinellis
1819ad327b grep: fix multibyte regex handling under macOS
The commit 29de20504e (Makefile: fix default regex settings on
Darwin, 2013-05-11) fixed t0070-fundamental.sh under Darwin (macOS) by
adopting Git's regex library.  However, this library is compiled with
NO_MBSUPPORT, which causes git-grep to work incorrectly on multibyte
(e.g. UTF-8) files.  Current macOS versions pass t0070-fundamental.sh
with the native macOS regex library, which also supports multibyte
characters.

Adjust the Makefile to use the native regex library, and call
setlocale(3) to set CTYPE according to the user's preference.
The setlocale call is required on all platforms, but in platforms
supporting gettext(3), setlocale was called as a side-effect of
initializing gettext.  Therefore, move the CTYPE setlocale call from
gettext.c to common-main.c and the corresponding locale.h include
into git-compat-util.h.

Thanks to the global initialization of CTYPE setlocale, the test-tool
regex command now works correctly with supported multibyte regexes, and
is used to set the MB_REGEX test prerequisite by assessing a platform's
support for them.

Signed-off-by: Diomidis Spinellis <dds@aueb.gr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-26 11:45:52 -07:00
Junio C Hamano
f00ddc9f48 Merge branch 'vd/scalar-generalize-diagnose'
The "diagnose" feature to create a zip archive for diagnostic
material has been lifted from "scalar" and made into a feature of
"git bugreport".

* vd/scalar-generalize-diagnose:
  scalar: update technical doc roadmap
  scalar-diagnose: use 'git diagnose --mode=all'
  builtin/bugreport.c: create '--diagnose' option
  builtin/diagnose.c: add '--mode' option
  builtin/diagnose.c: create 'git diagnose' builtin
  diagnose.c: add option to configure archive contents
  scalar-diagnose: move functionality to common location
  scalar-diagnose: move 'get_disk_info()' to 'compat/'
  scalar-diagnose: add directory to archiver more gently
  scalar-diagnose: avoid 32-bit overflow of size_t
  scalar-diagnose: use "$GIT_UNZIP" in test
2022-08-25 14:42:32 -07:00
Junio C Hamano
a103ad6f3d Merge branch 'jk/pipe-command-nonblock'
Fix deadlocks between main Git process and subprocess spawned via
the pipe_command() API, that can kill "git add -p" that was
reimplemented in C recently.

* jk/pipe-command-nonblock:
  pipe_command(): mark stdin descriptor as non-blocking
  pipe_command(): handle ENOSPC when writing to a pipe
  pipe_command(): avoid xwrite() for writing to pipe
  git-compat-util: make MAX_IO_SIZE define globally available
  nonblock: support Windows
  compat: add function to enable nonblocking pipes
2022-08-25 14:42:32 -07:00
Jeff King
776515ef8b is_path_owned_by_current_uid(): mark "report" parameter as unused
In the non-Windows version of this function, we never have any errors to
report, and thus the "report" parameter is unused. But we can't drop it,
because we have to maintain function call compatibility with the version
in compat/mingw.h, which does use this parameter.

Note that there's an extra level of indirection here; the common
function is actually is_path_owned_by_current_user, which is a macro
pointing to "by_current_uid" or "by_current_sid", depending on the
platform. So an alternative here is to eat the unused parameter in the
macro, since -Wunused-parameter doesn't complain about macros. But I
think the UNUSED() annotation is less obfuscated for somebody reading
the code later.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:18:56 -07:00
Jeff King
783a86c142 config: mark unused callback parameters
The callback passed to git_config() must conform to a particular
interface. But most callbacks don't actually look at the extra "void
*data" parameter. Let's mark the unused parameters to make
-Wunused-parameter happy.

Note there's one unusual case here in get_remote_default() where we
actually ignore the "value" parameter. That's because it's only checking
whether the option is found at all, and not parsing its value.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:18:55 -07:00
Jeff King
9b24034754 git-compat-util: add UNUSED macro
In preparation for compiling with -Wunused-parameter, we'd like to be
able to annotate some function parameters as false positives (e.g.,
parameters which must exist to conform to a callback interface).

Ideally our annotation will:

  - be portable, turning into nothing on platforms which don't support
    it

  - be easy to read, without looking too syntactically odd or taking
    attention away from the rest of the parameters

  - help us notice when a parameter marked as unused is actually used,
    which keeps our annotations accurate. In theory a compiler could
    tell us this easily, but gcc has no such warning. Clang has
    -Wused-but-marked-unused, but it triggers false positives with our
    MAYBE_UNUSED annotation (e.g., for commit-slab functions)

This patch introduces an UNUSED() macro which takes the parameter name
as an argument. That lets us tweak the name in such a way that we'll
notice if somebody tries to use it. It looks like this in use:

  int some_ref_cb(const char *refname,
                  const struct object_id *UNUSED(oid),
                  int UNUSED(flags),
                  void *UNUSED(data))
  {
        printf("got refname %s", refname);
        return 0;
  }

Because the unused parameter names are rewritten behind the scenes to
UNUSED_oid, etc, adding code like:

  printf("oid is %s", oid_to_hex(oid));

will fail compilation with "oid undeclared". Sadly, the "did you mean"
feature of modern compilers is not generally smart enough to suggest the
"unused" name. If we used a very short prefix like U_oid, that does
convince gcc to say "did you mean", but since the "U_" in the suggestion
isn't much of a hint, it doesn't really help. In practice, a look at the
function definition usually makes the problem pretty obvious.

Note that we have to put the definition of UNUSED early in
git-compat-util.h, because it will eventually be used for some compat
functions themselves (both directly here and in mingw.h).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:18:54 -07:00
Jeff King
ec4f39b233 git-compat-util: make MAX_IO_SIZE define globally available
We define MAX_IO_SIZE within wrapper.c, but it's useful for any code
that wants to do a raw write() for whatever reason (say, because they
want different EAGAIN handling). Let's make it available everywhere.

The alternative would be adding xwrite_foo() variants to give callers
more options. But there's really no reason MAX_IO_SIZE needs to be
abstracted away, so this give callers the most flexibility.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-17 09:21:40 -07:00
Victoria Dye
435a2535b7 scalar-diagnose: move 'get_disk_info()' to 'compat/'
Move 'get_disk_info()' function into 'compat/'. Although Scalar-specific
code is generally not part of the main Git tree, 'get_disk_info()' will be
used in subsequent patches by additional callers beyond 'scalar diagnose'.
This patch prepares for that change, at which point this platform-specific
code should be part of 'compat/' as a matter of convention.

The function is copied *mostly* verbatim, with two exceptions:

* '#ifdef WIN32' is replaced with '#ifdef GIT_WINDOWS_NATIVE' to allow
  'statvfs' to be used with Cygwin.
* the 'struct strbuf buf' and 'int res' (as well as their corresponding
  cleanup & return) are moved outside of the '#ifdef' block.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-12 13:20:02 -07:00
Johannes Schindelin
17d3883fe9 setup: prepare for more detailed "dubious ownership" messages
When verifying the ownership of the Git directory, we sometimes would
like to say a bit more about it, e.g. when using a platform-dependent
code path (think: Windows has the permission model that is so different
from Unix'), but only when it is a appropriate to actually say
something.

To allow for that, collect that information and hand it back to the
caller (whose responsibility it is to show it or not).

Note: We do not actually fill in any platform-dependent information yet,
this commit just adds the infrastructure to be able to do so.

Based-on-an-idea-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-08 09:25:40 -07:00
Junio C Hamano
694c0cc0fb Merge branch 'cb/path-owner-check-with-sudo-plus'
"sudo git foo" used to consider a repository owned by the original
user a safe one to access; it now also considers a repository owned
by root a safe one, too (after all, if an attacker can craft a
malicious repository owned by root, the box is 0wned already).

* cb/path-owner-check-with-sudo-plus:
  git-compat-util: allow root to access both SUDO_UID and root owned
2022-06-17 17:12:31 -07:00
Carlo Marcelo Arenas Belón
6b11e3d52e git-compat-util: allow root to access both SUDO_UID and root owned
Previous changes introduced a regression which will prevent root for
accessing repositories owned by thyself if using sudo because SUDO_UID
takes precedence.

Loosen that restriction by allowing root to access repositories owned
by both uid by default and without having to add a safe.directory
exception.

A previous workaround that was documented in the tests is no longer
needed so it has been removed together with its specially crafted
prerequisite.

Helped-by: Johanness Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-17 14:03:08 -07:00
Junio C Hamano
4da14b574f Merge branch 'ab/bug-if-bug'
A new bug() and BUG_if_bug() API is introduced to make it easier to
uniformly log "detect multiple bugs and abort in the end" pattern.

* ab/bug-if-bug:
  cache-tree.c: use bug() and BUG_if_bug()
  receive-pack: use bug() and BUG_if_bug()
  parse-options.c: use optbug() instead of BUG() "opts" check
  parse-options.c: use new bug() API for optbug()
  usage.c: add a non-fatal bug() function to go with BUG()
  common-main.c: move non-trace2 exit() behavior out of trace2.c
2022-06-10 15:04:15 -07:00
Junio C Hamano
b3b2ddced2 Merge branch 'ds/bundle-uri'
Preliminary code refactoring around transport and bundle code.

* ds/bundle-uri:
  bundle.h: make "fd" version of read_bundle_header() public
  remote: allow relative_url() to return an absolute url
  remote: move relative_url()
  http: make http_get_file() external
  fetch-pack: move --keep=* option filling to a function
  fetch-pack: add a deref_without_lazy_fetch_extended()
  dir API: add a generalized path_match_flags() function
  connect.c: refactor sending of agent & object-format
2022-06-03 14:30:34 -07:00
Junio C Hamano
83937e9592 Merge branch 'ns/batch-fsync'
Introduce a filesystem-dependent mechanism to optimize the way the
bits for many loose object files are ensured to hit the disk
platter.

* ns/batch-fsync:
  core.fsyncmethod: performance tests for batch mode
  t/perf: add iteration setup mechanism to perf-lib
  core.fsyncmethod: tests for batch mode
  test-lib-functions: add parsing helpers for ls-files and ls-tree
  core.fsync: use batch mode and sync loose objects by default on Windows
  unpack-objects: use the bulk-checkin infrastructure
  update-index: use the bulk-checkin infrastructure
  builtin/add: add ODB transaction around add_files_to_cache
  cache-tree: use ODB transaction around writing a tree
  core.fsyncmethod: batched disk flushes for loose-objects
  bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
2022-06-03 14:30:34 -07:00
Ævar Arnfjörð Bjarmason
0cc05b044f usage.c: add a non-fatal bug() function to go with BUG()
Add a bug() function to use in cases where we'd like to indicate a
runtime BUG(), but would like to defer the BUG() call because we're
possibly accumulating more bug() callers to exhaustively indicate what
went wrong.

We already have this sort of facility in various parts of the
codebase, just in the form of ad-hoc re-inventions of the
functionality that this new API provides. E.g. this will be used to
replace optbug() in parse-options.c, and the 'error("BUG:[...]' we do
in a loop in builtin/receive-pack.c.

Unlike the code this replaces we'll log to trace2 with this new bug()
function (as with other usage.c functions, including BUG()), we'll
also be able to avoid calls to xstrfmt() in some cases, as the bug()
function itself accepts variadic sprintf()-like arguments.

Any caller to bug() can follow up such calls with BUG_if_bug(),
which will BUG() out (i.e. abort()) if there were any preceding calls
to bug(), callers can also decide not to call BUG_if_bug() and leave
the resulting BUG() invocation until exit() time. There are currently
no bug() API users that don't call BUG_if_bug() themselves after a
for-loop, but allowing for not calling BUG_if_bug() keeps the API
flexible. As the tests and documentation here show we'll catch missing
BUG_if_bug() invocations in our exit() wrapper.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 12:51:35 -07:00
Ævar Arnfjörð Bjarmason
19d75948ef common-main.c: move non-trace2 exit() behavior out of trace2.c
Change the exit() wrapper added in ee4512ed48 (trace2: create new
combined trace facility, 2019-02-22) so that we'll split up the trace2
logging concerns from wanting to wrap the "exit()" function itself for
other purposes.

This makes more sense structurally, as we won't seem to conflate
non-trace2 behavior with the trace2 code. I'd previously added an
explanation for this in 368b584315 (common-main.c: call exit(), don't
return, 2021-12-07), that comment is being adjusted here.

Now the only thing we'll do if we're not using trace2 is to truncate
the "code" argument to the lowest 8 bits.

We only need to do that truncation on non-POSIX systems, but in
ee4512ed48 that "if defined(__MINGW32__)" code added in
47e3de0e79 (MinGW: truncate exit()'s argument to lowest 8 bits,
2009-07-05) was made to run everywhere. It might be good for clarify
to narrow that down by an "ifdef" again, but I'm not certain that in
the interim we haven't had some other non-POSIX systems rely the
behavior. On a POSIX system taking the lowest 8 bits is implicit, see
exit(3)[1] and wait(2)[2]. Let's leave a comment about that instead.

1. https://man7.org/linux/man-pages/man3/exit.3.html
2. https://man7.org/linux/man-pages/man2/wait.2.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 12:51:30 -07:00
Junio C Hamano
2088a0c0cd Merge branch 'cb/path-owner-check-with-sudo'
With a recent update to refuse access to repositories of other
people by default, "sudo make install" and "sudo git describe"
stopped working.  This series intends to loosen it while keeping
the safety.

* cb/path-owner-check-with-sudo:
  t0034: add negative tests and allow git init to mostly work under sudo
  git-compat-util: avoid failing dir ownership checks if running privileged
  t: regression git needs safe.directory when using sudo
2022-05-26 14:51:32 -07:00
Ævar Arnfjörð Bjarmason
9fd512c8d6 dir API: add a generalized path_match_flags() function
Add a path_match_flags() function and have the two sets of
starts_with_dot_{,dot_}slash() functions added in
63e95beb08 (submodule: port resolve_relative_url from shell to C,
2016-04-15) and a2b26ffb1a (fsck: convert gitmodules url to URL
passed to curl, 2020-04-18) be thin wrappers for it.

As the latter of those notes the fsck version was copied from the
initial builtin/submodule--helper.c version.

Since the code added in a2b26ffb1a was doing really doing the same as
win32_is_dir_sep() added in 1cadad6f65 (git clone <url>
C:\cygwin\home\USER\repo' is working (again), 2018-12-15) let's move
the latter to git-compat-util.h is a is_xplatform_dir_sep(). We can
then call either it or the platform-specific is_dir_sep() from this
new function.

Let's likewise change code in various other places that was hardcoding
checks for "'/' || '\\'" with the new is_xplatform_dir_sep(). As can
be seen in those callers some of them still concern themselves with
':' (Mac OS classic?), but let's leave the question of whether that
should be consolidated for some other time.

As we expect to make wider use of the "native" case in the future,
define and use two starts_with_dot_{,dot_}slash_native() convenience
wrappers. This makes the diff in builtin/submodule--helper.c much
smaller.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-16 15:02:09 -07:00
Carlo Marcelo Arenas Belón
ae9abbb63e git-compat-util: avoid failing dir ownership checks if running privileged
bdc77d1d68 (Add a function to determine whether a path is owned by the
current user, 2022-03-02) checks for the effective uid of the running
process using geteuid() but didn't account for cases where that user was
root (because git was invoked through sudo or a compatible tool) and the
original uid that repository trusted for its config was no longer known,
therefore failing the following otherwise safe call:

  guy@renard ~/Software/uncrustify $ sudo git describe --always --dirty
  [sudo] password for guy:
  fatal: unsafe repository ('/home/guy/Software/uncrustify' is owned by someone else)

Attempt to detect those cases by using the environment variables that
those tools create to keep track of the original user id, and do the
ownership check using that instead.

This assumes the environment the user is running on after going
privileged can't be tampered with, and also adds code to restrict that
the new behavior only applies if running as root, therefore keeping the
most common case, which runs unprivileged, from changing, but because of
that, it will miss cases where sudo (or an equivalent) was used to change
to another unprivileged user or where the equivalent tool used to raise
privileges didn't track the original id in a sudo compatible way.

Because of compatibility with sudo, the code assumes that uid_t is an
unsigned integer type (which is not required by the standard) but is used
that way in their codebase to generate SUDO_UID.  In systems where uid_t
is signed, sudo might be also patched to NOT be unsigned and that might
be able to trigger an edge case and a bug (as described in the code), but
it is considered unlikely to happen and even if it does, the code would
just mostly fail safely, so there was no attempt either to detect it or
prevent it by the code, which is something that might change in the future,
based on expected user feedback.

Reported-by: Guy Maurel <guy.j@maurel.de>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Randall Becker <rsbecker@nexbridge.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Suggested-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-12 18:12:23 -07:00
Junio C Hamano
f1b50ec6f8 Merge tag 'v2.35.2' 2022-04-11 16:44:45 -07:00
Junio C Hamano
95acb13a55 Merge branch 'bc/csprng-mktemps'
Build fix.

* bc/csprng-mktemps:
  git-compat-util: really support openssl as a source of entropy
2022-04-06 15:21:59 -07:00
Neeraj Singh
8a94d83349 core.fsync: use batch mode and sync loose objects by default on Windows
Git for Windows has defaulted to core.fsyncObjectFiles=true since
September 2017. We turn on syncing of loose object files with batch mode
in upstream Git so that we can get broad coverage of the new code
upstream.

We don't actually do fsyncs in the most of the test suite, since
GIT_TEST_FSYNC is set to 0. However, we do exercise all of the
surrounding batch mode code since GIT_TEST_FSYNC merely makes the
maybe_fsync wrapper always appear to succeed.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-06 13:13:26 -07:00
Junio C Hamano
fca85986bb Merge branch 'ns/core-fsyncmethod' into ns/batch-fsync
* ns/core-fsyncmethod:
  configure.ac: fix HAVE_SYNC_FILE_RANGE definition
  core.fsyncmethod: correctly camel-case warning message
  core.fsync: fix incorrect expression for default configuration
  core.fsync: documentation and user-friendly aggregate options
  core.fsync: new option to harden the index
  core.fsync: add configuration parsing
  core.fsync: introduce granular fsync control infrastructure
  core.fsyncmethod: add writeout-only mode
  wrapper: make inclusion of Windows csprng header tightly scoped
2022-04-06 13:01:54 -07:00
Carlo Marcelo Arenas Belón
5b52d9f15e git-compat-util: really support openssl as a source of entropy
05cd988dce (wrapper: add a helper to generate numbers from a CSPRNG,
2022-01-17), configure openssl as the source for entropy in NON-STOP
but doesn't add the needed header or link options.

Since the only system that is configured to use openssl as a source
of entropy is NON-STOP, add the header unconditionally, and -lcrypto
to the list of external libraries.

An additional change is required to make sure a NO_OPENSSL=1 build
will be able to work as well (tested on Linux with a modified value
of CSPRNG_METHOD = openssl), and the more complex logic that allows
for compatibility with APPLE_COMMON_CRYPTO or allowing for simpler
ways to link (without libssl) has been punted for now.

Reported-by: Randall Becker <rsbecker@nexbridge.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-06 09:04:50 -07:00
Neeraj Singh
9a4987677d trace2: add stats for fsync operations
Add some global trace2 statistics for the number of fsyncs performed
during the lifetime of a Git process.

These stats are printed as part of trace2_cmd_exit_fl, which is
presumably where we might want to print any other cross-cutting
statistics.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-30 11:15:55 -07:00