Commit Graph

10326 Commits

Author SHA1 Message Date
Johannes Schindelin
3e7f807a9a mingw: ensure that core.longPaths is handled *always*
A ton of Git commands simply do not read (or at least parse) the core.*
settings. This is not good, as Git for Windows relies on the
core.longPaths setting to be read quite early on.

So let's just make sure that all commands read the config and give
platform_core_config() a chance.

This patch teaches tons of Git commands to respect the config setting
`core.longPaths = true`, including `pack-refs`, thereby fixing
https://github.com/git-for-windows/git/issues/1218

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-06-14 17:00:11 +01:00
Johannes Schindelin
5cff5fcf80 clean: make use of FSCache
The `git clean` command needs to enumerate plenty of files and
directories, and can therefore benefit from the FSCache.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-06-14 17:00:03 +01:00
Ben Peart
cf012f7885 fscache: fscache takes an initial size
Update enable_fscache() to take an optional initial size parameter which is
used to initialize the hashmap so that it can avoid having to rehash as
additional entries are added.

Add a separate disable_fscache() macro to make the code clearer and easier
to read.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-06-14 17:00:02 +01:00
Ben Peart
5fbe51ce4f status: disable and free fscache at the end of the status command
At the end of the status command, disable and free the fscache so that we
don't leak the memory and so that we can dump the fscache statistics.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
2022-06-14 17:00:02 +01:00
Takuto Ikuta
7c1dbd13b2 checkout.c: enable fscache for checkout again
This is retry of #1419.

I added flush_fscache macro to flush cached stats after disk writing
with tests for regression reported in #1438 and #1442.

git checkout checks each file path in sorted order, so cache flushing does not
make performance worse unless we have large number of modified files in
a directory containing many files.

Using chromium repository, I tested `git checkout .` performance when I
delete 10 files in different directories.
With this patch:
TotalSeconds: 4.307272
TotalSeconds: 4.4863595
TotalSeconds: 4.2975562
Avg: 4.36372923333333

Without this patch:
TotalSeconds: 20.9705431
TotalSeconds: 22.4867685
TotalSeconds: 18.8968292
Avg: 20.7847136

I confirmed this patch passed all tests in t/ with core_fscache=1.

Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
2022-06-14 17:00:01 +01:00
Jeff Hostetler
3b0ed5f6ae add: use preload-index and fscache for performance
Teach "add" to use preload-index and fscache features
to improve performance on very large repositories.

During an "add", a call is made to run_diff_files()
which calls check_remove() for each index-entry.  This
calls lstat().  On Windows, the fscache code intercepts
the lstat() calls and builds a private cache using the
FindFirst/FindNext routines, which are much faster.

Somewhat independent of this, is the preload-index code
which distributes some of the start-up costs across
multiple threads.

We need to keep the call to read_cache() before parsing the
pathspecs (and hence cannot use the pathspecs to limit any preload)
because parse_pathspec() is using the index to determine whether a
pathspec is, in fact, in a submodule. If we would not read the index
first, parse_pathspec() would not error out on a path that is inside
a submodule, and t7400-submodule-basic.sh would fail with

	not ok 47 - do not add files from a submodule

We still want the nice preload performance boost, though, so we simply
call read_cache_preload(&pathspecs) after parsing the pathspecs.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-06-14 17:00:01 +01:00
Karsten Blees
a1edca6539 add infrastructure for read-only file system level caches
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.

This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.

Enable read-only sections for 'git status' and preload_index.

Signed-off-by: Karsten Blees <blees@dcon.de>
2022-06-14 17:00:00 +01:00
Johannes Schindelin
4f16712de4 Merge pull request #3708 from PhilipOakley/die_preserve
Update the die() preserve-merges messages to help some users
2022-06-14 16:59:58 +01:00
Johannes Schindelin
ed28a3baac Merge pull request #3417 from dscho/initialize-core.symlinks-earlier
init: respect core.symlinks before copying the templates
2022-06-14 16:59:57 +01:00
Johannes Schindelin
067613ee28 Merge pull request #2974 from derrickstolee/maintenance-and-headless
Include Windows-specific maintenance and headless-git
2022-06-14 12:46:15 +01:00
Philip Oakley
29b21fc5c8 rebase: preserve is also a pull option, tell dying users
The `--preserve-merges` option was removed by v2.35.0. However
users may not be aware that it is also a Pull option, and it is
still offered by major IDE vendors such as Visual Studio.

Extend the `--preserve-merges` die message to direct users to
this option and it's locations.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
2022-06-14 12:40:51 +01:00
Philip Oakley
f8f2fc46d7 rebase: help user when dying with preserve-merges`
Git will die if a "rebase --preserve-merges" is in progress.
Users cannot --quit, --abort or --continue the rebase.

This sceario can occur if the user updates their Git, or switches
to another newer version, after starting a preserve-merges rebase,
commonly via the pull setting.

One trigger is an unexpectedly difficult to resolve conflict, as
reported on the `git-users` group.
(https://groups.google.com/g/git-for-windows/c/3jMWbBlXXHM)

Tell the user the cause, i.e. the existence of the directory.
The problem must be resolved manually, `git rebase --<option>`
commands will die, or the user must downgrade. Also, note that
the deleted options are no longer shown in the documentation.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
2022-06-14 12:40:51 +01:00
Johannes Schindelin
a1c57b55cf init: do parse _all_ core.* settings early
In Git for Windows, `has_symlinks` is set to 0 by default. Therefore, we
need to parse the config setting `core.symlinks` to know if it has been
set to `true`. In `git init`, we must do that before copying the
templates because they might contain symbolic links.

Even if the support for symbolic links on Windows has not made it to
upstream Git yet, we really should make sure that all the `core.*`
settings are parsed before proceeding, as they might very well change
the behavior of `git init` in a way the user intended.

This fixes https://github.com/git-for-windows/git/issues/3414

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-06-14 12:00:53 +01:00
Johannes Schindelin
b562a7e3b8 git maintenance: avoid console window in scheduled tasks on Windows
We just introduced a helper to avoid showing a console window when the
scheduled task runs `git.exe`. Let's actually use it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
2022-06-14 12:00:49 +01:00
Johannes Schindelin
7c65717fc2 clean: remove mount points when possible
Windows' equivalent to "bind mounts", NTFS junction points, can be
unlinked without affecting the mount target. This is clearly what users
expect to happen when they call `git clean -dfx` in a worktree that
contains NTFS junction points: the junction should be removed, and the
target directory of said junction should be left alone (unless it is
inside the worktree).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-06-14 11:58:15 +01:00
Johannes Schindelin
b1bbf97d94 clean: do not traverse mount points
It seems to be not exactly rare on Windows to install NTFS junction
points (the equivalent of "bind mounts" on Linux/Unix) in worktrees,
e.g. to map some development tools into a subdirectory.

In such a scenario, it is pretty horrible if `git clean -dfx` traverses
into the mapped directory and starts to "clean up".

Let's just not do that. Let's make sure before we traverse into a
directory that it is not a mount point (or junction).

This addresses https://github.com/git-for-windows/git/issues/607

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-06-14 11:58:15 +01:00
Junio C Hamano
2246937e41 Merge branch 'tb/show-ref-optim'
"git show-ref --heads" (and "--tags") still iterated over all the
refs only to discard refs outside the specified area, which has
been corrected.

* tb/show-ref-optim:
  builtin/show-ref.c: avoid over-iterating with --heads, --tags
2022-06-13 15:53:42 -07:00
Junio C Hamano
4da14b574f Merge branch 'ab/bug-if-bug'
A new bug() and BUG_if_bug() API is introduced to make it easier to
uniformly log "detect multiple bugs and abort in the end" pattern.

* ab/bug-if-bug:
  cache-tree.c: use bug() and BUG_if_bug()
  receive-pack: use bug() and BUG_if_bug()
  parse-options.c: use optbug() instead of BUG() "opts" check
  parse-options.c: use new bug() API for optbug()
  usage.c: add a non-fatal bug() function to go with BUG()
  common-main.c: move non-trace2 exit() behavior out of trace2.c
2022-06-10 15:04:15 -07:00
Junio C Hamano
9e496fffc8 Merge branch 'jh/builtin-fsmonitor-part3'
More fsmonitor--daemon.

* jh/builtin-fsmonitor-part3: (30 commits)
  t7527: improve implicit shutdown testing in fsmonitor--daemon
  fsmonitor--daemon: allow --super-prefix argument
  t7527: test Unicode NFC/NFD handling on MacOS
  t/lib-unicode-nfc-nfd: helper prereqs for testing unicode nfc/nfd
  t/helper/hexdump: add helper to print hexdump of stdin
  fsmonitor: on macOS also emit NFC spelling for NFD pathname
  t7527: test FSMonitor on case insensitive+preserving file system
  fsmonitor: never set CE_FSMONITOR_VALID on submodules
  t/perf/p7527: add perf test for builtin FSMonitor
  t7527: FSMonitor tests for directory moves
  fsmonitor: optimize processing of directory events
  fsm-listen-darwin: shutdown daemon if worktree root is moved/renamed
  fsm-health-win32: force shutdown daemon if worktree root moves
  fsm-health-win32: add polling framework to monitor daemon health
  fsmonitor--daemon: stub in health thread
  fsmonitor--daemon: rename listener thread related variables
  fsmonitor--daemon: prepare for adding health thread
  fsmonitor--daemon: cd out of worktree root
  fsm-listen-darwin: ignore FSEvents caused by xattr changes on macOS
  unpack-trees: initialize fsmonitor_has_run_once in o->result
  ...
2022-06-10 15:04:15 -07:00
Junio C Hamano
c21fa3bb54 Merge branch 'ab/env-array'
Rename .env_array member to .env in the child_process structure.

* ab/env-array:
  run-command API users: use "env" not "env_array" in comments & names
  run-command API: rename "env_array" to "env"
2022-06-10 15:04:13 -07:00
Junio C Hamano
2da81d1efb Merge branch 'ab/plug-leak-in-revisions'
Plug the memory leaks from the trickiest API of all, the revision
walker.

* ab/plug-leak-in-revisions: (27 commits)
  revisions API: add a TODO for diff_free(&revs->diffopt)
  revisions API: have release_revisions() release "topo_walk_info"
  revisions API: have release_revisions() release "date_mode"
  revisions API: call diff_free(&revs->pruning) in revisions_release()
  revisions API: release "reflog_info" in release revisions()
  revisions API: clear "boundary_commits" in release_revisions()
  revisions API: have release_revisions() release "prune_data"
  revisions API: have release_revisions() release "grep_filter"
  revisions API: have release_revisions() release "filter"
  revisions API: have release_revisions() release "cmdline"
  revisions API: have release_revisions() release "mailmap"
  revisions API: have release_revisions() release "commits"
  revisions API users: use release_revisions() for "prune_data" users
  revisions API users: use release_revisions() with UNLEAK()
  revisions API users: use release_revisions() in builtin/log.c
  revisions API users: use release_revisions() in http-push.c
  revisions API users: add "goto cleanup" for release_revisions()
  stash: always have the owner of "stash_info" free it
  revisions API users: use release_revisions() needing REV_INFO_INIT
  revision.[ch]: document and move code declared around "init"
  ...
2022-06-07 14:10:56 -07:00
Taylor Blau
c0c9d35e27 builtin/show-ref.c: avoid over-iterating with --heads, --tags
When `show-ref` is combined with the `--heads` or `--tags` options, it
can avoid iterating parts of a repository's references that it doesn't
care about.

But it doesn't take advantage of this potential optimization. When this
command was introduced back in 358ddb62cf (Add "git show-ref" builtin
command, 2006-09-15), `for_each_ref_in()` did exist. But since most
repositories don't have many (any?) references that aren't branches or
tags already, this makes little difference in practice.

Though for repositories with a large imbalance of branches and tags (or,
more likely in the case of server operators, many hidden references),
this can make quite a difference. Take, for example, a repository with
500,000 "hidden" references (all of the form "refs/__hidden__/N"), and
a single branch:

    git commit --allow-empty -m "base" &&
    seq 1 500000 | sed 's,\(.*\),create refs/__hidden__/\1 HEAD,' |
      git update-ref --stdin &&
    git pack-refs --all

Outputting the existence of that single branch currently takes on the
order of ~50ms on my machine. The vast majority of this time is wasted
iterating through references that we know we're going to discard.

Instead, teach `show-ref` that it can iterate just "refs/heads" and/or
"refs/tags" when given `--heads` and/or `--tags`, respectively. A few
small interesting things to note:

  - When given either option, we can avoid the general-purpose
    for_each_ref() call altogether, since we know that it won't give us
    any references that we wouldn't filter out already.

  - We can make two separate calls to `for_each_fullref_in()` (and
    avoid, say, the more specialized `for_each_fullref_in_prefixes()`,
    since we know that the set of references enumerated by each is
    disjoint, so we'll never see the same reference appear in both
    calls.

  - We have to use the "fullref" variant (instead of just
    `for_each_branch_ref()` and `for_each_tag_ref()`), since we expect
    fully-qualified reference names to appear in `show-ref`'s output.

When either of `heads_only` or `tags_only` is set, we can eliminate the
strcmp() calls in `builtin/show-ref.c::show_ref()` altogether, since we
know that `show_ref()` will never see a non-branch or tag reference.

Unfortunately, we can't use `for_each_fullref_in_prefixes()` to enhance
`show-ref`'s pattern matching, since `show-ref` patterns match on the
_suffix_ (e.g., the pattern "foo" shows "refs/heads/foo",
"refs/tags/foo", and etc, not "foo/*").

Nonetheless, in our synthetic example above, this provides a significant
speed-up ("git" is roughly v2.36, "git.compile" is this patch):

    $ hyperfine -N 'git show-ref --heads' 'git.compile show-ref --heads'
    Benchmark 1: git show-ref --heads
      Time (mean ± σ):      49.9 ms ±   6.2 ms    [User: 45.6 ms, System: 4.1 ms]
      Range (min … max):    46.1 ms …  73.6 ms    43 runs

    Benchmark 2: git.compile show-ref --heads
      Time (mean ± σ):       2.8 ms ±   0.4 ms    [User: 1.4 ms, System: 1.2 ms]
      Range (min … max):     1.3 ms …   5.6 ms    957 runs

    Summary
      'git.compile show-ref --heads' ran
       18.03 ± 3.38 times faster than 'git show-ref --heads'

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-06 09:56:42 -07:00
Junio C Hamano
a50036da1a Merge branch 'tb/cruft-packs'
A mechanism to pack unreachable objects into a "cruft pack",
instead of ejecting them into loose form to be reclaimed later, has
been introduced.

* tb/cruft-packs:
  sha1-file.c: don't freshen cruft packs
  builtin/gc.c: conditionally avoid pruning objects via loose
  builtin/repack.c: add cruft packs to MIDX during geometric repack
  builtin/repack.c: use named flags for existing_packs
  builtin/repack.c: allow configuring cruft pack generation
  builtin/repack.c: support generating a cruft pack
  builtin/pack-objects.c: --cruft with expiration
  reachable: report precise timestamps from objects in cruft packs
  reachable: add options to add_unseen_recent_objects_to_traversal
  builtin/pack-objects.c: --cruft without expiration
  builtin/pack-objects.c: return from create_object_entry()
  t/helper: add 'pack-mtimes' test-tool
  pack-mtimes: support writing pack .mtimes files
  chunk-format.h: extract oid_version()
  pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
  pack-mtimes: support reading .mtimes files
  Documentation/technical: add cruft-packs.txt
2022-06-03 14:30:37 -07:00
Junio C Hamano
28db3b7b71 Merge branch 'jx/l10n-workflow-change'
A workflow change for translators are being proposed.

* jx/l10n-workflow-change:
  l10n: Document the new l10n workflow
  Makefile: add "po-init" rule to initialize po/XX.po
  Makefile: add "po-update" rule to update po/XX.po
  po/git.pot: don't check in result of "make pot"
  po/git.pot: this is now a generated file
  Makefile: remove duplicate and unwanted files in FOUND_SOURCE_FILES
  i18n CI: stop allowing non-ASCII source messages in po/git.pot
  Makefile: have "make pot" not "reset --hard"
  Makefile: generate "po/git.pot" from stable LOCALIZED_C
  Makefile: sort source files before feeding to xgettext
2022-06-03 14:30:36 -07:00
Junio C Hamano
16a0e92ddc Merge branch 'tb/geom-repack-with-keep-and-max'
Teach "git repack --geometric" work better with "--keep-pack" and
avoid corrupting the repository when packsize limit is used.

* tb/geom-repack-with-keep-and-max:
  builtin/repack.c: ensure that `names` is sorted
  t7703: demonstrate object corruption with pack.packSizeLimit
  repack: respect --keep-pack with geometric repack
2022-06-03 14:30:36 -07:00
Junio C Hamano
c276c21da6 Merge branch 'ds/sparse-sparse-checkout'
"sparse-checkout" learns to work well with the sparse-index
feature.

* ds/sparse-sparse-checkout:
  sparse-checkout: integrate with sparse index
  p2000: add test for 'git sparse-checkout [add|set]'
  sparse-index: complete partial expansion
  sparse-index: partially expand directories
  sparse-checkout: --no-sparse-index needs a full index
  cache-tree: implement cache_tree_find_path()
  sparse-index: introduce partially-sparse indexes
  sparse-index: create expand_index()
  t1092: stress test 'git sparse-checkout set'
  t1092: refactor 'sparse-index contents' test
2022-06-03 14:30:35 -07:00
Junio C Hamano
091680472d Merge branch 'tb/midx-race-in-pack-objects'
The multi-pack-index code did not protect the packfile it is going
to depend on from getting removed while in use, which has been
corrected.

* tb/midx-race-in-pack-objects:
  builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects
  builtin/pack-objects.c: ensure included `--stdin-packs` exist
  builtin/pack-objects.c: avoid redundant NULL check
  pack-bitmap.c: check preferred pack validity when opening MIDX bitmap
2022-06-03 14:30:35 -07:00
Junio C Hamano
b3b2ddced2 Merge branch 'ds/bundle-uri'
Preliminary code refactoring around transport and bundle code.

* ds/bundle-uri:
  bundle.h: make "fd" version of read_bundle_header() public
  remote: allow relative_url() to return an absolute url
  remote: move relative_url()
  http: make http_get_file() external
  fetch-pack: move --keep=* option filling to a function
  fetch-pack: add a deref_without_lazy_fetch_extended()
  dir API: add a generalized path_match_flags() function
  connect.c: refactor sending of agent & object-format
2022-06-03 14:30:34 -07:00
Junio C Hamano
83937e9592 Merge branch 'ns/batch-fsync'
Introduce a filesystem-dependent mechanism to optimize the way the
bits for many loose object files are ensured to hit the disk
platter.

* ns/batch-fsync:
  core.fsyncmethod: performance tests for batch mode
  t/perf: add iteration setup mechanism to perf-lib
  core.fsyncmethod: tests for batch mode
  test-lib-functions: add parsing helpers for ls-files and ls-tree
  core.fsync: use batch mode and sync loose objects by default on Windows
  unpack-objects: use the bulk-checkin infrastructure
  update-index: use the bulk-checkin infrastructure
  builtin/add: add ODB transaction around add_files_to_cache
  cache-tree: use ODB transaction around writing a tree
  core.fsyncmethod: batched disk flushes for loose-objects
  bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
2022-06-03 14:30:34 -07:00
Junio C Hamano
377d347eb3 Merge branch 'en/sparse-cone-becomes-default'
Deprecate non-cone mode of the sparse-checkout feature.

* en/sparse-cone-becomes-default:
  Documentation: some sparsity wording clarifications
  git-sparse-checkout.txt: mark non-cone mode as deprecated
  git-sparse-checkout.txt: flesh out pattern set sections a bit
  git-sparse-checkout.txt: add a new EXAMPLES section
  git-sparse-checkout.txt: shuffle some sections and mark as internal
  git-sparse-checkout.txt: update docs for deprecation of 'init'
  git-sparse-checkout.txt: wording updates for the cone mode default
  sparse-checkout: make --cone the default
  tests: stop assuming --no-cone is the default mode for sparse-checkout
2022-06-03 14:30:33 -07:00
Ævar Arnfjörð Bjarmason
b3193252c4 run-command API users: use "env" not "env_array" in comments & names
Follow-up on a preceding commit which changed all references to the
"env_array" when referring to the "struct child_process" member. These
changes are all unnecessary for the compiler, but help the code's
human readers.

All the comments that referred to "env_array" have now been updated,
as well as function names and variables that had "env_array" in their
name, they now refer to "env".

In addition the "out" name for the submodule.h prototype was
inconsistent with the function definition's use of "env_array" in
submodule.c. Both of them use "env" now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 14:31:27 -07:00
Ævar Arnfjörð Bjarmason
29fda24dd1 run-command API: rename "env_array" to "env"
Start following-up on the rename mentioned in c7c4bdeccf (run-command
API: remove "env" member, always use "env_array", 2021-11-25) of
"env_array" to "env".

The "env_array" name was picked in 19a583dc39 (run-command: add
env_array, an optional argv_array for env, 2014-10-19) because "env"
was taken. Let's not forever keep the oddity of "*_array" for this
"struct strvec", but not for its "args" sibling.

This commit is almost entirely made with a coccinelle rule[1]. The
only manual change here is in run-command.h to rename the struct
member itself and to change "env_array" to "env" in the
CHILD_PROCESS_INIT initializer.

The rest of this is all a result of applying [1]:

 * make contrib/coccinelle/run_command.cocci.patch
 * patch -p1 <contrib/coccinelle/run_command.cocci.patch
 * git add -u

1. cat contrib/coccinelle/run_command.pending.cocci
   @@
   struct child_process E;
   @@
   - E.env_array
   + E.env

   @@
   struct child_process *E;
   @@
   - E->env_array
   + E->env

I've avoided changing any comments and derived variable names here,
that will all be done in the next commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 14:31:16 -07:00
Ævar Arnfjörð Bjarmason
07b1d8f184 receive-pack: use bug() and BUG_if_bug()
Amend code added in a6a8431968 (receive-pack.c: shorten the
execute_commands loop over all commands, 2015-01-07) and amended to
hard die in b6a4788586 (receive-pack.c: die instead of error in case
of possible future bug, 2015-01-07) to use the new bug() function
instead.

Let's also rename the warn_if_*() function that code is in to
BUG_if_*(), its name became outdated in b6a4788586.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-02 12:51:35 -07:00
Junio C Hamano
1fc1879839 Merge branch 'js/use-builtin-add-i'
"git add -i" was rewritten in C some time ago and has been in
testing; the reimplementation is now exposed to general public by
default.

* js/use-builtin-add-i:
  add -i: default to the built-in implementation
  t2016: require the PERL prereq only when necessary
2022-05-30 23:24:03 -07:00
Jeff Hostetler
d06055501b fsmonitor--daemon: stub in health thread
Create another thread to watch over the daemon process and
automatically shut it down if necessary.

This commit creates the basic framework for a "health" thread
to monitor the daemon and/or the file system.  Later commits
will add platform-specific code to do the actual work.

The "health" thread is intended to monitor conditions that
would be difficult to track inside the IPC thread pool and/or
the file system listener threads.  For example, when there are
file system events outside of the watched worktree root or if
we want to have an idle-timeout auto-shutdown feature.

This commit creates the health thread itself, defines the thread-proc
and sets up the thread's event loop.  It integrates this new thread
into the existing IPC and Listener thread models.

This commit defines the API to the platform-specific code where all of
the monitoring will actually happen.

The platform-specific code for MacOS is just stubs.  Meaning that the
health thread will immediately exit on MacOS, but that is OK and
expected.  Future work can define MacOS-specific monitoring.

The platform-specific code for Windows sets up enough of the
WaitForMultipleObjects() machinery to watch for system and/or custom
events.  Currently, the set of wait handles only includes our custom
shutdown event (sent from our other theads).  Later commits in this
series will extend the set of wait handles to monitor other
conditions.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:27 -07:00
Jeff Hostetler
207534e423 fsmonitor--daemon: rename listener thread related variables
Rename platform-specific listener thread related variables
and data types as we prepare to add another backend thread
type.

[] `struct fsmonitor_daemon_backend_data` becomes `struct fsm_listen_data`
[] `state->backend_data` becomes `state->listen_data`
[] `state->error_code` becomes `state->listen_error_code`

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:26 -07:00
Jeff Hostetler
802aa31840 fsmonitor--daemon: prepare for adding health thread
Refactor daemon thread startup to make it easier to start
a third thread class to monitor the health of the daemon.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:26 -07:00
Jeff Hostetler
39664e9309 fsmonitor--daemon: cd out of worktree root
Teach the fsmonitor--daemon to CD outside of the worktree
before starting up.

The common Git startup mechanism causes the CWD of the daemon process
to be in the root of the worktree.  On Windows, this causes the daemon
process to hold a locked handle on the CWD and prevents other
processes from moving or deleting the worktree while the daemon is
running.

CD to HOME before entering main event loops.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:26 -07:00
Jeff Hostetler
62a62a2830 fsmonitor-settings: bare repos are incompatible with FSMonitor
Bare repos do not have a worktree, so there is nothing for the
daemon watch.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:59:26 -07:00
Taylor Blau
5b92477f89 builtin/gc.c: conditionally avoid pruning objects via loose
Expose the new `git repack --cruft` mode from `git gc` via a new opt-in
flag. When invoked like `git gc --cruft`, `git gc` will avoid exploding
unreachable objects as loose ones, and instead create a cruft pack and
`.mtimes` file.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
ddee3703b3 builtin/repack.c: add cruft packs to MIDX during geometric repack
When using cruft packs, the following race can occur when a geometric
repack that writes a MIDX bitmap takes place afterwords:

  - First, create an unreachable object and do an all-into-one cruft
    repack which stores that object in the repository's cruft pack.
  - Then make that object reachable.
  - Finally, do a geometric repack and write a MIDX bitmap.

Assuming that we are sufficiently unlucky as to select a commit from the
MIDX which reaches that object for bitmapping, then the `git
multi-pack-index` process will complain that that object is missing.

The reason is because we don't include cruft packs in the MIDX when
doing a geometric repack. Since the "make that object reachable" doesn't
necessarily mean that we'll create a new copy of that object in one of
the packs that will get rolled up as part of a geometric repack, it's
possible that the MIDX won't see any copies of that now-reachable
object.

Of course, it's desirable to avoid including cruft packs in the MIDX
because it causes the MIDX to store a bunch of objects which are likely
to get thrown away. But excluding that pack does open us up to the above
race.

This patch demonstrates the bug, and resolves it by including cruft
packs in the MIDX even when doing a geometric repack.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
72263ffc32 builtin/repack.c: use named flags for existing_packs
We use the `util` pointer for items in the `existing_packs` string list
to indicate which packs are going to be deleted. Since that has so far
been the only use of that `util` pointer, we just set it to 0 or 1.

But we're going to add an additional state to this field in the next
patch, so prepare for that by adding a #define for the first bit so we
can more expressively inspect the flags state.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
4571324b99 builtin/repack.c: allow configuring cruft pack generation
In servers which set the pack.window configuration to a large value, we
can wind up spending quite a lot of time finding new bases when breaking
delta chains between reachable and unreachable objects while generating
a cruft pack.

Introduce a handful of `repack.cruft*` configuration variables to
control the parameters used by pack-objects when generating a cruft
pack.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
f9825d1cf7 builtin/repack.c: support generating a cruft pack
Expose a way to split the contents of a repository into a main and cruft
pack when doing an all-into-one repack with `git repack --cruft -d`, and
a complementary configuration variable.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
a7d493833f builtin/pack-objects.c: --cruft with expiration
In a previous patch, pack-objects learned how to generate a cruft pack
so long as no objects are dropped.

This patch teaches pack-objects to handle the case where a non-never
`--cruft-expiration` value is passed. This case is slightly more
complicated than before, because we want pack-objects to save
unreachable objects which would have been pruned when there is another
recent (i.e., non-prunable) unreachable object which reaches the other.
We'll call these objects "unreachable but reachable-from-recent".

Here is how pack-objects handles `--cruft-expiration`:

  - Instead of adding all objects outside of the kept pack(s) into the
    packing list, only handle the ones whose mtime is within the grace
    period.

  - Construct a reachability traversal whose tips are the
    unreachable-but-recent objects.

  - Then, walk along that traversal, stopping if we reach an object in
    the kept pack. At each step along the traversal, we add the object
    we are visiting to the packing list.

In the majority of these cases, any object we visit in this traversal
will already be in our packing list. But we will sometimes encounter
reachable-from-recent cruft objects, which we want to retain even if
they aged out of the grace period.

The most subtle point of this process is that we actually don't need to
bother to update the rescued object's mtime. Even though we will write
an .mtimes file with a value that is older than the expiration window,
it will continue to survive cruft repacks so long as any objects which
reach it haven't aged out.

That is, a future repack will also exclude that object from the initial
packing list, only to discover it later on when doing the reachability
traversal.

Finally, stopping early once an object is found in a kept pack is safe
to do because the kept packs ordinarily represent which packs will
survive after repacking. Assuming that it _isn't_ safe to halt a
traversal early would mean that there is some ancestor object which is
missing, which implies repository corruption (i.e., the complete set of
reachable objects isn't present).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
2fb90409b8 reachable: add options to add_unseen_recent_objects_to_traversal
This function behaves very similarly to what we will need in
pack-objects in order to implement cruft packs with expiration. But it
is lacking a couple of things. Namely, it needs:

  - a mechanism to communicate the timestamps of individual recent
    objects to some external caller

  - and, in the case of packed objects, our future caller will also want
    to know the originating pack, as well as the offset within that pack
    at which the object can be found

  - finally, it needs a way to skip over packs which are marked as kept
    in-core.

To address the first two, add a callback interface in this patch which
reports the time of each recent object, as well as a (packed_git,
off_t) pair for packed objects.

Likewise, add a new option to the packed object iterators to skip over
packs which are marked as kept in core. This option will become
implicitly tested in a future patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
b757353676 builtin/pack-objects.c: --cruft without expiration
Teach `pack-objects` how to generate a cruft pack when no objects are
dropped (i.e., `--cruft-expiration=never`). Later patches will teach
`pack-objects` how to generate a cruft pack that prunes objects.

When generating a cruft pack which does not prune objects, we want to
collect all unreachable objects into a single pack (noting and updating
their mtimes as we accumulate them). Ordinary use will pass the result
of a `git repack -A` as a kept pack, so when this patch says "kept
pack", readers should think "reachable objects".

Generating a non-expiring cruft packs works as follows:

  - Callers provide a list of every pack they know about, and indicate
    which packs are about to be removed.

  - All packs which are going to be removed (we'll call these the
    redundant ones) are marked as kept in-core.

    Any packs the caller did not mention (but are known to the
    `pack-objects` process) are also marked as kept in-core. Packs not
    mentioned by the caller are assumed to be unknown to them, i.e.,
    they entered the repository after the caller decided which packs
    should be kept and which should be discarded.

    Since we do not want to include objects in these "unknown" packs
    (because we don't know which of their objects are or aren't
    reachable), these are also marked as kept in-core.

  - Then, we enumerate all objects in the repository, and add them to
    our packing list if they do not appear in an in-core kept pack.

This results in a new cruft pack which contains all known objects that
aren't included in the kept packs. When the kept pack is the result of
`git repack -A`, the resulting pack contains all unreachable objects.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
fa23090b0c builtin/pack-objects.c: return from create_object_entry()
A new caller in the next commit will want to immediately modify the
object_entry structure created by create_object_entry(). Instead of
forcing that caller to wastefully look-up the entry we just created,
return it from create_object_entry() instead.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
1c573cdd72 pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
This structure will be used to communicate the per-object mtimes when
writing a cruft pack. Here, we need the full packing_data structure
because the mtime information is stored in an array there, not on the
individual object_entry's themselves (to avoid paying the overhead in
structure width for operations which do not generate a cruft pack).

We haven't passed this information down before because one of the two
callers (in bulk-checkin.c) does not have a packing_data structure at
all. In that case (where no cruft pack will be generated), NULL is
passed instead.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Taylor Blau
94cd775a6c pack-mtimes: support reading .mtimes files
To store the individual mtimes of objects in a cruft pack, introduce a
new `.mtimes` format that can optionally accompany a single pack in the
repository.

The format is defined in Documentation/technical/pack-format.txt, and
stores a 4-byte network order timestamp for each object in name (index)
order.

This patch prepares for cruft packs by defining the `.mtimes` format,
and introducing a basic API that callers can use to read out individual
mtimes.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00