Commit Graph

12598 Commits

Author SHA1 Message Date
Johannes Schindelin
ddd0b3dfd9 credential-cache: handle ECONNREFUSED gracefully (#5329)
I should probably add some tests for this.
2025-03-14 22:30:42 +01:00
Johannes Schindelin
0ac2f22c29 Add experimental 'git survey' builtin (#5174)
This introduces `git survey` to Git for Windows ahead of upstream for
the express purpose of getting the path-based analysis in the hands of
more folks.

The inspiration of this builtin is
[`git-sizer`](https://github.com/github/git-sizer), but since that
command relies on `git cat-file --batch` to get the contents of objects,
it has limits to how much information it can provide.

This is mostly a rewrite of the `git survey` builtin that was introduced
into the `microsoft/git` fork in microsoft/git#667. That version had a
lot more bells and whistles, including an analysis much closer to what
`git-sizer` provides.

The biggest difference in this version is that this one is focused on
using the path-walk API in order to visit batches of objects based on a
common path. This allows identifying, for instance, the path that is
contributing the most to the on-disk size across all versions at that
path.

For example, here are the top ten paths contributing to my local Git
repository (which includes `microsoft/git` and `gitster/git`):

```
TOP FILES BY DISK SIZE
============================================================================
                                    Path | Count | Disk Size | Inflated Size
-----------------------------------------+-------+-----------+--------------
                       whats-cooking.txt |  1373 |  11637459 |      37226854
             t/helper/test-gvfs-protocol |     2 |   6847105 |      17233072
                      git-rebase--helper |     1 |   6027849 |      15269664
                          compat/mingw.c |  6111 |   5194453 |     463466970
             t/helper/test-parse-options |     1 |   3420385 |       8807968
                  t/helper/test-pkt-line |     1 |   3408661 |       8778960
      t/helper/test-dump-untracked-cache |     1 |   3408645 |       8780816
            t/helper/test-dump-fsmonitor |     1 |   3406639 |       8776656
                                po/vi.po |   104 |   1376337 |      51441603
                                po/de.po |   210 |   1360112 |      71198603
```

This kind of analysis has been helpful in identifying the reasons for
growth in a few internal monorepos. Those findings motivated the changes
in #5157 and #5171.

With this early version in Git for Windows, we can expand the reach of
the experimental tool in advance of it being contributed to the upstream
project.

Unfortunately, this will mean that in the next `microsoft/git` rebase,
Jeff Hostetler's version will need to be pulled out since there are
enough conflicts. These conflicts include how tables are stored and
generated, as the version in this PR is slightly more general to allow
for different kinds of data.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2025-03-14 22:30:41 +01:00
Derrick Stolee
f4ba3c322f Add path walk API and its use in 'git pack-objects' (#5171)
This is a follow up to #5157 as well as motivated by the RFC in
gitgitgadget/git#1786.

We have ways of walking all objects, but it is focused on visiting a
single commit and then expanding the new trees and blobs reachable from
that commit that have not been visited yet. This means that objects
arrive without any locality based on their path.

Add a new "path walk API" that focuses on walking objects in batches
according to their type and path. This will walk all annotated tags, all
commits, all root trees, and then start a depth-first search among all
paths in the repo to collect trees and blobs in batches.

The most important application for this is being fast-tracked to Git for
Windows: `git pack-objects --path-walk`. This application of the path
walk API discovers the objects to pack via this batched walk, and
automatically groups objects that appear at a common path so they can be
checked for delta comparisons.

This use completely avoids any name-hash collisions (even the collisions
that sometimes occur with the new `--full-name-hash` option) and can be
much faster to compute since the first pass of delta calculations does
not waste time on objects that are unlikely to be diffable.

Some statistics are available in the commit messages.
2025-03-14 22:30:41 +01:00
Matthias Aßhauer
021de74c80 credential-cache: handle ECONNREFUSED gracefully
In 245670c (credential-cache: check for windows specific errors, 2021-09-14)
we concluded that on Windows we would always encounter ENETDOWN where we
would expect ECONNREFUSED on POSIX systems, when connecting to unix sockets.
As reported in [1], we do encounter ECONNREFUSED on Windows if the
socket file doesn't exist, but the containing directory does and ENETDOWN if
neither exists. We should handle this case like we do on non-windows systems.

[1] https://github.com/git-for-windows/git/pull/4762#issuecomment-2545498245

This fixes https://github.com/git-for-windows/git/issues/5314

Helped-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Matthias Aßhauer <mha1993@live.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2025-03-14 22:30:29 +01:00
Johannes Schindelin
628589ff5a survey: clearly note the experimental nature in the output
While this command is definitely something we _want_, chances are that
upstreaming this will require substantial changes.

We still want to be able to experiment with this before that, to focus
on what we need out of this command: To assist with diagnosing issues
with large repositories, as well as to help monitoring the growth and
the associated painpoints of such repositories.

To that end, we are about to integrate this command into
`microsoft/git`, to get the tool into the hands of users who need it
most, with the idea to iterate in close collaboration between these
users and the developers familar with Git's internals.

However, we will definitely want to avoid letting anybody have the
impression that this command, its exact inner workings, as well as its
output format, are anywhere close to stable. To make that fact utterly
clear (and thereby protect the freedom to iterate and innovate freely
before upstreaming the command), let's mark its output as experimental
in all-caps, as the first thing we do.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2025-03-14 22:30:28 +01:00
Derrick Stolee
ace0427f0e survey: add --top=<N> option and config
The 'git survey' builtin provides several detail tables, such as "top
files by on-disk size". The size of these tables defaults to 10,
currently.

Allow the user to specify this number via a new --top=<N> option or the
new survey.top config key.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2025-03-14 22:30:28 +01:00
Derrick Stolee
65fef5758a survey: add report of "largest" paths
Since we are already walking our reachable objects using the path-walk API,
let's now collect lists of the paths that contribute most to different
metrics. Specifically, we care about

 * Number of versions.
 * Total size on disk.
 * Total inflated size (no delta or zlib compression).

This information can be critical to discovering which parts of the
repository are causing the most growth, especially on-disk size. Different
packing strategies might help compress data more efficiently, but the toal
inflated size is a representation of the raw size of all snapshots of those
paths. Even when stored efficiently on disk, that size represents how much
information must be processed to complete a command such as 'git blame'.

The exact disk size seems to be not quite robust enough for testing, as
could be seen by the `linux-musl-meson` job consistently failing, possibly
because of zlib-ng deflates differently: t8100.4(git survey
(default)) was failing with a symptom like this:

   TOTAL OBJECT SIZES BY TYPE
   ===============================================
   Object Type | Count | Disk Size | Inflated Size
   ------------+-------+-----------+--------------
  -    Commits |    10 |      1523 |          2153
  +    Commits |    10 |      1528 |          2153
         Trees |    10 |       495 |          1706
         Blobs |    10 |       191 |           101
  -       Tags |     4 |       510 |           528
  +       Tags |     4 |       547 |           528

This means: the disk size is unlikely something we can verify robustly.
Since zlib-ng seems to increase the disk size of the tags from 528 to
547, we cannot even assume that the disk size is always smaller than the
inflated size. We will most likely want to either skip verifying the
disk size altogether, or go for some kind of fuzzy matching, say, by
replacing `s/ 1[45][0-9][0-9] / ~1.5k /` and `s/ [45][0-9][0-9] / ~½k /`
or something like that.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2025-03-14 22:30:28 +01:00
Derrick Stolee
c596bba980 survey: add ability to track prioritized lists
In future changes, we will make use of these methods. The intention is to
keep track of the top contributors according to some metric. We don't want
to store all of the entries and do a sort at the end, so track a
constant-size table and remove rows that get pushed out depending on the
chosen sorting algorithm.

Co-authored-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by; Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-03-14 22:30:28 +01:00
Derrick Stolee
c655a26053 survey: show progress during object walk
Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-03-14 22:30:28 +01:00
Derrick Stolee
994c459580 survey: summarize total sizes by object type
Now that we have explored objects by count, we can expand that a bit more to
summarize the data for the on-disk and inflated size of those objects. This
information is helpful for diagnosing both why disk space (and perhaps
clone or fetch times) is growing but also why certain operations are slow
because the inflated size of the abstract objects that must be processed is
so large.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-03-14 22:30:28 +01:00
Derrick Stolee
af65f404d7 survey: add object count summary
At the moment, nothing is obvious about the reason for the use of the
path-walk API, but this will become more prevelant in future iterations. For
now, use the path-walk API to sum up the counts of each kind of object.

For example, this is the reachable object summary output for my local repo:

REACHABLE OBJECT SUMMARY
========================
Object Type |  Count
------------+-------
       Tags |   1343
    Commits | 179344
      Trees | 314350
      Blobs | 184030

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-03-14 22:30:28 +01:00
Derrick Stolee
d5f73ceb0a survey: start pretty printing data in table form
When 'git survey' provides information to the user, this will be presented
in one of two formats: plaintext and JSON. The JSON implementation will be
delayed until the functionality is complete for the plaintext format.

The most important parts of the plaintext format are headers specifying the
different sections of the report and tables providing concreted data.

Create a custom table data structure that allows specifying a list of
strings for the row values. When printing the table, check each column for
the maximum width so we can create a table of the correct size from the
start.

The table structure is designed to be flexible to the different kinds of
output that will be implemented in future changes.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-03-14 22:30:28 +01:00
Derrick Stolee
ad79f1d91e pack-objects: thread the path-based compression
Adapting the implementation of ll_find_deltas(), create a threaded
version of the --path-walk compression step in 'git pack-objects'.

This involves adding a 'regions' member to the thread_params struct,
allowing each thread to own a section of paths. We can simplify the way
jobs are split because there is no value in extending the batch based on
name-hash the way sections of the object entry array are attempted to be
grouped. We re-use the 'list_size' and 'remaining' items for the purpose
of borrowing work in progress from other "victim" threads when a thread
has finished its batch of work more quickly.

Using the Git repository as a test repo, the p5313 performance test
shows that the resulting size of the repo is the same, but the threaded
implementation gives gains of varying degrees depending on the number of
objects being packed. (This was tested on a 16-core machine.)

Test                                    HEAD~1    HEAD
-------------------------------------------------------------
5313.6: thin pack with --path-walk        0.01    0.01  +0.0%
5313.7: thin pack size with --path-walk    475     475  +0.0%
5313.12: big pack with --path-walk        1.99    1.87  -6.0%
5313.13: big pack size with --path-walk  14.4M   14.3M  -0.4%
5313.18: repack with --path-walk         98.14   41.46 -57.8%
5313.19: repack size with --path-walk   197.2M  197.3M  +0.0%

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-03-14 22:30:27 +01:00
Jeff Hostetler
0abf509e9f survey: add command line opts to select references
By default we will scan all references in "refs/heads/", "refs/tags/"
and "refs/remotes/".

Add command line opts let the use ask for all refs or a subset of them
and to include a detached HEAD.

Signed-off-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-03-14 22:30:27 +01:00
Derrick Stolee
5debbd0dc4 pack-objects: refactor path-walk delta phase
Previously, the --path-walk option to 'git pack-objects' would compute
deltas inline with the path-walk logic. This would make the progress
indicator look like it is taking a long time to enumerate objects, and
then very quickly computed deltas.

Instead of computing deltas on each region of objects organized by tree,
store a list of regions corresponding to these groups. These can later
be pulled from the list for delta compression before doing the "global"
delta search.

This presents a new progress indicator that can be used in tests to
verify that this stage is happening.

The current implementation is not integrated with threads, but could be
done in a future update.

Since we do not attempt to sort objects by size until after exploring
all trees, we can remove the previous change to t5530 due to a different
error message appearing first.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-03-14 22:30:27 +01:00
Jeff Hostetler
60b7fa02de survey: stub in new experimental 'git-survey' command
Start work on a new 'git survey' command to scan the repository
for monorepo performance and scaling problems.  The goal is to
measure the various known "dimensions of scale" and serve as a
foundation for adding additional measurements as we learn more
about Git monorepo scaling problems.

The initial goal is to complement the scanning and analysis performed
by the GO-based 'git-sizer' (https://github.com/github/git-sizer) tool.
It is hoped that by creating a builtin command, we may be able to take
advantage of internal Git data structures and code that is not
accessible from GO to gain further insight into potential scaling
problems.

Co-authored-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-03-14 22:30:27 +01:00
Derrick Stolee
b74589acc3 pack-objects: enable --path-walk via config
Users may want to enable the --path-walk option for 'git pack-objects' by
default, especially underneath commands like 'git push' or 'git repack'.

This should be limited to client repositories, since the --path-walk option
disables bitmap walks, so would be bad to include in Git servers when
serving fetches and clones. There is potential that it may be helpful to
consider when repacking the repository, to take advantage of improved deltas
across historical versions of the same files.

Much like how "pack.useSparse" was introduced and included in
"feature.experimental" before being enabled by default, use the repository
settings infrastructure to make the new "pack.usePathWalk" config enabled by
"feature.experimental" and "feature.manyFiles".

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-03-14 22:30:27 +01:00
Derrick Stolee
ba06d19645 repack: add --path-walk option
Since 'git pack-objects' supports a --path-walk option, allow passing it
through in 'git repack'. This presents interesting testing opportunities for
comparing the different repacking strategies against each other.

Add the --path-walk option to the performance tests in p5313.

For the microsoft/fluentui repo [1] checked out at a specific commit [2],
the results are very interesting:

Test                                           this tree
------------------------------------------------------------------
5313.2: thin pack                              0.40(0.47+0.04)
5313.3: thin pack size                                    1.2M
5313.4: thin pack with --full-name-hash        0.09(0.10+0.04)
5313.5: thin pack size with --full-name-hash             22.8K
5313.6: thin pack with --path-walk             0.08(0.06+0.02)
5313.7: thin pack size with --path-walk                  20.8K
5313.8: big pack                               2.16(8.43+0.23)
5313.9: big pack size                                    17.7M
5313.10: big pack with --full-name-hash        1.42(3.06+0.21)
5313.11: big pack size with --full-name-hash             18.0M
5313.12: big pack with --path-walk             2.21(8.39+0.24)
5313.13: big pack size with --path-walk                  17.8M
5313.14: repack                                98.05(662.37+2.64)
5313.15: repack size                                    449.1K
5313.16: repack with --full-name-hash          33.95(129.44+2.63)
5313.17: repack size with --full-name-hash              182.9K
5313.18: repack with --path-walk               106.21(121.58+0.82)
5313.19: repack size with --path-walk                   159.6K

[1] https://github.com/microsoft/fluentui
[2] e70848ebac1cd720875bccaa3026f4a9ed700e08

This repo suffers from having a lot of paths that collide in the name
hash, so examining them in groups by path leads to better deltas. Also,
in this case, the single-threaded implementation is competitive with the
full repack. This is saving time diffing files that have significant
differences from each other.

A similar, but private, repo has even more extremes in the thin packs:

Test                                           this tree
--------------------------------------------------------------
5313.2: thin pack                              2.39(2.91+0.10)
5313.3: thin pack size                                    4.5M
5313.4: thin pack with --full-name-hash        0.29(0.47+0.12)
5313.5: thin pack size with --full-name-hash             15.5K
5313.6: thin pack with --path-walk             0.35(0.31+0.04)
5313.7: thin pack size with --path-walk                  14.2K

Notice, however, that while the --full-name-hash version is working
quite well in these cases for the thin pack, it does poorly for some
other standard cases, such as this test on the Linux kernel repository:

Test                                           this tree
--------------------------------------------------------------
5313.2: thin pack                              0.01(0.00+0.00)
5313.3: thin pack size                                     310
5313.4: thin pack with --full-name-hash        0.00(0.00+0.00)
5313.5: thin pack size with --full-name-hash              1.4K
5313.6: thin pack with --path-walk             0.00(0.00+0.00)
5313.7: thin pack size with --path-walk                    310

Here, the --full-name-hash option does much worse than the default name
hash, but the path-walk option does exactly as well.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-03-14 22:30:27 +01:00
Derrick Stolee
9388ecb6ac pack-objects: introduce GIT_TEST_PACK_PATH_WALK
There are many tests that validate whether 'git pack-objects' works as
expected. Instead of duplicating these tests, add a new test environment
variable, GIT_TEST_PACK_PATH_WALK, that implies --path-walk by default
when specified.

This was useful in testing the implementation of the --path-walk
implementation, especially in conjunction with test such as:

 - t0411-clone-from-partial.sh : One test fetches from a repo that does
   not have the boundary objects. This causes the path-based walk to
   fail. Disable the variable for this test.

 - t5306-pack-nobase.sh : Similar to t0411, one test fetches from a repo
   without a boundary object.

 - t5310-pack-bitmaps.sh : One test compares the case when packing with
   bitmaps to the case when packing without them. Since we disable the
   test variable when writing bitmaps, this causes a difference in the
   object list (the --path-walk option adds an extra object). Specify
   --no-path-walk in both processes for the comparison. Another test
   checks for a specific delta base, but when computing dynamically
   without using bitmaps, the base object it too small to be considered
   in the delta calculations so no base is used.

 - t5316-pack-delta-depth.sh : This script cares about certain delta
   choices and their chain lengths. The --path-walk option changes how
   these chains are selected, and thus changes the results of this test.

 - t5322-pack-objects-sparse.sh : This demonstrates the effectiveness of
   the --sparse option and how it combines with --path-walk.

 - t5332-multi-pack-reuse.sh : This test verifies that the preferred
   pack is used for delta reuse when possible. The --path-walk option is
   not currently aware of the preferred pack at all, so finds a
   different delta base.

 - t7406-submodule-update.sh : When using the variable, the --depth
   option collides with the --path-walk feature, resulting in a warning
   message. Disable the variable so this warning does not appear.

I want to call out one specific test change that is only temporary:

 - t5530-upload-pack-error.sh : One test cares specifically about an
   "unable to read" error message. Since the current implementation
   performs delta calculations within the path-walk API callback, a
   different "unable to get size" error message appears. When this
   is changed in a future refactoring, this test change can be reverted.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-03-14 22:30:27 +01:00
Derrick Stolee
f9dec4144a pack-objects: add --path-walk option
In order to more easily compute delta bases among objects that appear at the
exact same path, add a --path-walk option to 'git pack-objects'.

This option will use the path-walk API instead of the object walk given by
the revision machinery. Since objects will be provided in batches
representing a common path, those objects can be tested for delta bases
immediately instead of waiting for a sort of the full object list by
name-hash. This has multiple benefits, including avoiding collisions by
name-hash.

The objects marked as UNINTERESTING are included in these batches, so we
are guaranteeing some locality to find good delta bases.

After the individual passes are done on a per-path basis, the default
name-hash is used to find other opportunistic delta bases that did not
match exactly by the full path name.

RFC TODO: It is important to note that this option is inherently
incompatible with using a bitmap index. This walk probably also does not
work with other advanced features, such as delta islands.

Getting ahead of myself, this option compares well with --full-name-hash
when the packfile is large enough, but also performs at least as well as
the default in all cases that I've seen.

RFC TODO: this should probably be recording the batch locations to another
list so they could be processed in a second phase using threads.

RFC TODO: list some examples of how this outperforms previous pack-objects
strategies. (This is coming in later commits that include performance
test changes.)

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-03-14 22:30:27 +01:00
Derrick Stolee
977c6d6744 pack-objects: extract should_attempt_deltas()
This will be helpful in a future change.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-03-14 22:30:27 +01:00
Johannes Schindelin
0baebe3ce9 clean: remove mount points when possible
Windows' equivalent to "bind mounts", NTFS junction points, can be
unlinked without affecting the mount target. This is clearly what users
expect to happen when they call `git clean -dfx` in a worktree that
contains NTFS junction points: the junction should be removed, and the
target directory of said junction should be left alone (unless it is
inside the worktree).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2025-03-14 22:30:17 +01:00
Johannes Schindelin
03560065f0 clean: do not traverse mount points
It seems to be not exactly rare on Windows to install NTFS junction
points (the equivalent of "bind mounts" on Linux/Unix) in worktrees,
e.g. to map some development tools into a subdirectory.

In such a scenario, it is pretty horrible if `git clean -dfx` traverses
into the mapped directory and starts to "clean up".

Let's just not do that. Let's make sure before we traverse into a
directory that it is not a mount point (or junction).

This addresses https://github.com/git-for-windows/git/issues/607

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2025-03-14 22:30:17 +01:00
Junio C Hamano
5d55ad01f5 Merge branch 'tb/fetch-follow-tags-fix'
* tb/fetch-follow-tags-fix:
  fetch: fix following tags when fetching specific OID
2025-03-10 08:45:58 -07:00
Taylor Blau
bd52d9a058 fetch: fix following tags when fetching specific OID
In 3f763ddf28 (fetch: set remote/HEAD if it does not exist, 2024-11-22),
unconditionally adds "HEAD" to the list of ref prefixes we send to the
server.

This breaks a core assumption that the list of prefixes we send to the
server is complete. We must either send all prefixes we care about, or
none at all (in the latter case the server then advertises everything).

The tag following code is careful to only add "refs/tags/" to the list
of prefixes if there are already entries in the prefix list. But because
the new code from 3f763ddf28 runs after the tag code, and because it
unconditionally adds to the prefix list, we may end up with a prefix
list that _should_ have "refs/tags/" in it, but doesn't.

When that is the case, the server does not advertise any tags, and our
auto-following breaks because we never learned about any tags in the
first place.

Fix this by only adding "HEAD" to the ref prefixes when we know that we
are already limiting the advertisement. In either case we'll learn about
HEAD (either through the limited advertisement, or implicitly through a
full advertisement).

Reported-by: Igor Todorovski <itodorov@ca.ibm.com>
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-07 16:15:18 -08:00
Junio C Hamano
cdf458c60e Merge branch 'kn/ref-migrate-skip-reflog'
Usage string of "git refs" has been corrected.

* kn/ref-migrate-skip-reflog:
  refs: show --no-reflog in the help text
2025-03-05 10:37:45 -08:00
Junio C Hamano
feffb34257 Merge branch 'ps/path-sans-the-repository'
The path.[ch] API takes an explicit repository parameter passed
throughout the callchain, instead of relying on the_repository
singleton instance.

* ps/path-sans-the-repository:
  path: adjust last remaining users of `the_repository`
  environment: move access to "core.sharedRepository" into repo settings
  environment: move access to "core.hooksPath" into repo settings
  repo-settings: introduce function to clear struct
  path: drop `git_path()` in favor of `repo_git_path()`
  rerere: let `rerere_path()` write paths into a caller-provided buffer
  path: drop `git_common_path()` in favor of `repo_common_path()`
  worktree: return allocated string from `get_worktree_git_dir()`
  path: drop `git_path_buf()` in favor of `repo_git_path_replace()`
  path: drop `git_pathdup()` in favor of `repo_git_path()`
  path: drop unused `strbuf_git_path()` function
  path: refactor `repo_submodule_path()` family of functions
  submodule: refactor `submodule_to_gitdir()` to accept a repo
  path: refactor `repo_worktree_path()` family of functions
  path: refactor `repo_git_path()` family of functions
  path: refactor `repo_common_path()` family of functions
2025-03-05 10:37:43 -08:00
Junio C Hamano
6dff5de1da refs: show --no-reflog in the help text
We forgot that we must keep the documentation and help text in sync.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-03 14:51:29 -08:00
Patrick Steinhardt
028f618658 path: adjust last remaining users of the_repository
With the preceding refactorings we now only have a couple of implicit
users of `the_repository` left in the "path" subsystem, all of which
depend on global state via `calc_shared_perm()`. Make the dependency on
`the_repository` explicit by passing the repo as a parameter instead and
adjust callers accordingly.

Note that this change bubbles up into a couple of subsystems that were
previously declared as free from `the_repository`. Instead of marking
all of them as `the_repository`-dependent again, we instead use the
repository that is available in the calling context. There are three
exceptions though with "copy.c", "pack-write.c" and "tempfile.c".
Adjusting these would require us to adapt callsites all over the place,
so this is left for a future iteration.

Mark "path.c" as free from `the_repository`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-28 13:54:11 -08:00
Patrick Steinhardt
f1ce861c34 environment: move access to "core.sharedRepository" into repo settings
Similar as with the preceding commit, we track "core.sharedRepository"
via a pair of global variables. Move them into `struct repo_settings` so
that we can instead track them per-repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-28 13:54:11 -08:00
Patrick Steinhardt
88dd321cfe path: drop git_path() in favor of repo_git_path()
Remove `git_path()` in favor of the `repo_git_path()` family of
functions, which makes the implicit dependency on `the_repository` go
away.

Note that `git_path()` returned a string allocated via `get_pathname()`,
which uses a rotating set of statically allocated buffers. Consequently,
callers didn't have to free the returned string. The same isn't true for
`repo_common_path()`, so we also have to add logic to free the returned
strings.

This refactoring also allows us to remove `repo_common_pathv()` as well
as `get_pathname()` from the public interface.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-28 13:54:11 -08:00
Patrick Steinhardt
8ee018d863 rerere: let rerere_path() write paths into a caller-provided buffer
Same as with `get_worktree_git_dir()` a couple of commits ago, the
`rerere_path()` function returns paths that need not be free'd by the
caller because `git_path()` internally uses `get_pathname()`.

Refactor the function to instead accept a caller-provided buffer that
the path will be written into, passing on ownership to the caller. This
refactoring prepares us for the removal of `git_path()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-28 13:54:11 -08:00
Junio C Hamano
3c0f4abaf5 Merge branch 'kn/ref-migrate-skip-reflog'
"git refs migrate" can optionally be told not to migrate the reflog.

* kn/ref-migrate-skip-reflog:
  builtin/refs: add '--no-reflog' flag to drop reflogs
2025-02-27 15:23:00 -08:00
Junio C Hamano
9d8cce051a Merge branch 'ua/os-version-capability'
The value of "uname -s" is by default sent over the wire as a part
of the "version" capability.

* ua/os-version-capability:
  agent: advertise OS name via agent capability
  t5701: add setup test to remove side-effect dependency
  version: extend get_uname_info() to hide system details
  version: refactor get_uname_info()
  version: refactor redact_non_printables()
  version: replace manual ASCII checks with isprint() for clarity
2025-02-27 15:23:00 -08:00
Junio C Hamano
e24570b0a3 Merge branch 'jk/check-mailmap-wo-name-fix'
"git check-mailmap" segfault fix.

* jk/check-mailmap-wo-name-fix:
  mailmap: fix check-mailmap with full mailmap line
2025-02-26 08:51:00 -08:00
Junio C Hamano
9b07c152df Merge branch 'pw/merge-tree-stdin-deadlock-fix'
"git merge-tree --stdin" has been improved (including a workaround
for a deadlock).

* pw/merge-tree-stdin-deadlock-fix:
  merge-tree: fix link formatting in html docs
  merge-tree: improve docs for --stdin
  merge-tree: only use basic merge config
  merge-tree: remove redundant code
  merge-tree --stdin: flush stdout to avoid deadlock
2025-02-25 14:19:36 -08:00
Jacob Keller
bb60c52131 mailmap: fix check-mailmap with full mailmap line
I recently had reported to me a crash from a coworker using the recently
added sendemail mailmap support:

  3724814 Segmentation fault      (core dumped) git check-mailmap "bugs@company.xx"

This appears to happen because of the NULL pointer name passed into
map_user(). Fix this by passing "" instead of NULL so that we have a
valid pointer.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-21 18:27:16 -08:00
Junio C Hamano
ee8020ff40 Merge branch 'ua/update-server-info-sans-the-repository'
Code clean-up.

* ua/update-server-info-sans-the-repository:
  builtin/update-server-info: remove the_repository global variable
2025-02-21 10:35:53 -08:00
Karthik Nayak
89be7d2774 builtin/refs: add '--no-reflog' flag to drop reflogs
The "git refs migrate" subcommand converts the backend used for ref
storage. It always migrates reflog data as well as refs. Introduce an
option to exclude reflogs from migration, allowing them to be discarded
when they are unnecessary.

This is particularly useful in server-side repositories, where reflogs
are typically not expected. However, some repositories may still have
them due to historical reasons, such as bugs, misconfigurations, or
administrative decisions to enable reflogs for debugging. In such
repositories, it would be optimal to drop reflogs during the migration.

To address this, introduce the '--no-reflog' flag, which prevents reflog
migration. When this flag is used, reflogs from the original reference
backend are migrated. Since only the new reference backend remains in
the repository, all previous reflogs are permanently discarded.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-21 09:55:02 -08:00
Junio C Hamano
716b00e6e9 Merge branch 'da/difftool-sans-the-repository'
"git difftool" code clean-up.

* da/difftool-sans-the-repository:
  difftool: eliminate use of USE_THE_REPOSITORY_VARIABLE
  difftool: eliminate use of the_repository
  difftool: eliminate use of global variables
2025-02-18 15:30:32 -08:00
Junio C Hamano
7722b997c6 Merge branch 'jt/rev-list-missing-print-info'
"git rev-list --missing=" learned to accept "print-info" that gives
known details expected of the missing objects, like path and type.

* jt/rev-list-missing-print-info:
  rev-list: extend print-info to print missing object type
  rev-list: add print-info action to print missing object path
2025-02-18 15:30:32 -08:00
Junio C Hamano
e565f37553 Merge branch 'ds/backfill'
Lazy-loading missing files in a blobless clone on demand is costly
as it tends to be one-blob-at-a-time.  "git backfill" is introduced
to help bulk-download necessary files beforehand.

* ds/backfill:
  backfill: assume --sparse when sparse-checkout is enabled
  backfill: add --sparse option
  backfill: add --min-batch-size=<n> option
  backfill: basic functionality and tests
  backfill: add builtin boilerplate
2025-02-18 15:30:31 -08:00
Phillip Wood
54cf5d2da8 merge-tree: only use basic merge config
Commit 9c93ba4d0a (merge-recursive: honor diff.algorithm, 2024-07-13)
replaced init_merge_options() with init_basic_merge_config() for use in
plumbing commands and init_ui_merge_config() for use in porcelain
commands. As "git merge-tree" is a plumbing command it should call
init_basic_merge_config() rather than init_ui_merge_config(). The merge
ort machinery ignores "diff.algorithm" so the behavior is unchanged by
this commit but it future proofs us against any future changes to
init_ui_merge_config().

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 09:52:39 -08:00
Phillip Wood
1b0e5f4499 merge-tree: remove redundant code
real_merge() only ever returns "0" or "1" as it dies if the merge status
is less than zero. Therefore the check for "result < 0" is redundant and
the result variable is not needed. The return value of real_merge() is
ignored because exit status of "git merge-tree --stdin" is "0" for both
successful and conflicted merges (the status of each merge is written to
stdout). The return type of real_merge() is not changed as it is used
for the program's exit status when "--stdin" is not given.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 09:52:39 -08:00
Phillip Wood
344a107b55 merge-tree --stdin: flush stdout to avoid deadlock
If a process tries to read the output from "git merge-tree --stdin"
before it closes merge-tree's stdin then it deadlocks. This happens
because merge-tree does not flush its output before trying to read
another line of input and means that it is not possible to cherry-pick a
sequence of commits using "git merge-tree --stdin". Fix this by calling
maybe_flush_or_die() before trying to read the next line of
input. Flushing the output after each merge does not seem to affect the
performance, any difference is lost in the noise even after increasing
the number of runs.

$ git rev-list --merges --parents -n100 origin/master |
	sed 's/^[^ ]* //' >/tmp/merges
$ hyperfine -L flush 0,1 --warmup 1 --runs 30 \
	'GIT_FLUSH={flush} ./git merge-tree --stdin </tmp/merges'
Benchmark 1: GIT_FLUSH=0 ./git merge-tree --stdin </tmp/merges
  Time (mean ± σ):     546.6 ms ±  11.7 ms    [User: 503.2 ms, System: 40.9 ms]
  Range (min … max):   535.9 ms … 567.7 ms    30 runs

Benchmark 2: GIT_FLUSH=1 ./git merge-tree --stdin </tmp/merges
  Time (mean ± σ):     546.9 ms ±  12.0 ms    [User: 505.9 ms, System: 38.9 ms]
  Range (min … max):   529.8 ms … 570.0 ms    30 runs

Summary
  'GIT_FLUSH=0 ./git merge-tree --stdin </tmp/merges' ran
    1.00 ± 0.03 times faster than 'GIT_FLUSH=1 ./git merge-tree --stdin </tmp/merges'

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 09:52:39 -08:00
Usman Akinyemi
6aa09fd872 version: extend get_uname_info() to hide system details
Currently, get_uname_info() function provides the full OS information.
In a following commit, we will need it to provide only the OS name.

Let's extend it to accept a "full" flag that makes it switch between
providing full OS information and providing only the OS name.

We may need to refactor this function in the future if an
`osVersion.format` is added.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 09:05:12 -08:00
Usman Akinyemi
0a78d61247 version: refactor get_uname_info()
Some code from "builtin/bugreport.c" uses uname(2) to get system
information.

Let's refactor this code into a new get_uname_info() function, so
that we can reuse it in a following commit.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18 09:05:12 -08:00
Junio C Hamano
c3fffcfe8e Merge branch 'bf/fetch-set-head-fix'
Fetching into a bare repository incorrectly assumed it always used
a mirror layout when deciding to update remote-tracking HEAD, which
has been corrected.

* bf/fetch-set-head-fix:
  fetch set_head: fix non-mirror remotes in bare repositories
  fetch set_head: refactor to use remote directly
2025-02-14 17:53:48 -08:00
Junio C Hamano
5785d9143b Merge branch 'tc/clone-single-revision'
"git clone" learned to make a shallow clone for a single commit
that is not necessarily be at the tip of any branch.

* tc/clone-single-revision:
  builtin/clone: teach git-clone(1) the --revision= option
  parse-options: introduce die_for_incompatible_opt2()
  clone: introduce struct clone_opts in builtin/clone.c
  clone: add tags refspec earlier to fetch refspec
  clone: refactor wanted_peer_refs()
  clone: make it possible to specify --tags
  clone: cut down on global variables in clone.c
2025-02-14 17:53:48 -08:00
Junio C Hamano
998c5f0c75 Merge branch 'ms/refspec-cleanup'
Code clean-up.  cf. <Z6G-toOJjMmK8iJG@pks.im>

* ms/refspec-cleanup:
  refspec: relocate apply_refspecs and related funtions
  refspec: relocate matching related functions
  remote: rename query_refspecs functions
  refspec: relocate refname_matches_negative_refspec_item
  remote: rename function omit_name_by_refspec
2025-02-12 10:08:54 -08:00