There has been a slow but steady stream of bug reports triggered by the
warning included in ac33519ddf. Since
introducing the escape hatch for Windows 7 in that same patch, however,
I have not seen a single report that pointed to the kind of bug I
anticipated when writing that warning message.
Let's drop this warning, as well as the escape hatch: Git for Windows
dropped supporting Windows 7 (and Windows 8) directly after Git for
Windows v2.46.2. For full details, see
https://gitforwindows.org/requirements#windows-version.
For good measure, let's keep the fall-back, though: It sometimes helps
work around issues (e.g. when Defender still holds a handle on a file
that Git wants to write out).
This introduces `git survey` to Git for Windows ahead of upstream for
the express purpose of getting the path-based analysis in the hands of
more folks.
The inspiration of this builtin is
[`git-sizer`](https://github.com/github/git-sizer), but since that
command relies on `git cat-file --batch` to get the contents of objects,
it has limits to how much information it can provide.
This is mostly a rewrite of the `git survey` builtin that was introduced
into the `microsoft/git` fork in microsoft/git#667. That version had a
lot more bells and whistles, including an analysis much closer to what
`git-sizer` provides.
The biggest difference in this version is that this one is focused on
using the path-walk API in order to visit batches of objects based on a
common path. This allows identifying, for instance, the path that is
contributing the most to the on-disk size across all versions at that
path.
For example, here are the top ten paths contributing to my local Git
repository (which includes `microsoft/git` and `gitster/git`):
```
TOP FILES BY DISK SIZE
============================================================================
Path | Count | Disk Size | Inflated Size
-----------------------------------------+-------+-----------+--------------
whats-cooking.txt | 1373 | 11637459 | 37226854
t/helper/test-gvfs-protocol | 2 | 6847105 | 17233072
git-rebase--helper | 1 | 6027849 | 15269664
compat/mingw.c | 6111 | 5194453 | 463466970
t/helper/test-parse-options | 1 | 3420385 | 8807968
t/helper/test-pkt-line | 1 | 3408661 | 8778960
t/helper/test-dump-untracked-cache | 1 | 3408645 | 8780816
t/helper/test-dump-fsmonitor | 1 | 3406639 | 8776656
po/vi.po | 104 | 1376337 | 51441603
po/de.po | 210 | 1360112 | 71198603
```
This kind of analysis has been helpful in identifying the reasons for
growth in a few internal monorepos. Those findings motivated the changes
in #5157 and #5171.
With this early version in Git for Windows, we can expand the reach of
the experimental tool in advance of it being contributed to the upstream
project.
Unfortunately, this will mean that in the next `microsoft/git` rebase,
Jeff Hostetler's version will need to be pulled out since there are
enough conflicts. These conflicts include how tables are stored and
generated, as the version in this PR is slightly more general to allow
for different kinds of data.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This is a follow up to #5157 as well as motivated by the RFC in
gitgitgadget/git#1786.
We have ways of walking all objects, but it is focused on visiting a
single commit and then expanding the new trees and blobs reachable from
that commit that have not been visited yet. This means that objects
arrive without any locality based on their path.
Add a new "path walk API" that focuses on walking objects in batches
according to their type and path. This will walk all annotated tags, all
commits, all root trees, and then start a depth-first search among all
paths in the repo to collect trees and blobs in batches.
The most important application for this is being fast-tracked to Git for
Windows: `git pack-objects --path-walk`. This application of the path
walk API discovers the objects to pack via this batched walk, and
automatically groups objects that appear at a common path so they can be
checked for delta comparisons.
This use completely avoids any name-hash collisions (even the collisions
that sometimes occur with the new `--full-name-hash` option) and can be
much faster to compute since the first pass of delta calculations does
not waste time on objects that are unlikely to be diffable.
Some statistics are available in the commit messages.
In ac33519ddf (mingw: restrict file handle inheritance only on Windows
7 and later, 2019-11-22), I introduced code to safe-guard the
defense-in-depth handling that restricts handles' inheritance so that it
would work with Windows 7, too.
Let's revert this patch: Git for Windows dropped supporting Windows 7 (and
Windows 8) directly after Git for Windows v2.46.2. For full details, see
https://gitforwindows.org/requirements#windows-version.
Actually, on second thought: revert only the part that makes this handle
inheritance restriction logic optional and that suggests to open a bug
report if it fails, but keep the fall-back to try again without said
logic: There have been a few false positives over the past few years
(where the warning was triggered e.g. because Defender was still
accessing a file that Git wanted to overwrite), and the fall-back logic
seems to have helped occasionally in such situations.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The 'git survey' builtin provides several detail tables, such as "top
files by on-disk size". The size of these tables defaults to 10,
currently.
Allow the user to specify this number via a new --top=<N> option or the
new survey.top config key.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
At the moment, nothing is obvious about the reason for the use of the
path-walk API, but this will become more prevelant in future iterations. For
now, use the path-walk API to sum up the counts of each kind of object.
For example, this is the reachable object summary output for my local repo:
REACHABLE OBJECT SUMMARY
========================
Object Type | Count
------------+-------
Tags | 1343
Commits | 179344
Trees | 314350
Blobs | 184030
Signed-off-by: Derrick Stolee <stolee@gmail.com>
When 'git survey' provides information to the user, this will be presented
in one of two formats: plaintext and JSON. The JSON implementation will be
delayed until the functionality is complete for the plaintext format.
The most important parts of the plaintext format are headers specifying the
different sections of the report and tables providing concreted data.
Create a custom table data structure that allows specifying a list of
strings for the row values. When printing the table, check each column for
the maximum width so we can create a table of the correct size from the
start.
The table structure is designed to be flexible to the different kinds of
output that will be implemented in future changes.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
By default we will scan all references in "refs/heads/", "refs/tags/"
and "refs/remotes/".
Add command line opts let the use ask for all refs or a subset of them
and to include a detached HEAD.
Signed-off-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Start work on a new 'git survey' command to scan the repository
for monorepo performance and scaling problems. The goal is to
measure the various known "dimensions of scale" and serve as a
foundation for adding additional measurements as we learn more
about Git monorepo scaling problems.
The initial goal is to complement the scanning and analysis performed
by the GO-based 'git-sizer' (https://github.com/github/git-sizer) tool.
It is hoped that by creating a builtin command, we may be able to take
advantage of internal Git data structures and code that is not
accessible from GO to gain further insight into potential scaling
problems.
Co-authored-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Users may want to enable the --path-walk option for 'git pack-objects' by
default, especially underneath commands like 'git push' or 'git repack'.
This should be limited to client repositories, since the --path-walk option
disables bitmap walks, so would be bad to include in Git servers when
serving fetches and clones. There is potential that it may be helpful to
consider when repacking the repository, to take advantage of improved deltas
across historical versions of the same files.
Much like how "pack.useSparse" was introduced and included in
"feature.experimental" before being enabled by default, use the repository
settings infrastructure to make the new "pack.usePathWalk" config enabled by
"feature.experimental" and "feature.manyFiles".
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Since 'git pack-objects' supports a --path-walk option, allow passing it
through in 'git repack'. This presents interesting testing opportunities for
comparing the different repacking strategies against each other.
Add the --path-walk option to the performance tests in p5313.
For the microsoft/fluentui repo [1] checked out at a specific commit [2],
the results are very interesting:
Test this tree
------------------------------------------------------------------
5313.2: thin pack 0.40(0.47+0.04)
5313.3: thin pack size 1.2M
5313.4: thin pack with --full-name-hash 0.09(0.10+0.04)
5313.5: thin pack size with --full-name-hash 22.8K
5313.6: thin pack with --path-walk 0.08(0.06+0.02)
5313.7: thin pack size with --path-walk 20.8K
5313.8: big pack 2.16(8.43+0.23)
5313.9: big pack size 17.7M
5313.10: big pack with --full-name-hash 1.42(3.06+0.21)
5313.11: big pack size with --full-name-hash 18.0M
5313.12: big pack with --path-walk 2.21(8.39+0.24)
5313.13: big pack size with --path-walk 17.8M
5313.14: repack 98.05(662.37+2.64)
5313.15: repack size 449.1K
5313.16: repack with --full-name-hash 33.95(129.44+2.63)
5313.17: repack size with --full-name-hash 182.9K
5313.18: repack with --path-walk 106.21(121.58+0.82)
5313.19: repack size with --path-walk 159.6K
[1] https://github.com/microsoft/fluentui
[2] e70848ebac1cd720875bccaa3026f4a9ed700e08
This repo suffers from having a lot of paths that collide in the name
hash, so examining them in groups by path leads to better deltas. Also,
in this case, the single-threaded implementation is competitive with the
full repack. This is saving time diffing files that have significant
differences from each other.
A similar, but private, repo has even more extremes in the thin packs:
Test this tree
--------------------------------------------------------------
5313.2: thin pack 2.39(2.91+0.10)
5313.3: thin pack size 4.5M
5313.4: thin pack with --full-name-hash 0.29(0.47+0.12)
5313.5: thin pack size with --full-name-hash 15.5K
5313.6: thin pack with --path-walk 0.35(0.31+0.04)
5313.7: thin pack size with --path-walk 14.2K
Notice, however, that while the --full-name-hash version is working
quite well in these cases for the thin pack, it does poorly for some
other standard cases, such as this test on the Linux kernel repository:
Test this tree
--------------------------------------------------------------
5313.2: thin pack 0.01(0.00+0.00)
5313.3: thin pack size 310
5313.4: thin pack with --full-name-hash 0.00(0.00+0.00)
5313.5: thin pack size with --full-name-hash 1.4K
5313.6: thin pack with --path-walk 0.00(0.00+0.00)
5313.7: thin pack size with --path-walk 310
Here, the --full-name-hash option does much worse than the default name
hash, but the path-walk option does exactly as well.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
In order to more easily compute delta bases among objects that appear at the
exact same path, add a --path-walk option to 'git pack-objects'.
This option will use the path-walk API instead of the object walk given by
the revision machinery. Since objects will be provided in batches
representing a common path, those objects can be tested for delta bases
immediately instead of waiting for a sort of the full object list by
name-hash. This has multiple benefits, including avoiding collisions by
name-hash.
The objects marked as UNINTERESTING are included in these batches, so we
are guaranteeing some locality to find good delta bases.
After the individual passes are done on a per-path basis, the default
name-hash is used to find other opportunistic delta bases that did not
match exactly by the full path name.
RFC TODO: It is important to note that this option is inherently
incompatible with using a bitmap index. This walk probably also does not
work with other advanced features, such as delta islands.
Getting ahead of myself, this option compares well with --full-name-hash
when the packfile is large enough, but also performs at least as well as
the default in all cases that I've seen.
RFC TODO: this should probably be recording the batch locations to another
list so they could be processed in a second phase using threads.
RFC TODO: list some examples of how this outperforms previous pack-objects
strategies. (This is coming in later commits that include performance
test changes.)
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Atomic append on windows is only supported on local disk files, and it may
cause errors in other situations, e.g. network file system. If that is the
case, this config option should be used to turn atomic append off.
Co-Authored-By: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: 孙卓识 <sunzhuoshi@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This adds support for a new http.sslAutoClientCert config value.
In cURL 7.77 or later the schannel backend does not automatically send
client certificates from the Windows Certificate Store anymore.
This config value is only used if http.sslBackend is set to "schannel",
and can be used to opt in to the old behavior and force cURL to send
client certificates.
This fixes https://github.com/git-for-windows/git/issues/3292
Signed-off-by: Pascal Muller <pascalmuller@gmail.com>
The native Windows HTTPS backend is based on Secure Channel which lets
the caller decide how to handle revocation checking problems caused by
missing information in the certificate or offline CRL distribution
points.
Unfortunately, cURL chose to handle these problems differently than
OpenSSL by default: while OpenSSL happily ignores those problems
(essentially saying "¯\_(ツ)_/¯"), the Secure Channel backend will error
out instead.
As a remedy, the "no revoke" mode was introduced, which turns off
revocation checking altogether. This is a bit heavy-handed. We support
this via the `http.schannelCheckRevoke` setting.
In https://github.com/curl/curl/pull/4981, we contributed an opt-in
"best effort" strategy that emulates what OpenSSL seems to do.
In Git for Windows, we actually want this to be the default. This patch
makes it so, introducing it as a new value for the
`http.schannelCheckRevoke" setting, which now becmes a tristate: it
accepts the values "false", "true" or "best-effort" (defaulting to the
last one).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Since commit 0c499ea60f (send-pack: demultiplex a sideband stream with
status data, 2010-02-05) the send-pack builtin uses the side-band-64k
capability if advertised by the server.
Unfortunately this breaks pushing over the dump git protocol if used
over a network connection.
The detailed reasons for this breakage are (by courtesy of Jeff Preshing,
quoted from https://groups.google.com/d/msg/msysgit/at8D7J-h7mw/eaLujILGUWoJ):
MinGW wraps Windows sockets in CRT file descriptors in order to
mimic the functionality of POSIX sockets. This causes msvcrt.dll
to treat sockets as Installable File System (IFS) handles,
calling ReadFile, WriteFile, DuplicateHandle and CloseHandle on
them. This approach works well in simple cases on recent
versions of Windows, but does not support all usage patterns. In
particular, using this approach, any attempt to read & write
concurrently on the same socket (from one or more processes)
will deadlock in a scenario where the read waits for a response
from the server which is only invoked after the write. This is
what send_pack currently attempts to do in the use_sideband
codepath.
The new config option `sendpack.sideband` allows to override the
side-band-64k capability of the server, and thus makes the dumb git
protocol work.
Other transportation methods like ssh and http/https still benefit from
the sideband channel, therefore the default value of `sendpack.sideband`
is still true.
Signed-off-by: Thomas Braun <thomas.braun@byte-physics.de>
Signed-off-by: Oliver Schneider <oliver@assarbad.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The preceding two commits introduced special handling of the sideband
channel to neutralize ANSI escape sequences before sending the payload
to the terminal, and `sideband.allowControlCharacters` to override that
behavior.
However, some `pre-receive` hooks that are actively used in practice
want to color their messages and therefore rely on the fact that Git
passes them through to the terminal.
In contrast to other ANSI escape sequences, it is highly unlikely that
coloring sequences can be essential tools in attack vectors that mislead
Git users e.g. by hiding crucial information.
Therefore we can have both: Continue to allow ANSI coloring sequences to
be passed to the terminal, and neutralize all other ANSI escape
sequences.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The preceding commit fixed the vulnerability whereas sideband messages
(that are under the control of the remote server) could contain ANSI
escape sequences that would be sent to the terminal verbatim.
However, this fix may not be desirable under all circumstances, e.g.
when remote servers deliberately add coloring to their messages to
increase their urgency.
To help with those use cases, give users a way to opt-out of the
protections: `sideband.allowControlCharacters`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Commit bc26f7690a (clone: make it possible to specify --tags,
2025-02-06) added a new paragraph in the middle of this list item. By
adding an empty line rather than using a list continuation, we broke the
list continuation, with the new paragraph ending up funnily indented.
Restore the chain of list continuations.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some future breaking changes would remove certain parts of the
default repository, which were still described even when the
documents were built for the future with WITH_BREAKING_CHANGES.
* pw/repo-layout-doc-update:
docs: fix repository-layout when building with breaking changes
Meson-based build procedure forgot to build some docs, which has
been corrected.
* pw/build-meson-technical-and-howto-docs:
meson: fix building technical and howto docs
Since commit 8ccc75c245 (remote: announce removal of "branches/" and
"remotes/", 2025-01-22) enabling WITH_BREAKING_CHANGES when building git
removes support for reading branches from ".git/branches" and remotes
from ".git/remotes". However those locations are still documented in
gitrepository-layout.adoc even though the build does not support them.
Rectify this by adding a new document attribute "with-breaking-changes"
and use it to make the inclusion of those sections of the documentation
conditional. Note that the name of the attribute does not match the test
prerequisite WITHOUT_BREAKING_CHANGES added in c5bc9a7f94 (Makefile:
wire up build option for deprecated features, 2025-01-22). This is to
avoid the awkward double negative ifndef::without_breaking_changes for
documentation that should be included when WITH_BREAKING_CHANGES is
enabled. The test prerequisite will be renamed to match the
documentation attribute in a future patch series.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update a few more instances of Documentation/*.txt files which have been
renamed to *.adoc.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit ec14d4ecb5 (builtin.h: take over documentation from
api-builtin.txt, 2017-08-02) deleted api-builtin.txt and moved the
contents into builtin.h. Most of the references were fixed in
d85e9448dd (new-command.txt: update reference to builtin docs,
2023-02-04), but one remained. Fix it.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The top-level .gitattributes file contains entries for the Documentation
tree. Documentation/.gitattributes has not been touched since it was
added in 14f9e128d3 (Define the project whitespace policy, 2008-02-10).
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point behind a compile-time switch is to ensure that we have a
mechanism to hide myriad of backward incompatible changes that may
be prepared and accumulated over time, yet make them available for
testing any time during the development toward the big version
boundary. Add a few words to stress that point.
Since the document was first written, we have added the CI job that
the document anticipated us to have. Rephrase to state the current
status.
The discussion in [*1*] made us abandon the "feature.git3" based
runtime switching of behaviour and instead adopt the compile-time
switching mechanism, but a stray sentence about runtime switching
still remained in the final text by mistake. Remove it.
[Reference]
*1* https://lore.kernel.org/git/xmqqldzel6ug.fsf@gitster.g/
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's add a design doc about how we could improve handling liarge blobs
using "Large Object Promisors" (LOPs). It's a set of features with the
goal of using special dedicated promisor remotes to store large blobs,
and having them accessed directly by main remotes and clients.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
What happens to submodules during merge has been documented in a
bit more detail.
* lo/doc-merge-submodule-update:
merge-strategies.adoc: detail submodule merge
Assorted fixes and improvements to the build procedure based on
meson.
* ps/build-meson-fixes-0130:
gitlab-ci: restrict maximum number of link jobs on Windows
meson: consistently use custom program paths to resolve programs
meson: fix overwritten `git` variable
meson: prevent finding sed(1) in a loop
meson: improve handling of `sane_tool_path` option
meson: improve PATH handling
meson: drop separate version library
meson: stop linking libcurl into all executables
meson: introduce `libgit_curl` dependency
meson: simplify use of the common-main library
meson: inline the static 'git' library
meson: fix OpenSSL fallback when not explicitly required
meson: fix exec path with enabled runtime prefix
When our asciidoc files were renamed from "*.txt" to "*.adoc" in
1f010d6bdf (doc: use .adoc extension for AsciiDoc files, 2025-01-20)
the "meson.build" file in "Documentation" was updated but the
"meson.build" files in the "technical" and "howto" subdirectories were
not. This causes the meson build to fail when configured with
-Ddocs=html. Fix this by updating the relevant "meson.build" files.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We renamed from .txt to .adoc all the asciidoc source files and
necessary includes. We also need to adjust the build-docdep tool to
work on files whose suffix is .adoc when computing the documentation
dependencies.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The .txt extensions were changed to .adoc in 1f010d6bdf (doc: use .adoc
extension for AsciiDoc files, 2025-01-20). This left broken links in
the generated howto-index.html.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Removal of ".git/branches" and ".git/remotes" support in the
BreakingChanges document has been further clarified.
* jc/3.0-branches-remotes-update:
BreakingChanges: clarify branches/ and remotes/
"git refs migrate" can optionally be told not to migrate the reflog.
* kn/ref-migrate-skip-reflog:
builtin/refs: add '--no-reflog' flag to drop reflogs