Commit Graph

80363 Commits

Author SHA1 Message Date
Johannes Sixt
c8c5df79df Merge branch 'jx/i18n-fix' of github.com:jiangxin/gitk
* 'jx/i18n-fix' of github.com:jiangxin/gitk:
  gitk: l10n: make PO headers identify the Gitk project
  gitk: ignore generated POT file
  gitk: i18n: use "Gitk" as package name in POT file

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2026-03-20 09:23:32 +01:00
Johannes Sixt
056fb617f6 Merge branch 'js/i18n-no-location'
* js/i18n-no-location:
  gitk: commit translation files without file information
2026-03-20 09:20:29 +01:00
Johannes Sixt
696e001e12 Merge branch 'sb/heed-ref-decoration-settings'
* sb/heed-ref-decoration-settings:
  gitk: use config settings for head/tag colors
2026-03-20 09:07:09 +01:00
Jeff King
598f40c4b3 contrib/diff-highlight: do not highlight identical pairs
We pair lines for highlighting based on their position in the hunk. So
we should never see two identical lines paired, like:

  -one
  -two
  +one
  +something else

which would pair -one/+one, because that implies that the diff could
easily be shrunk by turning line "one" into context.

But there is (at least) one exception: removing a newline at the end of
a file will produce a diff like:

  -foo
  +foo
  \No newline at end of file

And we will pair those two lines. As a result, we end up marking the
whole line, including the newline, as the shared prefix. And there's an
empty suffix.

The most obvious bug here is that when we try to print the highlighted
lines, we remove the trailing newline from the suffix, but do not bother
with the prefix (under the assumption that there had to be a difference
_somewhere_ in the line, and thus the prefix would not eat all the way
up to the newline). And so you get an extra line like:

  -foo

  +foo

  \No newline at end of file

This is obviously ugly, but also causes interactive.diffFilter to
(rightly) complain that the input and output do not match their lines
1-to-1.

This could easily be fixed by chomping the prefix, too, but I think the
problem is deeper. For one, I suspect some of the other logic gets
confused by forming an array with zero-indexed element "3" in a
3-element array. But more importantly, we try not to highlight whole
lines, as there's nothing interesting to show there. So let's catch this
early in is_pair_interesting() and bail to our usual passthrough
strategy.

Reported-by: Scott Baker <scott@perturb.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-19 22:29:14 -07:00
Jiang Xin
e0f3d08b9b gitk: l10n: make PO headers identify the Gitk project
Commit f697d08 (gitk: i18n: use "Gitk" as package name in POT file,
2026-03-19) updated the generated POT template to use "Gitk" in its
Project-Id-Version header. Several existing PO files still carry older
header values such as "git" or "git-gui", so they do not consistently
identify themselves as Gitk translations.

Update the Project-Id-Version field in all Gitk PO files so that they
identify the Gitk project consistently.

The "Project-Id-Version" field in the PO header helps tools identify
which project a PO file belongs to. For example, Git's
"git-po-helper" uses it to choose project-specific checks and POT
handling rules. Without this change, some Gitk PO files are
misidentified because their headers still refer to other projects.

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
2026-03-20 09:42:03 +08:00
Jiang Xin
cc10508b18 gitk: ignore generated POT file
"po/gitk.pot" is generated from the source for translation maintenance.
Ignore it in the working tree so regenerating the template does not
introduce unnecessary noise in `git status`.

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
2026-03-20 09:31:04 +08:00
Jiang Xin
f697d08df7 gitk: i18n: use "Gitk" as package name in POT file
Use "Gitk" instead of the placeholder "PACKAGE" in the header of the
generated po/gitk.pot file. In particular, the "Project-Id-Version"
field in the header entry should be set to:

    "Project-Id-Version: Gitk\n"

New PO files generated from this POT file will inherit that package
name.

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
2026-03-20 09:31:04 +08:00
Junio C Hamano
7ff1e8dc1e The 18th batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-19 09:54:57 -07:00
Junio C Hamano
39267c8a7e Merge branch 'ss/submodule--helper-use-xmalloc'
Code clean-up.

* ss/submodule--helper-use-xmalloc:
  submodule--helper: replace malloc with xmalloc
2026-03-19 09:54:57 -07:00
Junio C Hamano
80595ab08e Merge branch 'ps/unit-test-c-escape-names.txt'
The unit test helper function was taught to use backslash +
mnemonic notation for certain control characters like "\t", instead
of octal notation like "\011".

* ps/unit-test-c-escape-names.txt:
  test-lib: print escape sequence names
2026-03-19 09:54:56 -07:00
Junio C Hamano
5a0ee6f793 Merge branch 'jc/doc-wholesale-replace-before-next'
Doc update.

* jc/doc-wholesale-replace-before-next:
  SubmittingPatches: spell out "replace fully to pretend to be perfect"
2026-03-19 09:54:56 -07:00
Junio C Hamano
accd0e107b Merge branch 'lc/rebase-trailer'
"git rebase" learns "--trailer" command to drive the
interpret-trailers machinery.

* lc/rebase-trailer:
  rebase: support --trailer
  commit, tag: parse --trailer with OPT_STRVEC
  trailer: append trailers without fork/exec
  trailer: libify a couple of functions
  interpret-trailers: refactor create_in_place_tempfile()
  interpret-trailers: factor trailer rewriting
2026-03-19 09:54:56 -07:00
Junio C Hamano
a7a079c2c4 Merge branch 'bk/run-command-wo-the-repository'
The run_command() API lost its implicit dependencyon the singleton
`the_repository` instance.

* bk/run-command-wo-the-repository:
  run-command: wean auto_maintenance() functions off the_repository
  run-command: wean start_command() off the_repository
2026-03-19 09:54:56 -07:00
Junio C Hamano
2ca397fa7c Merge branch 'ps/editorconfig-unanchor'
Editorconfig filename patterns were specified incorrectly, making
many source files inside subdirectories unaffected, which has been
corrected.

* ps/editorconfig-unanchor:
  editorconfig: fix style not applying to subdirs anymore
2026-03-19 09:54:55 -07:00
Junio C Hamano
432282f9cb Merge branch 'ss/t3200-test-zero-oid'
A test now uses the symbolic constant $ZERO_OID instead of 40 "0" to
work better with SHA-256 as well as SHA-1.

* ss/t3200-test-zero-oid:
  t3200: replace hardcoded null OID with $ZERO_OID
2026-03-19 09:54:55 -07:00
Junio C Hamano
17b9c759bd Merge branch 'dd/list-objects-filter-options-wo-strbuf-split'
The way combined list-object filter options are parsed has been
revamped.

* dd/list-objects-filter-options-wo-strbuf-split:
  list-objects-filter-options: avoid strbuf_split_str()
  worktree: do not pass strbuf by value
2026-03-19 09:54:55 -07:00
Junio C Hamano
2d576325c3 Merge branch 'ps/t9200-test-path-is-helpers'
Test update.

* ps/t9200-test-path-is-helpers:
  t9200: replace test -f with modern path helper
  t9200: handle missing CVS with skip_all
2026-03-19 09:54:55 -07:00
Patrick Steinhardt
671df48df8 meson: precompile "git-compat-util.h"
Every compilation unit in Git is expected to include "git-compat-util.h"
first, either directly or indirectly via "builtin.h". This header papers
over differences between platforms so that we can expect the typical
POSIX functions to exist. Furthermore, it provides functionality that we
end up using everywhere.

This header is thus quite heavy as a consequence. Preprocessing it as a
standalone unit via `clang -E git-compat-util.h` yields over 23,000
lines of code overall. Naturally, it takes quite some time to compile
all of this.

Luckily, this is exactly the kind of use case that precompiled headers
aim to solve: instead of recompiling it every single time, we compile it
once and then link the result into the executable. If include guards are
set up properly it means that the file won't need to be reprocessed.

Set up such a precompiled header for "git-compat-util.h" and wire it up
via Meson. This causes Meson to implicitly include the precompiled
header in all compilation units. With GCC and Clang for example this is
done via the "-include" statement [1].

This leads to a significant speedup when performing full builds:

  Benchmark 1: ninja (rev = HEAD~)
  Time (mean ± σ):     14.467 s ±  0.126 s    [User: 248.133 s, System: 31.298 s]
  Range (min … max):   14.195 s … 14.633 s    10 runs

  Benchmark 2: ninja (rev = HEAD)
    Time (mean ± σ):     10.307 s ±  0.111 s    [User: 173.290 s, System: 23.998 s]
    Range (min … max):   10.030 s … 10.433 s    10 runs

  Summary
    ninja (rev = HEAD) ran
      1.40 ± 0.02 times faster than ninja (rev = HEAD~)

[1]: https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-19 06:40:09 -07:00
Patrick Steinhardt
daa56fb789 meson: compile compatibility sources separately
In the next commit we're about to introduce a precompiled header for
"git-compat-util.h". The consequence of this change is that we'll
implicitly include that header for every compilation unit that uses the
precompiled headers.

This is okay for our "normal" library sources and our builtins. But some
of our compatibility sources do not include the header on purpose, and
doing so would cause compilation errors.

Prepare for this change by splitting out compatibility sources into
their static library. Like this, we can selectively enable precompiled
headers for the library sources.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-19 06:40:09 -07:00
Patrick Steinhardt
baa61e46da git-compat-util.h: move warning infra to prepare for PCHs
The "git-compat-util.h" header is supposed to be the first header
included by every code compilation unit. As such, a subsequent commit
will start to precompile this header to speed up compilation of Git.

This will cause an issue though with the way that we have set up the
"-Wsign-compare" warnings. It is expected that any compilation unit that
fails with that compiler warning sets `DISABLE_SIGN_COMPARE_WARNINGS`
before including "git-compat-util.h". If so, we'll disable the warning
right away via a compiler pragma.

But with precompiled headers we do not know ahead of time whether the
code unit wants to disable those warnings, and thus we'll have to
precompile the header without defining `DISABLE_SIGN_COMPARE_WARNINGS`.
But as the pragma statement is wrapped by our include guards, the second
include of that file will not have the desired effect of disabling the
warnings anymore.

We could fix this issue by declaring a new macro that compilation units
are expected to invoke after having included the file. In retrospect,
that would have been the better way to handle this as it allows for
more flexibility: we could for example toggle the warning for specific
code blocks, only. But changing this now would require a bunch of
changes, and the churn feels excessive for what we gain.

Instead, prepare for the precompiled headers by moving the code outside
of the include guards.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-19 06:40:09 -07:00
Patrick Steinhardt
a767f2fd6c builds: move build scripts into "tools/"
We have a bunch of scripts used by our different build systems that are
all located in the top-level directory. Now that we have introduced the
new "tools/" directory though we have a better home for them.

Move the scripts into the "tools/" directory.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-19 06:40:09 -07:00
Patrick Steinhardt
405c98a6a0 contrib: move "update-unicode.sh" script into "tools/"
The "update-unicode.sh" script is used to update the unicode data
compiled into Git whenever a new version of the Unicode standard has
been released. As such, it is a natural part of our developer-facing
tooling, and its presence in "contrib/" is misleading.

Promote the script into the new "tools/" directory.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-19 06:40:09 -07:00
Patrick Steinhardt
fe309664ea contrib: move "coverage-diff.sh" script into "tools/"
The "coverage-diff.sh" script can be used to get information about test
coverage fro the Git codebase. It is thus rather specific to our build
and test infrastructure and part of the developer-facing tooling. The
fact that this script is part of "contrib/" is thus rather misleading
and a historic wart.

Promote the tool into the new "tools/" directory.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-19 06:40:08 -07:00
Patrick Steinhardt
8ca1b4472c contrib: move "coccinelle/" directory into "tools/"
The Coccinelle tool is an ingrained part of our build infrastructure. It
is executed by our CI to detect antipatterns and is used to detect
misuses of certain interfaces. It's presence in "contrib/" is thus
rather misleading.

Promote the configuration into the new "tools/" directory.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-19 06:40:08 -07:00
Patrick Steinhardt
8872941fd2 Introduce new "tools/" directory
According to its readme, the "contrib/" directory's main intent is to
collect stuff that is not an official part of Git, either because it is
too specialized or because it is still considered experimental. The
reality tells a bit of a different story though: while it _does_ contain
such things, it also contains other things:

  - Our credential helpers, which are being distributed by many
    packagers nowadays and which can be considered "stable".

  - A bunch of tooling that relates to our build and test
    infrastructure.

Especially the second category is somewhat of a sore spot. You really
wouldn't expect build-related tooling to be considered an optional part
of Git. Quite the opposite.

Create a new top-level "tools/" directory to fix this discrepancy. This
directory will contain all kind of tools that are related to our build
infrastructure and that Git developers are likely to use day to day.

For now, this directory doesn't contain anything yet except for a
readme and a Meson skeleton. This will change in subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-19 06:40:08 -07:00
Jeff King
2594747ad1 transport: plug leaks in transport_color_config()
We retrieve config values with repo_config_get_string(), which will
allocate a new copy of the string for us. But we don't hold on to those
strings, since they are just fed to git_config_colorbool() and
color_parse(). But nor do we free them, which means they leak.

We can fix this by using the "_tmp" form of repo_config_get_string(),
which just hands us a pointer directly to the internal storage. This is
OK for our purposes, since we don't need it to last for longer than our
parsing calls.

Two interesting side notes here:

  1. Many types already have a repo_config_get_X() variant that handles
     this for us (e.g., repo_config_get_bool()). But neither colorbools
     nor colors themselves have such helpers. We might think about
     adding them, but converting all callers is a larger task, and out
     of scope for this fix.

  2. As far as I can tell, this leak has been there since 960786e761
     (push: colorize errors, 2018-04-21), but wasn't detected by LSan in
     our test suite. It started triggering when we applied dd3693eb08
     (transport-helper, connect: use clean_on_exit to reap children on
     abnormal exit, 2026-03-12) which is mostly unrelated.

     Even weirder, it seems to trigger only with clang (and not gcc),
     and only with GIT_TEST_DEFAULT_REF_FORMAT=reftable. So I think this
     is another odd case where the pointers happened to be hanging
     around in stack memory, but changing the pattern of function calls
     in nearby code was enough for them to be incidentally overwritten.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-16 13:31:48 -07:00
PRASHANT S BISHT
d893f3e7cc t4200: convert test -[df] checks to test_path_* helpers
Replace old-style path existence checks in t4200-rerere.sh with
the appropriate test_path_* helper functions. These helpers provide
clearer diagnostic messages on failure than the raw shell test
builtin.

Signed-off-by: Prashant S Bisht <prashantjee2025@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-16 13:10:16 -07:00
Mirko Faina
48430e44ac t0008: improve test cleanup to fix failing test
The "large exclude file ignored in tree" test fails. This is due to an
additional warning message that is generated in the test. "warning:
unable to access 'subdir/.gitignore': Too many levels of symbolic
links", the extra warning that is not supposed to be there, happens
because of some leftover files left by previous tests.

To fix this we improve cleanup on "symlinks not respected in-tree", and
because the tests in t0008 in general have poor cleanup, at the start of
"large exclude file ignored in tree" we search for any leftover
.gitignore and remove them before starting the test.

Improve post-test cleanup and add pre-test cleanup to make sure that we
have a workable environment for the test.

Signed-off-by: Mirko Faina <mroik@delayed.space>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-16 12:43:37 -07:00
Junio C Hamano
ca1db8a0f7 The 17th batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-16 10:48:15 -07:00
Junio C Hamano
9883fcb960 Merge branch 'ty/patch-ids-document-lazy-eval'
In-code comment update to record a design decision to allow lazy
computation of patch IDs.

* ty/patch-ids-document-lazy-eval:
  patch-ids: document intentional const-casting in patch_id_neq()
2026-03-16 10:48:15 -07:00
Junio C Hamano
3c59f3bc4b Merge branch 'rs/history-ergonomics-updates-fix'
Fix use of uninitialized variable.

* rs/history-ergonomics-updates-fix:
  history: initialize rev_info in cmd_history_reword()
2026-03-16 10:48:15 -07:00
Junio C Hamano
2eec0f5115 Merge branch 'jk/unleak-mmap'
Plug a few leaks where mmap'ed memory regions are not unmapped.

* jk/unleak-mmap:
  meson: turn on NO_MMAP when building with LSan
  Makefile: turn on NO_MMAP when building with LSan
  object-file: fix mmap() leak in odb_source_loose_read_object_stream()
  pack-revindex: avoid double-loading .rev files
  check_connected(): fix leak of pack-index mmap
  check_connected(): delay opening new_pack
2026-03-16 10:48:15 -07:00
Junio C Hamano
c563b12ce7 Merge branch 'ty/setup-error-tightening'
While discovering a ".git" directory, the code treats any stat()
failure as a sign that a filesystem entity .git does not exist
there, and ignores ".git" that is not a "gitdir" file or a
directory.  The code has been tightened to notice and report
filesystem corruption better.

* ty/setup-error-tightening:
  setup: improve error diagnosis for invalid .git files
2026-03-16 10:48:14 -07:00
Junio C Hamano
d4b2a7908d Merge branch 'os/doc-git-custom-commands'
Doc update.

* os/doc-git-custom-commands:
  doc: make it easier to find custom command information
2026-03-16 10:48:14 -07:00
Junio C Hamano
a11f0ad992 Merge branch 'fp/t3310-unhide-git-failures'
The construct 'test "$(command)" = expectation' loses the exit
status from the command, which has been fixed by breaking up the
statement into pieces.

* fp/t3310-unhide-git-failures:
  t3310: avoid hiding failures from rev-parse in command substitutions
2026-03-16 10:48:14 -07:00
Junio C Hamano
61a45befd3 Merge branch 'jt/repo-structure-extrema'
"git repo structure" command learns to report maximum values on
various aspects of objects it inspects.

* jt/repo-structure-extrema:
  builtin/repo: find tree with most entries
  builtin/repo: find commit with most parents
  builtin/repo: add OID annotations to table output
  builtin/repo: collect largest inflated objects
  builtin/repo: add helper for printing keyvalue output
  builtin/repo: update stats for each object
2026-03-16 10:48:14 -07:00
Junio C Hamano
b8ac2bf3f0 Merge branch 'sp/wt-status-wo-the-repository'
Reduce dependence on the global the_hash_algo and the_repository
variables of wt-status code path.

* sp/wt-status-wo-the-repository:
  wt-status: use hash_algo from local repository instead of global the_hash_algo
  wt-status: replace uses of the_repository with local repository instances
  wt-status: pass struct repository through function parameters
2026-03-16 10:48:14 -07:00
Guillaume Jacob
5514f14617 doc: fix git grep args order in Quick Reference
The example provided has its arguments in the wrong order. The revision
should follow the pattern, and not the other way around.

Signed-off-by: Guillaume Jacob <guillaume@absolut-sensing.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-16 10:15:34 -07:00
Patrick Steinhardt
3cf4024117 clar: update to fix compilation on platforms without PATH_MAX
Update clar to e4172e3 (Merge pull request #134 from
clar-test/ethomson/const, 2026-01-10). Besides some changes to
"generate.py" which don't have any impact on us, this commit also fixes
compilation on platforms that don't have PATH_MAX, like for example
GNU/Hurd.

Reported-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-16 10:00:41 -07:00
Deveshi Dwivedi
8f8e1b0807 stash: do not pass strbuf by value
save_untracked_files() takes its 'files' parameter as struct strbuf
by value.  Passing a strbuf by value copies the struct but shares
the underlying buffer between caller and callee, risking a dangling
pointer and double-free if the callee reallocates.

The function needs both the buffer and its length for
pipe_command(), so a plain const char * is not sufficient here.
Switch the parameter to struct strbuf * and update the caller to
pass a pointer.

Signed-off-by: Deveshi Dwivedi <deveshigurgaon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-15 14:46:51 -07:00
Deveshi Dwivedi
65fec23b57 coccinelle: detect struct strbuf passed by value
Passing a struct strbuf by value to a function copies the struct
but shares the underlying character array between caller and callee.
If the callee causes a reallocation, the caller's copy becomes a
dangling pointer, leading to a double-free when strbuf_release() is
called.  There is no coccinelle rule to catch this pattern.

Jeff King suggested adding one during review of the
write_worktree_linking_files() fix [1], and noted that a reporting
rule using coccinelle's Python scripting extensions could emit a
descriptive warning, but we do not currently require Python support
in coccinelle.

Add a transformation rule that rewrites a by-value strbuf parameter
to a pointer.  The detection is identical to what a Python-based
reporting rule would catch; only the presentation differs.  The
resulting diff will not produce compilable code on its own (callers
and the function body still need updating), but the spatch output
alerts the developer that the signature needs attention.  This is
consistent with the other rules in strbuf.cocci, which also rewrite
to the preferred form.

[1] https://lore.kernel.org/git/20260309192600.GC309867@coredump.intra.peff.net/

Signed-off-by: Deveshi Dwivedi <deveshigurgaon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-15 14:46:51 -07:00
Junio C Hamano
6e84af9ff4 Merge branch 'dd/list-objects-filter-options-wo-strbuf-split' into dd/cocci-do-not-pass-strbuf-by-value
* dd/list-objects-filter-options-wo-strbuf-split:
  list-objects-filter-options: avoid strbuf_split_str()
  worktree: do not pass strbuf by value
2026-03-15 14:46:30 -07:00
Ritesh Singh Jadoun
2f05039717 t/pack-refs-tests: use test_path_is_missing
The pack-refs tests previously used raw 'test -f' and 'test -e' checks
with negation. Update them to use Git's standard helper function
test_path_is_missing for consistency and clearer failure reporting.

As suggested in review, replaced the negated 'test_path_exists' with
test_path_is_missing to better reflect the expected absence of paths.

Signed-off-by: Ritesh Singh Jadoun <riteshjd75@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-15 14:21:35 -07:00
Patrick Steinhardt
835e0aaf6f builtin/pack-objects: reduce lock contention when writing packfile data
When running `git pack-objects --stdout` we feed the data through
`hashfd_ext()` with a progress meter and a smaller-than-usual buffer
length of 8kB so that we can track throughput more granularly. But as
packfiles tend to be on the larger side, this small buffer size may
cause a ton of write(3p) syscalls.

Originally, the buffer we used in `hashfd()` was 8kB for all use cases.
This was changed though in 2ca245f8be (csum-file.h: increase hashfile
buffer size, 2021-05-18) because we noticed that the number of writes
can have an impact on performance. So the buffer size was increased to
128kB, which improved performance a bit for some use cases.

But the commit didn't touch the buffer size for `hashd_throughput()`.
The reasoning here was that callers expect the progress indicator to
update frequently, and a larger buffer size would of course reduce the
update frequency especially on slow networks.

While that is of course true, there was (and still is, even though it's
now a call to `hashfd_ext()`) only a single caller of this function in
git-pack-objects(1). This command is responsible for writing packfiles,
and those packfiles are often on the bigger side. So arguably:

  - The user won't care about increments of 8kB when packfiles tend to
    be megabytes or even gigabytes in size.

  - Reducing the number of syscalls would be even more valuable here
    than it would be for multi-pack indices, which was the benchmark
    done in the mentioned commit, as MIDXs are typically significantly
    smaller than packfiles.

  - Nowadays, many internet connections should be able to transfer data
    at a rate significantly higher than 8kB per second.

Update the buffer to instead have a size of `LARGE_PACKET_DATA_MAX - 1`,
which translates to ~64kB. This limit was chosen because `git
pack-objects --stdout` is most often used when sending packfiles via
git-upload-pack(1), where packfile data is chunked into pktlines when
using the sideband. Furthermore, most internet connections should have a
bandwidth signifcantly higher than 64kB/s, so we'd still be able to
observe progress updates at a rate of at least once per second.

This change significantly reduces the number of write(3p) syscalls from
355,000 to 44,000 when packing the Linux repository. While this results
in a small performance improvement on an otherwise-unused system, this
improvement is mostly negligible. More importantly though, it will
reduce lock contention in the kernel on an extremely busy system where
we have many processes writing data at once.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-13 08:54:15 -07:00
Patrick Steinhardt
2bf8f36ddb csum-file: drop hashfd_throughput()
The `hashfd_throughput()` function is used by a single callsite in
git-pack-objects(1). In contrast to `hashfd()`, this function uses a
progress meter to measure throughput and a smaller buffer length so that
the progress meter can provide more granular metrics.

We're going to change that caller in the next commit to be a bit more
specific to packing objects. As such, `hashfd_throughput()` will be a
somewhat unfitting mechanism for any potential new callers.

Drop the function and replace it with a call to `hashfd_ext()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-13 08:54:15 -07:00
Patrick Steinhardt
a1118c0a44 csum-file: introduce hashfd_ext()
Introduce a new `hashfd_ext()` function that takes an options structure.
This function will replace `hashd_throughput()` in the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-13 08:54:15 -07:00
Patrick Steinhardt
26986f4cba sideband: use writev(3p) to send pktlines
Every pktline that we send out via `send_sideband()` currently requires
two syscalls: one to write the pktline's length, and one to send its
data. This typically isn't all that much of a problem, but under extreme
load the syscalls may cause contention in the kernel.

Refactor the code to instead use the newly introduced writev(3p) infra
so that we can send out the data with a single syscall. This reduces the
number of syscalls from around 133,000 calls to write(3p) to around
67,000 calls to writev(3p).

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-13 08:54:14 -07:00
Patrick Steinhardt
1970fcef93 wrapper: introduce writev(3p) wrappers
In the preceding commit we have added a compatibility wrapper for the
writev(3p) syscall. Introduce some generic wrappers for this function
that we nowadays take for granted in the Git codebase.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-13 08:54:14 -07:00
Patrick Steinhardt
3b9b2c2a29 compat/posix: introduce writev(3p) wrapper
In a subsequent commit we're going to add the first caller to
writev(3p). Introduce a compatibility wrapper for this syscall that we
can use on systems that don't have this syscall.

The syscall exists on modern Unixes like Linux and macOS, and seemingly
even for NonStop according to [1]. It doesn't seem to exist on Windows
though.

[1]: http://nonstoptools.com/manuals/OSS-SystemCalls.pdf
[2]: https://www.gnu.org/software/gnulib/manual/html_node/writev.html

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-13 08:54:14 -07:00
Patrick Steinhardt
a5816e4596 upload-pack: reduce lock contention when writing packfile data
In our production systems we have recently observed write contention in
git-upload-pack(1). The system in question was consistently streaming
packfiles at a rate of dozens of gigabits per second, but curiously the
system was neither bottlenecked on CPU, memory or IOPS.

We eventually discovered that Git was spending 80% of its time in
`pipe_write()`, out of which almost all of the time was spent in the
`ep_poll_callback` function in the kernel. Quoting the reporter:

  This infrastructure is part of an event notification queue designed to
  allow for multiple producers to emit events, but that concurrency
  safety is guarded by 3 layers of locking. The layer we're hitting
  contention in uses a simple reader/writer lock mode (a.k.a. shared
  versus exclusive mode), where producers need shared-mode (read mode),
  and various other actions use exclusive (write) mode.

The system in question generates workloads where we have hundreds of
git-upload-pack(1) processes active at the same point in time. These
processes end up contending around those locks, and the consequence is
that the Git processes stall.

Now git-upload-pack(1) already has the infrastructure in place to buffer
some of the data it reads from git-pack-objects(1) before actually
sending it out. We only use this infrastructure in very limited ways
though, so we generally end up matching one read(3p) call with one
write(3p) call. Even worse, when the sideband is enabled we end up
matching one read with _two_ writes: one for the pkt-line length, and
one for the packfile data.

Extend our use of the buffering infrastructure so that we soak up bytes
until the buffer is filled up at least 2/3rds of its capacity. The
change is relatively simple to implement as we already know to flush the
buffer in `create_pack_file()` after git-pack-objects(1) has finished.

This significantly reduces the number of write(3p) syscalls we need to
do. Before this change, cloning the Linux repository resulted in around
400,000 write(3p) syscalls. With the buffering in place we only do
around 130,000 syscalls.

Now we could of course go even further and make sure that we always fill
up the whole buffer. But this might cause an increase in read(3p)
syscalls, and some tests show that this only reduces the number of
write(3p) syscalls from 130,000 to 100,000. So overall this doesn't seem
worth it.

Note that the issue could also be fixed by adapting the write buffer
that we use in the downstream git-pack-objects(1) command, and such a
change would have roughly the same result. But the command that
generates the packfile data may not always be git-pack-objects(1) as it
can be changed via "uploadpack.packObjectsHook", so such a fix would
only help in _some_ cases. Regardless of that, we'll also adapt the
write buffer size of git-pack-objects(1) in a subsequent commit.

Helped-by: Matt Smiley <msmiley@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-13 08:54:14 -07:00