Commit Graph

161709 Commits

Author SHA1 Message Date
Derrick Stolee
717d2ea2ca path-walk: introduce an object walk by path
In anticipation of a few planned applications, introduce the most basic form
of a path-walk API. It currently assumes that there are no UNINTERESTING
objects, and does not include any complicated filters. It calls a function
pointer on groups of tree and blob objects as grouped by path. This only
includes objects the first time they are discovered, so an object that
appears at multiple paths will not be included in two batches.

There are many future adaptations that could be made, but they are left for
future updates when consumers are ready to take advantage of those features.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2024-12-30 22:12:10 +01:00
Johannes Schindelin
d3c8e10135 repack/pack-objects: mark --full-name-hash as experimental
This option is still under discussion on the Git mailing list.

We still would like to have some real-world data, and the best way to
get it is to get a Git for Windows release into users' hands so that
they can test it.

Nevertheless, without the official blessing of the Git maintainer, this
optionis experimental, and we need to be clear about that.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-12-30 22:12:10 +01:00
Derrick Stolee
c18eec2138 test-tool: add helper for name-hash values
Add a new test-tool helper, name-hash, to output the value of the
name-hash algorithms for the input list of strings, one per line.

Since the name-hash values can be stored in the .bitmap files, it is
important that these hash functions do not change across Git versions.
Add a simple test to t5310-pack-bitmaps.sh to provide some testing of
the current values. Due to how these functions are implemented, it would
be difficult to change them without disturbing these values.

Create a performance test that uses test_size to demonstrate how
collisions occur for these hash algorithms. This test helps inform
someone as to the behavior of the name-hash algorithms for their repo
based on the paths at HEAD.

My copy of the Git repository shows modest statistics around the
collisions of the default name-hash algorithm:

Test                                              this tree
-----------------------------------------------------------------
5314.1: paths at head                                        4.5K
5314.2: number of distinct name-hashes                       4.1K
5314.3: number of distinct full-name-hashes                  4.5K
5314.4: maximum multiplicity of name-hashes                    13
5314.5: maximum multiplicity of fullname-hashes                 1

Here, the maximum collision multiplicity is 13, but around 10% of paths
have a collision with another path.

In a more interesting example, the microsoft/fluentui [1] repo had these
statistics at time of committing:

Test                                              this tree
-----------------------------------------------------------------
5314.1: paths at head                                       19.6K
5314.2: number of distinct name-hashes                       8.2K
5314.3: number of distinct full-name-hashes                 19.6K
5314.4: maximum multiplicity of name-hashes                   279
5314.5: maximum multiplicity of fullname-hashes                 1

[1] https://github.com/microsoft/fluentui

That demonstrates that of the nearly twenty thousand path names, they
are assigned around eight thousand distinct values. 279 paths are
assigned to a single value, leading the packing algorithm to sort
objects from those paths together, by size.

In this repository, no collisions occur for the full-name-hash
algorithm.

In a more extreme example, an internal monorepo had a much worse
collision rate:

Test                                              this tree
-----------------------------------------------------------------
5314.1: paths at head                                      221.6K
5314.2: number of distinct name-hashes                      72.0K
5314.3: number of distinct full-name-hashes                221.6K
5314.4: maximum multiplicity of name-hashes                 14.4K
5314.5: maximum multiplicity of fullname-hashes                 2

Even in this repository with many more paths at HEAD, the collision rate
was low and the maximum number of paths being grouped into a single
bucket by the full-path-name algorithm was two.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2024-12-30 22:12:09 +01:00
Derrick Stolee
e06da52130 p5313: add size comparison test
As custom options are added to 'git pack-objects' and 'git repack' to
adjust how compression is done, use this new performance test script to
demonstrate their effectiveness in performance and size.

The recently-added --full-name-hash option swaps the default name-hash
algorithm with one that attempts to uniformly distribute the hashes
based on the full path name instead of the last 16 characters.

This has a dramatic effect on full repacks for repositories with many
versions of most paths. It can have a negative impact on cases such as
pushing a single change.

This can be seen by running pt5313 on the open source fluentui
repository [1]. Most commits will have this kind of output for the thin
and big pack cases, though certain commits (such as [2]) will have
problematic thin pack size for other reasons.

[1] https://github.com/microsoft/fluentui
[2] a637a06df05360ce5ff21420803f64608226a875

Checked out at the parent of [2], I see the following statistics:

Test                                           this tree
------------------------------------------------------------------
5313.2: thin pack                              0.02(0.01+0.01)
5313.3: thin pack size                                    1.1K
5313.4: thin pack with --full-name-hash        0.02(0.01+0.00)
5313.5: thin pack size with --full-name-hash              3.0K
5313.6: big pack                               1.65(3.35+0.24)
5313.7: big pack size                                    58.0M
5313.8: big pack with --full-name-hash         1.53(2.52+0.18)
5313.9: big pack size with --full-name-hash              57.6M
5313.10: repack                                176.52(706.60+3.53)
5313.11: repack size                                    446.7K
5313.12: repack with --full-name-hash          37.47(134.18+3.06)
5313.13: repack size with --full-name-hash              183.1K

Note that this demonstrates a 3x size _increase_ in the case that
simulates a small "git push". The size change is neutral on the case of
pushing the difference between HEAD and HEAD~1000.

However, the full repack case is both faster and more efficient.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2024-12-30 19:51:25 +01:00
Derrick Stolee
4875b991dd git-repack: update usage to match docs
This also adds the '--full-name-hash' option introduced in the previous
change and adds newlines to the synopsis.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2024-12-30 19:51:24 +01:00
Derrick Stolee
c85f4fd00b pack-objects: add GIT_TEST_FULL_NAME_HASH
Add a new environment variable to opt-in to the --full-name-hash option
in 'git pack-objects'. This allows for extra testing of the feature
without repeating all of the test scenarios.

But this option isn't free. There are a few tests that change behavior
with the variable enabled.

First, there are a few tests that are very sensitive to certain delta
bases being picked. These are both involving the generation of thin
bundles and then counting their objects via 'git index-pack --fix-thin'
which pulls the delta base into the new packfile. For these tests,
disable the option as a decent long-term option.

Second, there are two tests in t5616-partial-clone.sh that I believe are
actually broken scenarios. While the client is set up to clone the
'promisor-server' repo via a treeless partial clone filter (tree:0),
that filter does not translate to the 'server' repo. Thus, fetching from
these repos causes the server to think that the client has all reachable
trees and blobs from the commits advertised as 'haves'. This leads the
server to providing a thin pack assuming those objects as delta bases.
Changing the name-hash algorithm presents new delta bases and thus
breaks the expectations of these tests. An alternative could be to set
up 'server' as a promisor server with the correct filter enabled. This
may also point out more issues with partial clone being set up as a
remote-based filtering mechanism and not a repository-wide setting. For
now, do the minimal change to make the test work by disabling the test
variable.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2024-12-30 19:51:24 +01:00
Derrick Stolee
b0f1e442fb repack: test --full-name-hash option
The new '--full-name-hash' option for 'git repack' is a simple
pass-through to the underlying 'git pack-objects' subcommand. However,
this subcommand may have other options and a temporary filename as part
of the subcommand execution that may not be predictable or could change
over time.

The existing test_subcommand method requires an exact list of arguments
for the subcommand. This is too rigid for our needs here, so create a
new method, test_subcommand_flex. Use it to check that the
--full-name-hash option is passing through.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2024-12-30 19:51:24 +01:00
Derrick Stolee
234f80e5d4 pack-objects: add --full-name-hash option
The pack_name_hash() method has not been materially changed since it was
introduced in ce0bd64299 (pack-objects: improve path grouping
heuristics., 2006-06-05). The intention here is to group objects by path
name, but also attempt to group similar file types together by making
the most-significant digits of the hash be focused on the final
characters.

Here's the crux of the implementation:

	/*
	 * This effectively just creates a sortable number from the
	 * last sixteen non-whitespace characters. Last characters
	 * count "most", so things that end in ".c" sort together.
	 */
	while ((c = *name++) != 0) {
		if (isspace(c))
			continue;
		hash = (hash >> 2) + (c << 24);
	}

As the comment mentions, this only cares about the last sixteen
non-whitespace characters. This cause some filenames to collide more
than others. Here are some examples that I've seen while investigating
repositories that are growing more than they should be:

 * "/CHANGELOG.json" is 15 characters, and is created by the beachball
   [1] tool. Only the final character of the parent directory can
   differntiate different versions of this file, but also only the two
   most-significant digits. If that character is a letter, then this is
   always a collision. Similar issues occur with the similar
   "/CHANGELOG.md" path, though there is more opportunity for
   differences in the parent directory.

 * Localization files frequently have common filenames but differentiate
   via parent directories. In C#, the name "/strings.resx.lcl" is used
   for these localization files and they will all collide in name-hash.

[1] https://github.com/microsoft/beachball

I've come across many other examples where some internal tool uses a
common name across multiple directories and is causing Git to repack
poorly due to name-hash collisions.

It is clear that the existing name-hash algorithm is optimized for
repositories with short path names, but also is optimized for packing a
single snapshot of a repository, not a repository with many versions of
the same file. In my testing, this has proven out where the name-hash
algorithm does a good job of finding peer files as delta bases when
unable to use a historical version of that exact file.

However, for repositories that have many versions of most files and
directories, it is more important that the objects that appear at the
same path are grouped together.

Create a new pack_full_name_hash() method and a new --full-name-hash
option for 'git pack-objects' to call that method instead. Add a simple
pass-through for 'git repack --full-name-hash' for additional testing in
the context of a full repack, where I expect this will be most
effective.

The hash algorithm is as simple as possible to be reasonably effective:
for each character of the path string, add a multiple of that character
and a large prime number (chosen arbitrarily, but intended to be large
relative to the size of a uint32_t). Then, shift the current hash value
to the right by 5, with overlap. The addition and shift parameters are
standard mechanisms for creating hard-to-predict behaviors in the bits
of the resulting hash.

This is not meant to be cryptographic at all, but uniformly distributed
across the possible hash values. This creates a hash that appears
pseudorandom. There is no ability to consider similar file types as
being close to each other.

In a later change, a test-tool will be added so the effectiveness of
this hash can be demonstrated directly.

For now, let's consider how effective this mechanism is when repacking a
repository with and without the --full-name-hash option. Specifically,
let's use 'git repack -adf [--full-name-hash]' as our test.

On the Git repository, we do not expect much difference. All path names
are short. This is backed by our results:

| Stage                 | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone           | 260 MB    | N/A         |
| Standard Repack       | 127MB     | 106s        |
| With --full-name-hash | 126 MB    | 99s         |

This example demonstrates how there is some natural overhead coming from
the cloned copy because the server is hosting many forks and has not
optimized for exactly this set of reachable objects. But the full repack
has similar characteristics with and without --full-name-hash.

However, we can test this in a repository that uses one of the
problematic naming conventions above. The fluentui [2] repo uses
beachball to generate CHANGELOG.json and CHANGELOG.md files, and these
files have very poor delta characteristics when comparing against
versions across parent directories.

| Stage                 | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone           | 694 MB    | N/A         |
| Standard Repack       | 438 MB    | 728s        |
| With --full-name-hash | 168 MB    | 142s        |

[2] https://github.com/microsoft/fluentui

In this example, we see significant gains in the compressed packfile
size as well as the time taken to compute the packfile.

Using a collection of repositories that use the beachball tool, I was
able to make similar comparisions with dramatic results. While the
fluentui repo is public, the others are private so cannot be shared for
reproduction. The results are so significant that I find it important to
share here:

| Repo     | Standard Repack | With --full-name-hash |
|----------|-----------------|-----------------------|
| fluentui |         438 MB  |               168 MB  |
| Repo B   |       6,255 MB  |               829 MB  |
| Repo C   |      37,737 MB  |             7,125 MB  |
| Repo D   |     130,049 MB  |             6,190 MB  |

Future changes could include making --full-name-hash implied by a config
value or even implied by default during a full repack.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2024-12-30 19:51:24 +01:00
Johannes Schindelin
cef418de69 ci: work around a problem with HTTP/2 vs libcurl v8.10.0
As reported in https://lore.kernel.org/git/ZuPKvYP9ZZ2mhb4m@pks.im/,
libcurl v8.10.0 had a regression that was picked up by Git's t5559.30
"large fetch-pack requests can be sent using chunked encoding".

This bug was fixed in libcurl v8.10.1.

Sadly, the macos-13 runner image was updated in the brief window between
these two libcurl versions, breaking each and every CI build, as
reported at https://github.com/git-for-windows/git/issues/5159.

This would usually not matter, we would just ignore the failing CI
builds until the macos-13 runner image is rebuilt in a couple of days,
and then the CI builds would succeed again.

However.

As has become the custom, a surprise Git version was released, and now
that Git for Windows wants to follow suit, since Git for Windows has
this custom of trying to never release a version with a failing CI
build, we _must_ work around it.

This patch implements this work-around, basically for the sake of Git
for Windows v2.46.2's CI build.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2024-12-30 19:51:24 +01:00
Johannes Schindelin
38993b5620 Start the merging-rebase to v2.48.0-rc1
This commit starts the rebase of e7b5484632 to b3f05734c64
2024-12-30 19:42:10 +01:00
Junio C Hamano
bc2c65770d Git 2.48-rc1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-30 06:58:28 -08:00
Junio C Hamano
e1d34f36ea Merge branch 'ms/t7611-test-path-is-file'
Test modernization.

* ms/t7611-test-path-is-file:
  t7611: replace test -f with test_path_is* helpers
2024-12-30 06:56:28 -08:00
Junio C Hamano
306ab352f4 Merge branch 'ps/meson-test-wo-gitweb'
meson-based build without GitWeb failed the self tests.

* ps/meson-test-wo-gitweb:
  meson: enable auto-discovered "gitweb"
  GIT-BUILD-OPTIONS: wire up NO_GITWEB option
  GIT-BUILD-OPTIONS: sort variables alphabetically
2024-12-28 12:20:35 -08:00
Junio C Hamano
df2faf1a65 Merge branch 'as/gitk-git-gui-repo-update'
The developer documentation has been updated to give the latest
info on gitk and git-gui maintainer.

* as/gitk-git-gui-repo-update:
  Update the official repo of gitk
2024-12-28 10:11:42 -08:00
Patrick Steinhardt
d963ac98ec meson: enable auto-discovered "gitweb"
In 7d549fe317 (meson: skip gitweb build when Perl is disabled,
2024-12-20) we have started to conditionally enable "gitweb" based on
whether or not Perl is enabled. By accident though that change causes us
to not build gitweb in case its feature flag is set to "auto" even if
autoconfiguration determines that it could be built. This is because we
use "gitweb_option.enabled()", which only checks whether the feature has
been explicitly enabled.

Fix the issue by using `gitweb_option.allowed()` instead, which returns
true in case it is either explicitly enabled or set to "auto". This also
works for the case where the feature becomes auto-disabled due to Perl
not being present because we use `disable_auto_if(not perl.found())`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-27 08:17:19 -08:00
Patrick Steinhardt
cbcc2f7911 GIT-BUILD-OPTIONS: wire up NO_GITWEB option
Building our "gitweb" interface is optional in our Makefile and in Meson
and not wired up at all with CMake, but disabling it causes a couple of
tests in the t950* range that pull in "t/lib-gitweb.sh". This is because
the test library knows to execute gitweb-tests based on whether or not
Perl is available, but we may have Perl available and still end up not
building gitweb e.g. with `make test NO_GITWEB=YesPlease`.

Fix this issue by wiring up a new "NO_GITWEB" build option so that we
can skip these tests in case gitweb is not built.

Note that this new build option requires us to move the configuration of
GIT-BUILD-OPTIONS to a later point in our Meson build instructions. But
as that file is only consumed by our tests at runtime this change does
not cause any issues.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-27 08:17:19 -08:00
Patrick Steinhardt
cfa1f2ae96 GIT-BUILD-OPTIONS: sort variables alphabetically
The variables declared and substituted in GIT-BUILD-OPTIONS are not
ordered in any obvious way. Sort them alphabetically so that it becomes
obvious where new variables should go.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-27 08:17:19 -08:00
Meet Soni
cef3d4a89f t7611: replace test -f with test_path_is* helpers
Replace `test -f` and `test ! -f` with `test_path_is_file` and
`test_path_is_missing` for better debuggability.

While `test -f` ensures that the file exists and is a regular file,
`test_path_is_file` provides clearer error messages on failure.

On the other hand, `test ! -f` checks either the absence of a regular
file or the presence of any other filesystem object, but looking at
them in the test individually, all of them should've said `test ! -e`,
i.e. "there shouldn't be anything at given path on filesystem."

Replace these cases with `test_path_is_missing` for better
debuggability.

Helped-by: karthik nayak <karthik.188@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-27 08:13:59 -08:00
Alexander Shopov
b59358100c Update the official repo of gitk
Point out:
- current maintaner
- contribution flow is via the mailing list

Signed-off-by: Alexander Shopov <ash@kambanaria.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-26 08:06:42 -08:00
Junio C Hamano
76cf4f61c8 Merge https://github.com/j6t/git-gui
* 'master' of https://github.com/j6t/git-gui:
  git-gui: use system encoding to show console output
  git-gui: Remove forced rescan of stat-dirty files.
2024-12-26 08:02:23 -08:00
Junio C Hamano
996f0c583b Hopefully the final batch before 2.48-rc1
Let's wait for git-gui, gitk, and possibly po/ and delay the tagging
of the -rc1.  Many people are already offline for the end-of-year
holidays and it is a slow week, and 'master' front has too many new
things graduated from 'next' a bit too early for me to feel
comfortable.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-23 10:13:58 -08:00
Junio C Hamano
6f8ae955bd Merge branch 'kn/reflog-migration'
"git refs migrate" learned to also migrate the reflog data across
backends.

* kn/reflog-migration:
  refs: mark invalid refname message for translation
  refs: add support for migrating reflogs
  refs: allow multiple reflog entries for the same refname
  refs: introduce the `ref_transaction_update_reflog` function
  refs: add `committer_info` to `ref_transaction_add_update()`
  refs: extract out refname verification in transactions
  refs/files: add count field to ref_lock
  refs: add `index` field to `struct ref_udpate`
  refs: include committer info in `ref_update` struct
2024-12-23 09:32:29 -08:00
Junio C Hamano
f74eae3e47 Merge branch 'ma/asciidoctor-build-fixes'
A topic to optionally build with meson, which has graduated to
'master' recently, broke Documentation pipeline with asciidoctor
for the normal Makefile build as well as meson-based one, which
have been corrected.

* ma/asciidoctor-build-fixes:
  asciidoctor-extensions.rb.in: inject GIT_DATE
  asciidoctor-extensions.rb.in: add missing word
  asciidoctor-extensions.rb.in: delete existing <refmiscinfo/>
2024-12-23 09:32:27 -08:00
Junio C Hamano
f074cdea46 Merge branch 'ps/build-hotfix'
A topic to optionally build with meson, which has graduated to
'master' recently, has regressed the normal Makefile build, which
is being corrected.

* ps/build-hotfix:
  meson: add options to override build information
  GIT-VERSION-GEN: fix overriding GIT_BUILT_FROM_COMMIT and GIT_DATE
  GIT-VERSION-GEN: fix overriding GIT_VERSION
  Makefile: introduce template for GIT-VERSION-GEN
  Makefile: drop unneeded indirection for GIT-VERSION-GEN outputs
  Makefile: stop including "GIT-VERSION-FILE" in docs
2024-12-23 09:32:26 -08:00
Junio C Hamano
83c8f76235 Merge branch 'ps/ci-meson'
The meson-build procedure is integrated into CI to catch and
prevent bitrotting.

* ps/ci-meson:
  ci: wire up Meson builds
  t: introduce compatibility options to clar-based tests
  t: fix out-of-tree tests for some git-p4 tests
  Makefile: detect missing Meson tests
  meson: detect missing tests at configure time
  t/unit-tests: rename clar-based unit tests to have a common prefix
  Makefile: drop -DSUPPRESS_ANNOTATED_LEAKS
  ci/lib: support custom output directories when creating test artifacts
2024-12-23 09:32:25 -08:00
Junio C Hamano
e9a4054320 Merge branch 'kl/doc-build-fix'
Build fix.

* kl/doc-build-fix:
  doc: remove extra quotes in generated docs
2024-12-23 09:32:23 -08:00
Junio C Hamano
a08ebf8b3e Merge branch 'tb/bitmap-fix-pack-reuse'
Code to reuse objects based on bitmap contents have been tightened
to avoid race condition even when multiple packs are involved.

* tb/bitmap-fix-pack-reuse:
  pack-bitmap.c: ensure pack validity for all reuse packs
2024-12-23 09:32:22 -08:00
Junio C Hamano
8650022fab Merge branch 'jk/prio-queue-sign-compare-fix'
Type clean-up.

* jk/prio-queue-sign-compare-fix:
  prio-queue: use size_t rather than int for size
2024-12-23 09:32:21 -08:00
Junio C Hamano
77825f7553 Merge branch 'ps/build-meson-gitweb'
meson-based build still tried to build and install gitweb even when
Perl is disabled, which has been corrected.

* ps/build-meson-gitweb:
  meson: skip gitweb build when Perl is disabled
2024-12-23 09:32:19 -08:00
Junio C Hamano
77edd59394 Merge branch 'sk/calloc-not-malloc-plus-memset'
Code clean-up.

* sk/calloc-not-malloc-plus-memset:
  git: use calloc instead of malloc + memset where possible
2024-12-23 09:32:18 -08:00
Junio C Hamano
88e59f8027 Merge branch 'js/range-diff-diff-merges'
"git range-diff" learned to optionally show and compare merge
commits in the ranges being compared, with the --diff-merges
option.

* js/range-diff-diff-merges:
  range-diff: introduce the convenience option `--remerge-diff`
  range-diff: optionally include merge commits' diffs in the analysis
2024-12-23 09:32:17 -08:00
Junio C Hamano
c4cc685a62 Merge branch 'js/mingw-rename-fix'
Update the way rename() emulation on Windows handle directories to
correct an earlier attempt to do the same.

* js/mingw-rename-fix:
  mingw_rename: do support directory renames
2024-12-23 09:32:16 -08:00
Junio C Hamano
bad5d1ad25 Merge branch 'js/github-windows-setup-fix'
Revert recent changes to the way windows environment is set up for
GitHub CI.

* js/github-windows-setup-fix:
  GitHub ci(windows): speed up initializing Git for Windows' minimal SDK again
2024-12-23 09:32:15 -08:00
Junio C Hamano
8cad35f353 Merge branch 'js/ps-build-cmake-fixup'
Build fixes for Windows.

* js/ps-build-cmake-fixup:
  cmake/vcxproj: stop special-casing `remote-ext`
  cmake: put the Perl modules into the correct location again
  cmake: use the correct file name for the Perl header
  cmake(mergetools): better support for out-of-tree builds
  cmake: better support for out-of-tree builds follow-up
2024-12-23 09:32:13 -08:00
Junio C Hamano
002a8a9d36 Merge branch 'as/show-index-uninitialized-hash'
Regression fix for 'show-index' when run outside of a repository.

* as/show-index-uninitialized-hash:
  t5300: add test for 'show-index --object-format'
  show-index: fix uninitialized hash function
2024-12-23 09:32:12 -08:00
Junio C Hamano
4156b6a741 Merge branch 'ps/build-sign-compare'
Start working to make the codebase buildable with -Wsign-compare.

* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-23 09:32:11 -08:00
Junio C Hamano
f7c607fac3 Merge branch 'kn/reftable-writer-log-write-verify'
Reftable backend adds check for upper limit of log's update_index.

* kn/reftable-writer-log-write-verify:
  reftable/writer: ensure valid range for log's update_index
2024-12-23 09:32:08 -08:00
Junio C Hamano
19fbad7918 Merge branch 'ps/ci-gitlab-update'
GitLab CI updates.

* ps/ci-gitlab-update:
  ci/lib: fix "CI setup" sections with GitLab CI
  ci/lib: do not interpret escape sequences in `group ()` arguments
  ci/lib: remove duplicate trap to end "CI setup" group
  gitlab-ci: update macOS images to Sonoma
2024-12-23 09:32:07 -08:00
Junio C Hamano
3151e6a121 Merge branch 'ps/reftable-alloc-failures-zalloc-fix'
Recent reftable updates mistook a NULL return from a request for
0-byte allocation as OOM and died unnecessarily, which has been
corrected.

* ps/reftable-alloc-failures-zalloc-fix:
  reftable/basics: return NULL on zero-sized allocations
  reftable/stack: fix zero-sized allocation when there are no readers
  reftable/merged: fix zero-sized allocation when there are no readers
  reftable/stack: don't perform auto-compaction with less than two tables
2024-12-23 09:32:06 -08:00
Patrick Steinhardt
d7282891f5 reftable/basics: return NULL on zero-sized allocations
In the preceding commits we have fixed a couple of issues when
allocating zero-sized objects. These issues were masked by
implementation-defined behaviour. Quoting malloc(3p):

  If size is 0, either:

    * A null pointer shall be returned and errno may be set to an
      implementation-defined value, or

    * A pointer to the allocated space shall be returned. The
      application shall ensure that the pointer is not used to access an
      object.

So it is perfectly valid that implementations of this function may or
may not return a NULL pointer in such a case.

Adapt both `reftable_malloc()` and `reftable_realloc()` so that they
return NULL pointers on zero-sized allocations. This should remove any
implementation-defined behaviour in our allocators and thus allows us to
detect such platform-specific issues more easily going forward.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-22 00:58:23 -08:00
Patrick Steinhardt
2d3cb4b4b5 reftable/stack: fix zero-sized allocation when there are no readers
Similar as the preceding commit, we may try to do a zero-sized
allocation when reloading a reftable stack that ain't got any tables.
It is implementation-defined whether malloc(3p) returns a NULL pointer
in that case or a zero-sized object. In case it does return a NULL
pointer though it causes us to think we have run into an out-of-memory
situation, and thus we return an error.

Fix this by only allocating arrays when they have at least one entry.

Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-22 00:58:23 -08:00
Patrick Steinhardt
5ab83521cf reftable/merged: fix zero-sized allocation when there are no readers
It was reported [1] that Git started to fail with an out-of-memory error
when initializing repositories with the reftable backend on NonStop
platforms. A bisect led to 802c0646ac (reftable/merged: handle
allocation failures in `merged_table_init_iter()`, 2024-10-02), which
changed how we allocate memory when initializing a merged table.

The root cause of this seems to be that NonStop returns a `NULL` pointer
when doing a zero-sized allocation. This would've already happened
before the above change, but we never noticed because we did not check
the result. Now we do notice and thus return an out-of-memory error to
the caller.

Fix the issue by skipping the allocation altogether in case there are no
readers.

[1]: <00ad01db5017$aa9ce340$ffd6a9c0$@nexbridge.com>

Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-22 00:58:23 -08:00
Patrick Steinhardt
8e27ee9220 reftable/stack: don't perform auto-compaction with less than two tables
In order to compact tables we need at least two tables. Bail out early
from `reftable_stack_auto_compact()` in case we have less than two
tables.

In the original, `stack_table_sizes_for_compaction()` yields an array
that has the same length as the number of tables. This array is then
passed on to `suggest_compaction_segment()`, which returns an empty
segment in case we have less than two tables. The segment is then passed
to `segment_size()`, which will return `0` because both start and end of
the segment are `0`. And because we only call `stack_compact_range()` in
case we have a positive segment size we don't perform auto-compaction at
all. Consequently, this change does not result in a user-visible change
in behaviour when called with a single table.

But when called with no tables this protects us against a potential
out-of-memory error: `stack_table_sizes_for_compaction()` would try to
allocate a zero-byte object when there aren't any tables, and that may
lead to a `NULL` pointer on some platforms like NonStop which causes us
to bail out with an out-of-memory error.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-22 00:58:23 -08:00
Johannes Sixt
5c95773eac Merge branch 'js/no-rescan-on-empty-diff'
* js/no-rescan-on-empty-diff:
  git-gui: Remove forced rescan of stat-dirty files.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2024-12-21 14:06:33 +01:00
Martin Ågren
beb8081f31 asciidoctor-extensions.rb.in: inject GIT_DATE
After a38edab7c8 (Makefile: generate doc versions via GIT-VERSION-GEN,
2024-12-06), we no longer inject GIT_DATE when building with
Asciidoctor.

Replace the <date/> tag in the XML to inject the value of GIT_DATE.
Unlike <refmiscinfo/> as handled in a recent commit, we have no reason
to expect that this tag might be missing, so there's no need for "maybe
remove, then add" and we can just outright replace the one that
Asciidoctor has generated based on the mtime of the source file.

Compared to pre-a38edab7c8, we now end up injecting this also in the
build of Git.3pm, which until now has been using the mtime of Git.pm.
That is arguably even a good change since it results in more
reproducible builds.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 17:34:35 -08:00
Martin Ågren
c683924d06 asciidoctor-extensions.rb.in: add missing word
Commit a38edab7c8 (Makefile: generate doc versions via GIT-VERSION-GEN,
2024-12-06) stopped providing an attribute value "Git $(GIT_VERSION)" to
asciidoc/Asciidoctor over the command line. Instead, we now provide the
attribute to asciidoc through a generated asciidoc.conf, where the value
is generated as "Git @GIT_VERSION@".

In the similar mechanism for Asciidoctor, we forgot the "Git" prefix.
Restore it.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 17:34:35 -08:00
Martin Ågren
298805c823 asciidoctor-extensions.rb.in: delete existing <refmiscinfo/>
After the recent a38edab7c8 (Makefile: generate doc versions via
GIT-VERSION-GEN, 2024-12-06), building with Asciidoctor results in
manpages where the headers no longer contain "Git Manual" and the
footers no longer identify the built Git version.

Before a38edab7c8, we used to just provide a few attributes to
Asciidoctor (and asciidoc). Commit 7a30134358 (asciidoctor-extensions:
provide `<refmiscinfo/>`, 2019-09-16) noted that older versions of
Asciidoctor didn't propagate those attributes into the built XML files,
so we started injecting them ourselves from this script. With newer
versions of Asciidoctor, we'd end up with some harmless duplication
among the tags in the final XML.

Post-a38edab7c8, we don't provide these attributes and Asciidoctor
inserts empty-ish values. After our additions from 7a30134358, we get

  <refmiscinfo class="source">&#160;</refmiscinfo>
  <refmiscinfo class="manual">&#160;</refmiscinfo>
  <refmiscinfo class="source">2.47.1.[...]</refmiscinfo>
  <refmiscinfo class="manual">Git Manual</refmiscinfo>

When these are handled, it appears to be first come first served,
meaning that our additions have no effect and we regress as described in
the first paragraph.

Remove existing "source" or "manual" <refmiscinfo/> tags before adding
ours. I considered removing all <refmiscinfo/> to get a nice clean
slate, instead of just those two that we want to replace to be a bit
more precise. I opted for the latter. Maybe one day, Asciidoctor learns
to insert something useful there which `xmlto` can pick up and make good
use of -- let's not interfere.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 17:34:35 -08:00
Junio C Hamano
f5f82c0d5f Merge branch 'ps/build-hotfix' into ma/asciidoctor-build-fixes
* ps/build-hotfix:
  meson: add options to override build information
  GIT-VERSION-GEN: fix overriding GIT_BUILT_FROM_COMMIT and GIT_DATE
  GIT-VERSION-GEN: fix overriding GIT_VERSION
  Makefile: introduce template for GIT-VERSION-GEN
  Makefile: drop unneeded indirection for GIT-VERSION-GEN outputs
  Makefile: stop including "GIT-VERSION-FILE" in docs
2024-12-20 17:34:25 -08:00
Patrick Steinhardt
1bc815c3d0 meson: add options to override build information
We inject various different kinds of build information into build
artifacts, like the version string or the commit from which Git was
built. Add options to let users explicitly override this information
with Meson.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 12:36:46 -08:00
Patrick Steinhardt
cfa01e6da5 GIT-VERSION-GEN: fix overriding GIT_BUILT_FROM_COMMIT and GIT_DATE
Same as with the preceding commit, neither GIT_BUILT_FROM_COMMIT nor
GIT_DATE can be overridden via the environment. Especially the latter is
of importance given that we set it in our own "Documentation/doc-diff"
script.

Make the values of both variables overridable. Luckily we don't pull in
these values via any included Makefiles, so the fix is trivial compared
to the fix for GIT_VERSON.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 12:36:45 -08:00