Commit Graph

3074 Commits

Author SHA1 Message Date
Johannes Schindelin
9e049b2528 credential-cache: handle ECONNREFUSED gracefully (#5329)
I should probably add some tests for this.
2025-02-06 19:33:23 +01:00
Johannes Schindelin
3803e5b40b Add experimental 'git survey' builtin (#5174)
This introduces `git survey` to Git for Windows ahead of upstream for
the express purpose of getting the path-based analysis in the hands of
more folks.

The inspiration of this builtin is
[`git-sizer`](https://github.com/github/git-sizer), but since that
command relies on `git cat-file --batch` to get the contents of objects,
it has limits to how much information it can provide.

This is mostly a rewrite of the `git survey` builtin that was introduced
into the `microsoft/git` fork in microsoft/git#667. That version had a
lot more bells and whistles, including an analysis much closer to what
`git-sizer` provides.

The biggest difference in this version is that this one is focused on
using the path-walk API in order to visit batches of objects based on a
common path. This allows identifying, for instance, the path that is
contributing the most to the on-disk size across all versions at that
path.

For example, here are the top ten paths contributing to my local Git
repository (which includes `microsoft/git` and `gitster/git`):

```
TOP FILES BY DISK SIZE
============================================================================
                                    Path | Count | Disk Size | Inflated Size
-----------------------------------------+-------+-----------+--------------
                       whats-cooking.txt |  1373 |  11637459 |      37226854
             t/helper/test-gvfs-protocol |     2 |   6847105 |      17233072
                      git-rebase--helper |     1 |   6027849 |      15269664
                          compat/mingw.c |  6111 |   5194453 |     463466970
             t/helper/test-parse-options |     1 |   3420385 |       8807968
                  t/helper/test-pkt-line |     1 |   3408661 |       8778960
      t/helper/test-dump-untracked-cache |     1 |   3408645 |       8780816
            t/helper/test-dump-fsmonitor |     1 |   3406639 |       8776656
                                po/vi.po |   104 |   1376337 |      51441603
                                po/de.po |   210 |   1360112 |      71198603
```

This kind of analysis has been helpful in identifying the reasons for
growth in a few internal monorepos. Those findings motivated the changes
in #5157 and #5171.
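
As a rough illustration of the per-path aggregation behind a table like the one above, here is a minimal Python sketch. It is not how `git survey` is implemented: the `ObjectVersion` record, the `top_paths_by_disk_size` helper and the sample numbers are all made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class ObjectVersion(NamedTuple):
    """One version of a file (hypothetical record, not a Git structure)."""
    path: str
    disk_size: int      # bytes on disk (compressed, possibly deltified)
    inflated_size: int  # bytes after inflating


def top_paths_by_disk_size(versions, limit=10):
    """Group versions by path and return the paths with the largest total disk size."""
    totals = defaultdict(lambda: [0, 0, 0])  # path -> [count, disk, inflated]
    for v in versions:
        entry = totals[v.path]
        entry[0] += 1
        entry[1] += v.disk_size
        entry[2] += v.inflated_size
    ranked = sorted(totals.items(), key=lambda item: item[1][1], reverse=True)
    return [(path, *sums) for path, sums in ranked[:limit]]


# Made-up sample data, only to exercise the helper.
sample = [
    ObjectVersion("docs/notes.txt", 400, 1200),
    ObjectVersion("docs/notes.txt", 350, 1300),
    ObjectVersion("src/main.c", 500, 900),
]
for path, count, disk, inflated in top_paths_by_disk_size(sample):
    print(f"{path:>20} | {count:5} | {disk:9} | {inflated:13}")
```

Grouping by path first is what lets a single row summarize every version of a file, which is the property the path-based analysis relies on.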

With this early version in Git for Windows, we can expand the reach of
the experimental tool in advance of it being contributed to the upstream
project.

Unfortunately, this will mean that in the next `microsoft/git` rebase,
Jeff Hostetler's version will need to be pulled out since there are
enough conflicts. These conflicts include how tables are stored and
generated, as the version in this PR is slightly more general to allow
for different kinds of data.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2025-02-06 19:33:22 +01:00
Johannes Schindelin
8d71eebf0a Introduce 'git backfill' to get missing blobs in a partial clone (#5172)
This change introduces the `git backfill` command which uses the path
walk API to download missing blobs in a blobless partial clone.

By downloading blobs that correspond to the same file path at the same
time, we hope to maximize the potential benefits of delta compression
against multiple versions.

These downloads occur in a configurable batch size, presenting a
mechanism to perform "resumable" clones: `git clone --filter=blob:none`
gets the commits and trees, then `git backfill` will download all
missing blobs. If `git backfill` is interrupted partway through, it can
be restarted and will redownload only the missing objects.
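
The resumable, batched behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual implementation: the `backfill` function, its `fetch_batch` callback, the dictionary of paths to blob IDs and the `batch_size` parameter are all hypothetical.

```python
def backfill(paths_to_blobs, present, fetch_batch, batch_size=2):
    """Fetch missing blobs path by path, in batches of at least `batch_size`.

    paths_to_blobs: dict mapping a path to the blob IDs seen at that path.
    present: set of blob IDs already available locally (updated in place).
    fetch_batch: callable that "downloads" a list of blob IDs (hypothetical).
    """
    pending = []
    for path in sorted(paths_to_blobs):
        # Blobs from the same path are queued together, so a server could
        # delta-compress them against each other.
        for blob in paths_to_blobs[path]:
            if blob not in present and blob not in pending:
                pending.append(blob)
        if len(pending) >= batch_size:
            fetch_batch(pending)
            present.update(pending)
            pending = []
    if pending:
        fetch_batch(pending)
        present.update(pending)
```

Because `present` is only updated after a batch succeeds, calling `backfill` again after an interruption skips everything that already arrived and requests only what is still missing.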

When combining blobless partial clones with sparse-checkout, `git
backfill` will assume its `--sparse` option and download only the blobs
within the sparse-checkout. Users may want to do this as the repo size
will still be smaller than the full repo size, but commands like `git
blame` or `git log -L` will not suffer from many one-by-one blob
downloads.

Future directions should consider adding a pathspec or file prefix to
further focus which paths are being downloaded in a batch.
2025-02-06 19:33:22 +01:00
Derrick Stolee
664e5ac0ff Add path walk API and its use in 'git pack-objects' (#5171)
This is a follow up to #5157 as well as motivated by the RFC in
gitgitgadget/git#1786.

We have ways of walking all objects, but they are focused on visiting a
single commit and then expanding the new trees and blobs reachable from
that commit that have not been visited yet. This means that objects
arrive without any locality based on their path.

Add a new "path walk API" that focuses on walking objects in batches
according to their type and path. This will walk all annotated tags, all
commits, all root trees, and then start a depth-first search among all
paths in the repo to collect trees and blobs in batches.
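
To make the batching order concrete, here is a small Python sketch of such a walk over a toy object model. It is only an illustration under stated assumptions: the `path_walk` function, the dictionary-based `commits` and `trees` stores and the callback signature are hypothetical, and annotated tags are left out for brevity.

```python
from collections import defaultdict


def path_walk(commits, trees, callback):
    """Visit objects in batches grouped by type and path (toy model).

    commits: dict mapping commit ID -> root tree ID.
    trees: dict mapping tree ID -> {entry name: (kind, object ID)},
           where kind is "tree" or "blob".
    callback: called as callback(path, kind, list_of_object_ids).
    """
    seen = set()

    def first_time(oids):
        """Keep only objects not yet reported, preserving order."""
        fresh = []
        for oid in oids:
            if oid not in seen:
                seen.add(oid)
                fresh.append(oid)
        return fresh

    callback("", "commit", first_time(commits))
    roots = first_time(commits.values())
    callback("", "tree", roots)

    def walk(path, tree_ids):
        # Collect the children of every tree in this batch, keyed by path.
        children = defaultdict(list)
        for tree_id in tree_ids:
            for name, (kind, oid) in sorted(trees[tree_id].items()):
                children[(path + name, kind)].append(oid)
        for (child_path, kind), oids in sorted(children.items()):
            fresh = first_time(oids)
            if not fresh:
                continue
            if kind == "tree":
                callback(child_path + "/", "tree", fresh)
                walk(child_path + "/", fresh)  # depth-first into this path
            else:
                callback(child_path, "blob", fresh)

    walk("", roots)
```

Each callback receives every newly discovered object for one path at once, which is what gives a consumer the path locality that a commit-by-commit walk lacks.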

The most important application for this is being fast-tracked to Git for
Windows: `git pack-objects --path-walk`. This application of the path
walk API discovers the objects to pack via this batched walk, and
automatically groups objects that appear at a common path so they can be
checked for delta comparisons.

This use completely avoids any name-hash collisions (even the collisions
that sometimes occur with the new `--full-name-hash` option) and can be
much faster to compute since the first pass of delta calculations does
not waste time on objects that are unlikely to be diffable.

Some statistics are available in the commit messages.
2025-02-06 19:33:21 +01:00
Johannes Schindelin
39ff42e497 pack-objects: create new name-hash algorithm (#5157)
This is an updated version of gitgitgadget/git#1785, intended for early
consumption into Git for Windows.

The idea here is to add a new `--full-name-hash` option to `git
pack-objects` and `git repack`. This adjusts the name-hash value used
for finding delta bases in such a way that uses the full path name with
a lower likelihood of collisions than the default name-hash algorithm.
In many repositories with name-hash collisions and many versions of
those paths, this can significantly reduce the size of a full repack. It
can also help in certain cases of `git push`, but only if the pack is
already artificially inflated by name-hash collisions; cases that find
"sibling" deltas as better choices become worse with `--full-name-hash`.

Thus, this option is currently recommended for full repacks of large
repos, and on client machines without reachability bitmaps.

Some care is taken to ignore this option when using bitmaps, either
writing bitmaps or using a bitmap walk during reads. The bitmap file
format contains name-hash values, but no way to indicate which function
is used, so compatibility is a concern for bitmaps. Future work could
explore this idea.

After this PR is merged, then the more-involved `--path-walk` option may
be considered.
2025-02-06 19:33:21 +01:00
Johannes Schindelin
b1033af606 Lazy load libcurl, allowing for an SSL/TLS backend-specific libcurl (#4410)
As per
https://github.com/git-for-windows/git/issues/4350#issuecomment-1485041503,
the major blocker for upgrading Git for Windows' OpenSSL from v1.1 to v3
is the tricky part where such an upgrade would break `git fetch`/`git
clone` and `git push` because libcurl depends on the OpenSSL DLL,
and the major version bump will _change_ the file name of said DLL.

To overcome that, the plan is to build libcurl flavors for each
supported SSL/TLS backend, aligning with the way MSYS2 builds libcurl,
then switch Git for Windows' SDK to the Secure Channel-flavored libcurl,
and teach Git to look for the specific flavor of libcurl corresponding
to the `http.sslBackend` setting (if that was configured).

Here is the PR to teach Git that trick.
2025-02-06 19:33:18 +01:00
Johannes Schindelin
725cece7f6 Merge pull request #2974 from derrickstolee/maintenance-and-headless
Include Windows-specific maintenance and headless-git
2025-02-06 19:33:11 +01:00
Matthias Aßhauer
52f76dc942 compat/mingw: handle WSA errors in strerror
We map WSAGetLastError() errors to errno errors in winsock_error_to_errno(),
but the MSVC strerror() implementation only produces "Unknown error" for
most of them. Produce some more meaningful error messages in these
cases.

Our builds for ARM64 link against the newer UCRT strerror() that does know
these errors, so we won't change the strerror() used there.

The wording of the messages is copied from glibc strerror() messages.
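
The fallback idea can be sketched in Python as follows. This is an illustration only, not the C code in compat/mingw: the `describe_error` helper and the `FALLBACK_MESSAGES` table are hypothetical names, and the table holds just a small subset of messages.

```python
import errno
import os

# Messages worded like glibc's strerror(); only a small illustrative subset.
FALLBACK_MESSAGES = {
    errno.ECONNREFUSED: "Connection refused",
    errno.ECONNRESET: "Connection reset by peer",
    errno.ETIMEDOUT: "Connection timed out",
    errno.EADDRINUSE: "Address already in use",
}


def describe_error(code, native_strerror=os.strerror):
    """Use the native message unless it is the generic "Unknown error"."""
    message = native_strerror(code)
    if message.startswith("Unknown error") and code in FALLBACK_MESSAGES:
        return FALLBACK_MESSAGES[code]
    return message
```

Passing the native function as a parameter keeps the native message whenever the C runtime already knows the error, which mirrors leaving the UCRT-based builds unchanged.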

Reported-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Matthias Aßhauer <mha1993@live.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2025-02-06 19:33:04 +01:00
Jeff Hostetler
9744533f99 survey: stub in new experimental 'git-survey' command
Start work on a new 'git survey' command to scan the repository
for monorepo performance and scaling problems.  The goal is to
measure the various known "dimensions of scale" and serve as a
foundation for adding additional measurements as we learn more
about Git monorepo scaling problems.

The initial goal is to complement the scanning and analysis performed
by the Go-based 'git-sizer' (https://github.com/github/git-sizer) tool.
It is hoped that by creating a builtin command, we may be able to take
advantage of internal Git data structures and code that is not
accessible from Go to gain further insight into potential scaling
problems.

Co-authored-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-02-06 19:33:03 +01:00
Derrick Stolee
4497a07d47 backfill: add builtin boilerplate
In anticipation of implementing 'git backfill', populate the necessary files
with the boilerplate of a new builtin.

RFC TODO: When preparing this for a full implementation, make sure it is
based on the newest standards introduced by [1].

[1] https://lore.kernel.org/git/xmqqjzfq2f0f.fsf@gitster.g/T/#m606036ea2e75a6d6819d6b5c90e729643b0ff7f7
    [PATCH 1/3] builtin: add a repository parameter for builtin functions

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-02-06 19:33:02 +01:00
Derrick Stolee
6eff58dd64 test-tool: add helper for name-hash values
Add a new test-tool helper, name-hash, to output the value of the
name-hash algorithms for the input list of strings, one per line.

Since the name-hash values can be stored in the .bitmap files, it is
important that these hash functions do not change across Git versions.
Add a simple test to t5310-pack-bitmaps.sh to provide some testing of
the current values. Due to how these functions are implemented, it would
be difficult to change them without disturbing these values.

Create a performance test that uses test_size to demonstrate how
collisions occur for these hash algorithms. This test helps inform
someone as to the behavior of the name-hash algorithms for their repo
based on the paths at HEAD.

My copy of the Git repository shows modest statistics around the
collisions of the default name-hash algorithm:

Test                                              this tree
-----------------------------------------------------------------
5314.1: paths at head                                        4.5K
5314.2: number of distinct name-hashes                       4.1K
5314.3: number of distinct full-name-hashes                  4.5K
5314.4: maximum multiplicity of name-hashes                    13
5314.5: maximum multiplicity of fullname-hashes                 1

Here, the maximum collision multiplicity is 13, but around 10% of paths
have a collision with another path.

In a more interesting example, the microsoft/fluentui [1] repo had these
statistics at time of committing:

Test                                              this tree
-----------------------------------------------------------------
5314.1: paths at head                                       19.6K
5314.2: number of distinct name-hashes                       8.2K
5314.3: number of distinct full-name-hashes                 19.6K
5314.4: maximum multiplicity of name-hashes                   279
5314.5: maximum multiplicity of fullname-hashes                 1

[1] https://github.com/microsoft/fluentui

That demonstrates that the nearly twenty thousand path names are
assigned only around eight thousand distinct values. 279 paths are
assigned to a single value, leading the packing algorithm to sort
objects from those paths together, by size.

In this repository, no collisions occur for the full-name-hash
algorithm.

In a more extreme example, an internal monorepo had a much worse
collision rate:

Test                                              this tree
-----------------------------------------------------------------
5314.1: paths at head                                      221.6K
5314.2: number of distinct name-hashes                      72.0K
5314.3: number of distinct full-name-hashes                221.6K
5314.4: maximum multiplicity of name-hashes                 14.4K
5314.5: maximum multiplicity of fullname-hashes                 2

Even in this repository with many more paths at HEAD, the collision rate
of the full-name-hash algorithm was low: the maximum number of paths
grouped into a single bucket by that algorithm was two.
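
For readers who want to see why the default algorithm collides, here is a small Python sketch. `name_hash` follows the default name-hash as I understand it (only the last 16 non-whitespace bytes influence the 32-bit value). `full_path_stand_in` is NOT the full-name-hash function; it is a CRC-32 stand-in used only to show what hashing the whole path changes, and `collision_stats` is a hypothetical helper.

```python
import zlib
from collections import Counter

WHITESPACE = b" \t\n\v\f\r"


def name_hash(path):
    """Default name-hash: later characters weigh most, and only the
    last 16 non-whitespace bytes can influence the 32-bit result."""
    value = 0
    for byte in path.encode():
        if byte in WHITESPACE:
            continue
        value = ((value >> 2) + (byte << 24)) & 0xFFFFFFFF
    return value


def full_path_stand_in(path):
    """Stand-in for a hash over the whole path (CRC-32, NOT Git's function)."""
    return zlib.crc32(path.encode())


def collision_stats(paths, hash_fn):
    """Return (distinct hash values, largest bucket size) for the given paths."""
    buckets = Counter(hash_fn(p) for p in paths)
    return len(buckets), max(buckets.values())
```

Two paths such as `packages/alpha/package-lock.json` and `packages/beta/package-lock.json` share their last 16 characters, so `name_hash` puts them in the same bucket no matter which directory they live in.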

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-02-06 19:33:00 +01:00
Derrick Stolee
2f90b90d66 t6601: add helper for testing path-walk API
Add some tests based on the current behavior, doing interesting checks
for different sets of branches, ranges, and the --boundary option. This
sets a baseline for the behavior and we can extend it as new options are
introduced.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-02-06 19:33:00 +01:00
Derrick Stolee
b8aac654ee path-walk: introduce an object walk by path
In anticipation of a few planned applications, introduce the most basic form
of a path-walk API. It currently assumes that there are no UNINTERESTING
objects, and does not include any complicated filters. It calls a function
pointer on groups of tree and blob objects as grouped by path. This only
includes objects the first time they are discovered, so an object that
appears at multiple paths will not be included in two batches.

There are many future adaptations that could be made, but they are left for
future updates when consumers are ready to take advantage of those features.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2025-02-06 19:33:00 +01:00
Johannes Schindelin
6ffa94ab42 http: support lazy-loading libcurl also on Windows
This implements the Windows-specific support code, because everything is
slightly different on Windows, even loading shared libraries.

Note: I specifically do _not_ use the code from
`compat/win32/lazyload.h` here because that code is optimized for
loading individual functions from various system DLLs, while we
specifically want to load _many_ functions from _one_ DLL here, and
distinctly not a system DLL (we expect libcurl to be located outside
`C:\Windows\system32`, something `INIT_PROC_ADDR` refuses to work with).
Also, the `curl_easy_getinfo()`/`curl_easy_setopt()` functions are
declared as vararg functions, which `lazyload.h` cannot handle. Finally,
we are about to optionally override the exact file name that is to be
loaded, which is a goal contrary to `lazyload.h`'s design.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2025-02-06 19:32:57 +01:00
Johannes Schindelin
622a259ae9 http: optionally load libcurl lazily
This compile-time option allows asking Git to load libcurl dynamically
at runtime.

Together with a follow-up patch that optionally overrides the file name
depending on the `http.sslBackend` setting, this kicks open the door for
installing multiple libcurl flavors side by side, and loading the one
corresponding to the (runtime-)configured SSL/TLS backend.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2025-02-06 19:32:57 +01:00
Jeff Hostetler
c1ef925bb0 Makefile: clean up .ilk files when MSVC=1
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
2025-02-06 19:32:50 +01:00
Johannes Schindelin
27712a73ef mimalloc: offer a build-time option to enable it
By defining `USE_MIMALLOC`, Git can now be compiled with that
nicely-fast and small allocator.

Note that we have to disable a couple `DEVELOPER` options to build
mimalloc's source code, as it makes heavy use of declarations after
statements, among other things that disagree with Git's conventions.

We even have to silence some GCC warnings in non-DEVELOPER mode. For
example, the `-Wno-array-bounds` flag is needed because in `-O2` builds,
trying to call `NtCurrentTeb()` (which `_mi_thread_id()` does on
Windows) causes the bogus warning about a system header, likely related
to https://sourceforge.net/p/mingw-w64/mailman/message/37674519/ and to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578:

C:/git-sdk-64-minimal/mingw64/include/psdk_inc/intrin-impl.h:838:1:
        error: array subscript 0 is outside array bounds of 'long long unsigned int[0]' [-Werror=array-bounds]
  838 | __buildreadseg(__readgsqword, unsigned __int64, "gs", "q")
      | ^~~~~~~~~~~~~~

Also: The `mimalloc` library uses C11-style atomics, therefore we must
require that standard when compiling with GCC if we want to use
`mimalloc` (instead of requiring "only" C99). This is what we do in the
CMake definition already, therefore this commit does not need to touch
`contrib/buildsystems/`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2025-02-06 19:32:46 +01:00
Johannes Schindelin
eb1cd4d7d3 Import the source code of mimalloc v2.1.2
This commit imports mimalloc's source code as per v2.1.2, fetched from
the tag at https://github.com/microsoft/mimalloc.

The .c files are from the src/ subdirectory, and the .h files from the
include/ and include/mimalloc/ subdirectories. We will subsequently
modify the source code to accommodate building within Git's context.

Since we plan on using the `mi_*()` family of functions, we skip the
C++-specific source code, some POSIX compliant functions to interact
with mimalloc, and the code that wants to support auto-magic overriding
of the `malloc()` function (mimalloc-new-delete.h, alloc-posix.c,
mimalloc-override.h, alloc-override.c, alloc-override-osx.c,
alloc-override-win.c and static.c).

To appease the `check-whitespace` job of Git's Continuous Integration,
this commit was washed one time via `git rebase --whitespace=fix`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2025-02-06 19:32:45 +01:00
Junio C Hamano
fc89d14c63 Revert barrier-based LSan threading race workaround
The extra "barrier" approach was too much code whose sole purpose
was to work around a race that is not even ours (i.e. in LSan's
teardown code).

In preparation for queuing a solution taking a much-less-invasive
approach, let's revert them.
2025-01-01 14:13:01 -08:00
Jeff King
7d0037b59a thread-utils: introduce optional barrier type
One thread primitive we don't yet support is a barrier: it waits for all
threads to reach a synchronization point before letting any of them
continue. This would be useful for avoiding the LSan race we see in
index-pack (and other places) by having all threads complete their
initialization before any of them start to do real work.

POSIX introduced a pthread_barrier_t in 2004, which does what we want.
But if we want to rely on it:

  1. Our Windows pthread emulation would need a new set of wrapper
     functions. There's a Synchronization Barrier primitive there, which
     was introduced in Windows 8 (which is old enough for us to depend
     on).

  2. macOS (and possibly other systems) has pthreads but not
     pthread_barrier_t. So there we'd have to implement our own barrier
     based on the mutex and cond primitives.

Those are do-able, but since we only care about avoiding races in our
LSan builds, there's an easier way: make it a noop on systems without a
native pthread barrier.

This patch introduces a "maybe_thread_barrier" API. The clunky name
(rather than just using pthread_barrier directly) should hopefully clue
people in that on some systems it will do nothing. It's wired to a
Makefile knob which has to be triggered manually, and we enable it for
the linux-leaks CI jobs (since we know we'll have it there).
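
The "maybe" semantics can be illustrated with a short Python sketch built on `threading.Barrier`. It is not the C API introduced by this patch: the `MaybeBarrier` class and the `run_workers` helper are hypothetical, and the `enabled` flag stands in for the Makefile knob.

```python
import threading


class MaybeBarrier:
    """A barrier that degrades to a no-op when disabled (hypothetical helper)."""

    def __init__(self, parties, enabled=True):
        self._barrier = threading.Barrier(parties) if enabled else None

    def wait(self):
        if self._barrier is not None:
            self._barrier.wait()


def run_workers(count, enabled=True):
    """Start `count` threads; each initializes, waits, then does "real work"."""
    barrier = MaybeBarrier(count, enabled)
    events = []
    lock = threading.Lock()

    def worker(index):
        with lock:
            events.append(("init", index))
        barrier.wait()  # nobody proceeds until every thread has initialized
        with lock:
            events.append(("work", index))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

With the barrier enabled, every "init" event is recorded before the first "work" event; with it disabled, the code still runs but the two phases may interleave.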

There are some other possible options:

  - we could turn it on all the time for Linux systems based on uname.
    But we really only care about it for LSan builds, and there is no
    need to add extra code to regular builds.

  - we could turn it on only for LSan builds. But that would break
    builds on non-Linux platforms (like macOS) that otherwise should
    support sanitizers.

  - we could trigger only on the combination of Linux and LSan together.
    This isn't too hard to do, but the uname check isn't completely
    accurate. It is really about what your libc supports, and non-glibc
    systems might not have it (though at least musl seems to).

    So we'd risk breaking builds on those systems, which would need to
    add a new knob. Though the upside would be that running local "make
    SANITIZE=leak test" would be protected automatically.

And of course none of this protects LSan runs from races on systems
without pthread barriers. It's probably OK in practice to protect only
our CI jobs, though. The race is rare-ish and most leak-checking happens
through CI.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-30 06:18:57 -08:00
Patrick Steinhardt
cbcc2f7911 GIT-BUILD-OPTIONS: wire up NO_GITWEB option
Building our "gitweb" interface is optional in our Makefile and in Meson
and not wired up at all with CMake, but disabling it causes a couple of
tests in the t950* range that pull in "t/lib-gitweb.sh" to fail. This is
because the test library knows to execute gitweb-tests based on whether
or not Perl is available, but we may have Perl available and still end
up not building gitweb e.g. with `make test NO_GITWEB=YesPlease`.

Fix this issue by wiring up a new "NO_GITWEB" build option so that we
can skip these tests in case gitweb is not built.

Note that this new build option requires us to move the configuration of
GIT-BUILD-OPTIONS to a later point in our Meson build instructions. But
as that file is only consumed by our tests at runtime this change does
not cause any issues.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-27 08:17:19 -08:00
Patrick Steinhardt
cfa1f2ae96 GIT-BUILD-OPTIONS: sort variables alphabetically
The variables declared and substituted in GIT-BUILD-OPTIONS are not
ordered in any obvious way. Sort them alphabetically so that it becomes
obvious where new variables should go.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-27 08:17:19 -08:00
Junio C Hamano
f074cdea46 Merge branch 'ps/build-hotfix'
A topic to optionally build with meson, which has graduated to
'master' recently, has regressed the normal Makefile build, which
is being corrected.

* ps/build-hotfix:
  meson: add options to override build information
  GIT-VERSION-GEN: fix overriding GIT_BUILT_FROM_COMMIT and GIT_DATE
  GIT-VERSION-GEN: fix overriding GIT_VERSION
  Makefile: introduce template for GIT-VERSION-GEN
  Makefile: drop unneeded indirection for GIT-VERSION-GEN outputs
  Makefile: stop including "GIT-VERSION-FILE" in docs
2024-12-23 09:32:26 -08:00
Junio C Hamano
83c8f76235 Merge branch 'ps/ci-meson'
The meson-build procedure is integrated into CI to catch and
prevent bitrotting.

* ps/ci-meson:
  ci: wire up Meson builds
  t: introduce compatibility options to clar-based tests
  t: fix out-of-tree tests for some git-p4 tests
  Makefile: detect missing Meson tests
  meson: detect missing tests at configure time
  t/unit-tests: rename clar-based unit tests to have a common prefix
  Makefile: drop -DSUPPRESS_ANNOTATED_LEAKS
  ci/lib: support custom output directories when creating test artifacts
2024-12-23 09:32:25 -08:00
Patrick Steinhardt
992bc5618f GIT-VERSION-GEN: fix overriding GIT_VERSION
GIT-VERSION-GEN tries to derive the version that Git is being built from
via multiple different sources in the following order:

  1. A file called "version" in the source tree's root directory, if it
     exists.

  2. The current commit in case Git is built from a Git repository.

  3. Otherwise, we use a fallback version stored in a variable which is
     bumped whenever a new Git version is getting tagged.

It used to be possible to override the version by overriding the
`GIT_VERSION` Makefile variable (e.g. `make GIT_VERSION=foo`). This
worked somewhat by chance, only: `GIT-VERSION-GEN` would write the
actual Git version into `GIT-VERSION-FILE`, not the overridden value,
but when including the file into our Makefile we would not override the
`GIT_VERSION` variable because it has already been set by the user. And
because our Makefile used the variable to propagate the version to our
build tools instead of using `GIT-VERSION-FILE` the resulting build
artifacts used the overridden version.

But that subtle mechanism broke with 4838deab65 (Makefile: refactor
GIT-VERSION-GEN to be reusable, 2024-12-06) and subsequent commits
because the version information is not propagated via the Makefile
variable anymore, but instead via the files that `GIT-VERSION-GEN`
started to write. And as the script never knew about the `GIT_VERSION`
environment variable in the first place it uses one of the values listed
above instead of the overridden value.

Fix this issue by making `GIT-VERSION-GEN` handle the case where
`GIT_VERSION` has been set via the environment.
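
The resulting lookup order can be sketched in Python as follows. This is an illustration only, assuming that a user-provided override takes precedence over the three sources listed above; the `resolve_version` function, its parameter names and the placeholder fallback value are hypothetical and do not mirror the shell script.

```python
def resolve_version(override=None, version_file=None, described_commit=None,
                    fallback="0.0.0-fallback"):
    """Pick the first available source, in priority order.

    override: user-provided value (environment or "config.mak"), if any.
    version_file: contents of a "version" file in the source root, if present.
    described_commit: version derived from the current commit, if in a repository.
    fallback: hard-coded default; the value here is a placeholder.
    """
    for candidate in (override, version_file, described_commit):
        if candidate and candidate.strip():
            return candidate.strip()
    return fallback
```

For instance, `resolve_version(override="foo", version_file="2.48.0")` picks the override, matching the `make GIT_VERSION=foo` use case.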

Note that this requires us to introduce a new GIT_VERSION_OVERRIDE
variable that stores a potential user-provided value, either via the
environment or via "config.mak". Ideally we wouldn't need it and could
just continue to use GIT_VERSION for this. But unfortunately, Make
will first include all sub-Makefiles before figuring out whether it
needs to re-make any of them [1]. Consequently, if there already is a
GIT-VERSION-FILE, we would have slurped in its value of GIT_VERSION
before we call GIT-VERSION-GEN, and because GIT-VERSION-GEN now uses
that value as an override it would mean that the first generated value
for GIT_VERSION will remain unchanged.

Furthermore we have to move the include for "GIT-VERSION-FILE" after the
includes for "config.mak" and related so that GIT_VERSION_OVERRIDE can
be set to the value provided by "config.mak".

[1]: https://www.gnu.org/software/make/manual/html_node/Remaking-Makefiles.html

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 12:36:45 -08:00
Patrick Steinhardt
114494ae2c Makefile: introduce template for GIT-VERSION-GEN
Introduce a new template to call GIT-VERSION-GEN. This will allow us to
iterate on how exactly the script is called in subsequent commits
without having to adapt all call sites every time.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 12:36:45 -08:00
Patrick Steinhardt
b329f2eb00 Makefile: drop unneeded indirection for GIT-VERSION-GEN outputs
Some of the callsites of GIT-VERSION-GEN generate the target file with a
"+" suffix first and then move the file into place when the new contents
are different compared to the old contents. This allows us to avoid a
needless rebuild by not updating timestamps of the target file when its
contents will remain unchanged anyway.

In fact though, this exact logic is already handled in GIT-VERSION-GEN,
so doing this manually is pointless. This is a leftover from an earlier
version of 4838deab65 (Makefile: refactor GIT-VERSION-GEN to be
reusable, 2024-12-06), where the script didn't handle that logic for us.
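
For reference, the write-then-move-if-different pattern looks roughly like this in Python. It is a sketch of the general technique, not of the script itself; the `write_if_changed` helper and the file name in the demo are only illustrative.

```python
import os
import tempfile


def write_if_changed(target, contents):
    """Write `contents` to `target` only when it differs; return True if written."""
    try:
        with open(target, "r", encoding="utf-8") as existing:
            if existing.read() == contents:
                return False  # keep the old timestamp, so nothing rebuilds
    except FileNotFoundError:
        pass
    staging = target + "+"  # same "+" suffix convention as described above
    with open(staging, "w", encoding="utf-8") as out:
        out.write(contents)
    os.replace(staging, target)  # rename the staged file into place
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "GIT-VERSION-FILE")
    print(write_if_changed(path, "GIT_VERSION = 1.0\n"))  # first write
    print(write_if_changed(path, "GIT_VERSION = 1.0\n"))  # unchanged contents
```

Doing this in one place means callers do not each need their own "+" file handling.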

Drop the needless indirection.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-20 12:36:44 -08:00
Junio C Hamano
29e5596eb8 Merge branch 'ps/build'
Build procedure update plus introduction of Meson based builds.

* ps/build: (24 commits)
  Introduce support for the Meson build system
  Documentation: add comparison of build systems
  t: allow overriding build dir
  t: better support for out-of-tree builds
  Documentation: extract script to generate a list of mergetools
  Documentation: teach "cmd-list.perl" about out-of-tree builds
  Documentation: allow sourcing generated includes from separate dir
  Makefile: simplify building of templates
  Makefile: write absolute program path into bin-wrappers
  Makefile: allow "bin-wrappers/" directory to exist
  Makefile: refactor generators to be PWD-independent
  Makefile: extract script to generate gitweb.js
  Makefile: extract script to generate gitweb.cgi
  Makefile: extract script to massage Python scripts
  Makefile: extract script to massage Shell scripts
  Makefile: use "generate-perl.sh" to massage Perl library
  Makefile: extract script to massage Perl scripts
  Makefile: consistently use PERL_PATH
  Makefile: generate doc versions via GIT-VERSION-GEN
  Makefile: generate "git.rc" via GIT-VERSION-GEN
  ...
2024-12-15 17:54:33 -08:00
Junio C Hamano
cd0a222f08 Merge branch 'es/oss-fuzz'
Port the OSS-Fuzz tests for Git into our codebase.

* es/oss-fuzz:
  fuzz: port fuzz-url-decode-mem from OSS-Fuzz
  fuzz: port fuzz-parse-attr-line from OSS-Fuzz
  fuzz: port fuzz-credential-from-url-gently from OSS-Fuzz
2024-12-13 07:33:42 -08:00
Patrick Steinhardt
c081e7340f t/unit-tests: rename clar-based unit tests to have a common prefix
All of the code files for unit tests using the self-grown unit testing
framework have a "t-" prefix to their name. This makes it easy to
identify them and use globbing in our Makefile and in other places. On
the other hand though, our clar-based unit tests have no prefix at all
and thus cannot easily be discerned from other files in the unit test
directory.

Introduce a new "u-" prefix for clar-based unit tests. This prefix will
be used in a subsequent commit to easily identify such tests.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-13 06:48:46 -08:00
Patrick Steinhardt
23eeee08d6 Makefile: drop -DSUPPRESS_ANNOTATED_LEAKS
The -DSUPPRESS_ANNOTATED_LEAKS preprocessor directive was used to enable
our `UNLEAK()` macro in the past, which marks memory as still-reachable
so that the leak sanitizer does not complain. Starting with 52c7dbd036
(git-compat-util: drop now-unused `UNLEAK()` macro, 2024-11-20) this
macro has been removed, and thus the preprocessor directive is not
required anymore, either.

Drop it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-13 06:48:45 -08:00
Junio C Hamano
5c46677067 Merge branch 'ps/build' into ps/ci-meson
* ps/build: (24 commits)
  Introduce support for the Meson build system
  Documentation: add comparison of build systems
  t: allow overriding build dir
  t: better support for out-of-tree builds
  Documentation: extract script to generate a list of mergetools
  Documentation: teach "cmd-list.perl" about out-of-tree builds
  Documentation: allow sourcing generated includes from separate dir
  Makefile: simplify building of templates
  Makefile: write absolute program path into bin-wrappers
  Makefile: allow "bin-wrappers/" directory to exist
  Makefile: refactor generators to be PWD-independent
  Makefile: extract script to generate gitweb.js
  Makefile: extract script to generate gitweb.cgi
  Makefile: extract script to massage Python scripts
  Makefile: extract script to massage Shell scripts
  Makefile: use "generate-perl.sh" to massage Perl library
  Makefile: extract script to massage Perl scripts
  Makefile: consistently use PERL_PATH
  Makefile: generate doc versions via GIT-VERSION-GEN
  Makefile: generate "git.rc" via GIT-VERSION-GEN
  ...
2024-12-12 16:30:28 +09:00
Junio C Hamano
de9278127e Merge branch 'ps/reftable-detach'
Isolates the reftable subsystem from the rest of Git's codebase by
using fewer pieces of Git's infrastructure.

* ps/reftable-detach:
  reftable/system: provide thin wrapper for lockfile subsystem
  reftable/stack: drop only use of `get_locked_file_path()`
  reftable/system: provide thin wrapper for tempfile subsystem
  reftable/stack: stop using `fsync_component()` directly
  reftable/system: stop depending on "hash.h"
  reftable: explicitly handle hash format IDs
  reftable/system: move "dir.h" to its only user
2024-12-10 10:04:56 +09:00
Patrick Steinhardt
7e0730c8ba t: better support for out-of-tree builds
Our in-tree builds used by the Makefile use several build directories
scattered across different locations. The paths to those
build directories have to be propagated to our tests such that they can
find the contained files. This is done via a mixture of hardcoded paths
in our test library and injected variables in our bin-wrappers or
"GIT-BUILD-OPTIONS".

The latter two mechanisms are preferable over using hardcoded paths. For
one, we have all paths which are subject to change stored in a small set
of central files instead of having the knowledge of build paths in many
files. And second, it allows build systems which build files elsewhere
to adapt those paths based on their own needs. This is especially nice
in the context of build systems that use out-of-tree builds like CMake
or Meson.

Remove hardcoded knowledge of build paths from our test library and move
it into our bin-wrappers and "GIT-BUILD-OPTIONS".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 07:52:13 +09:00
Patrick Steinhardt
d2407bb8dc Makefile: write absolute program path into bin-wrappers
Write the absolute program path into our bin-wrappers. This allows us to
simplify the Meson build instructions we are about to introduce a bit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 07:52:12 +09:00
Patrick Steinhardt
95bcd6f0b7 Makefile: allow "bin-wrappers/" directory to exist
The "bin-wrappers/" directory gets created by our build system and is
populated with one script for each of our binaries. There isn't anything
inherently wrong with the current layout, but it is somewhat hard to
adapt for out-of-tree build systems.

Adapt the layout such that our "bin-wrappers/" directory always exists
and contains our "wrap-for-bin.sh" script to make things a little bit
easier for subsequent steps.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 07:52:11 +09:00
Patrick Steinhardt
3f145a4fe3 Makefile: refactor generators to be PWD-independent
We have multiple scripts that generate headers from other data. All of
these scripts have the assumption built-in that they are executed in the
current source directory, which makes them a bit unwieldy to use during
out-of-tree builds.

Refactor them to instead take the source directory as well as the output
file as arguments.
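
As a rough illustration of the calling convention described above, the sketch below shows a generator that receives both the source directory and the output file as arguments instead of relying on the current working directory. It is not one of Git's actual generator scripts: the names `generate_header` and `main`, the `*.txt` input pattern, and the output format are all hypothetical.

```python
import sys
from pathlib import Path


def generate_header(source_dir: Path, output: Path) -> None:
    """Write a header listing every *.txt file found in source_dir.

    Hypothetical generator: nothing here depends on the current
    working directory, only on the two paths passed in.
    """
    names = sorted(p.stem for p in source_dir.glob("*.txt"))
    lines = ["/* Automatically generated; do not edit. */"]
    lines += [f'"{name}",' for name in names]
    # The output may live in a separate build directory, so create it.
    output.parent.mkdir(parents=True, exist_ok=True)
    output.write_text("\n".join(lines) + "\n")


def main(argv: list[str]) -> int:
    """Entry point taking <source-dir> and <output> explicitly."""
    if len(argv) != 3:
        print(f"usage: {argv[0]} <source-dir> <output>", file=sys.stderr)
        return 1
    generate_header(Path(argv[1]), Path(argv[2]))
    return 0
```

Because both paths are explicit, an out-of-tree build can call `main(["gen", "/path/to/src", "/path/to/build/list.h"])` from any directory and get the same result.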

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 07:52:11 +09:00
Patrick Steinhardt
b7835b941b Makefile: extract script to massage Python scripts
Extract a script that massages Python scripts. This provides a couple of
benefits:

  - The build logic is deduplicated across Make, CMake and Meson.

  - CMake learns to rewrite scripts as-needed at build time instead of
    only writing them at configure time.

Furthermore, we will use this script when introducing Meson to
deduplicate the logic across build systems.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 07:52:10 +09:00
Patrick Steinhardt
eb98cb835c Makefile: extract script to massage Shell scripts
Same as in the preceding commits, extract a script that allows us to
unify how we massage shell scripts.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 07:52:10 +09:00
Patrick Steinhardt
ccfba9e0c4 Makefile: use "generate-perl.sh" to massage Perl library
Extend "generate-perl.sh" such that it knows to also massage the Perl
library files. There are two major differences:

  - We do not read in the Perl header. This is handled by matching on
    whether or not we have a Perl shebang.

  - We substitute some more variables, which we read in via our
    GIT-BUILD-OPTIONS.

Adapt both our Makefile and the CMake build instructions to use this.
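
The first difference listed above can be sketched as follows. This is only an approximation of the idea, not the contents of "generate-perl.sh": the function name `massage_perl` and its parameters are hypothetical, and the assumption is that a file whose first line is a Perl shebang is a script (shebang rewritten, header inserted) while anything else is a library file (left without a header).

```python
import re

# A first line such as "#!/usr/bin/perl -w" counts as a Perl shebang.
PERL_SHEBANG = re.compile(r"^#!.*perl")


def massage_perl(text: str, perl_path: str, header: str) -> str:
    """Rewrite the shebang and insert the header for Perl scripts only."""
    first, sep, rest = text.partition("\n")
    if not PERL_SHEBANG.match(first):
        # No Perl shebang: treat the file as a library and add no header.
        return text
    new_first = PERL_SHEBANG.sub("#!" + perl_path, first, count=1)
    return new_first + sep + header + rest
```

Keying the decision on the shebang means scripts and library files can go through the same code path without a separate flag.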

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 07:52:10 +09:00
Patrick Steinhardt
e4b488049a Makefile: extract script to massage Perl scripts
Extract the script to inject various build-time parameters into our Perl
scripts into a standalone script. This is done such that we can reuse it
in other build systems.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 07:52:09 +09:00
Patrick Steinhardt
c2a3b847ed Makefile: consistently use PERL_PATH
When injecting the Perl path into our scripts we sometimes use '@PERL@'
while at other times we use '@PERL_PATH@'. Refactor the code to use the
latter consistently, which makes it easier to reuse the same logic for
multiple scripts.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 07:52:09 +09:00
Patrick Steinhardt
9bb10d27e7 Makefile: generate "git.rc" via GIT-VERSION-GEN
The "git.rc" is used on Windows to embed information like the project
name and version into the resulting executables. As such we need to
inject the version information, which we do by using preprocessor
defines. The logic to do so is non-trivial and needs to be kept in sync
with the different build systems.

Refactor the logic so that we generate "git.rc" via `GIT-VERSION-GEN`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 07:52:09 +09:00
Patrick Steinhardt
0c8d339514 Makefile: propagate Git version via generated header
We set up a couple of preprocessor macros when compiling Git that
propagate the version that Git was built from to `git version` et al.
The way this is set up makes it harder than necessary to reuse the
infrastructure across the different build systems.

Refactor this such that we generate a "version-def.h" header via
`GIT-VERSION-GEN` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 07:52:08 +09:00
Patrick Steinhardt
4838deab65 Makefile: refactor GIT-VERSION-GEN to be reusable
Our "GIT-VERSION-GEN" script always writes the "GIT-VERSION-FILE" into
the current directory, where the expectation is that it should exist in
the source directory. But other build systems that support out-of-tree
builds may not want to do that to keep the source directory pristine,
even though CMake currently doesn't care.

Refactor the script such that it won't write the "GIT-VERSION-FILE"
directly anymore, but instead knows to replace @PLACEHOLDERS@ in an
arbitrary input file. This allows us to simplify the logic in CMake to
determine the project version, but can also be reused later on in order
to generate other files that need to contain version information like
our "git.rc" file.

While at it, change the format of the version file by removing the
spaces around the equals sign. Like this we can continue to include the
file in our Makefiles, but can also start to source it in shell scripts
in subsequent steps.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 07:52:08 +09:00
Patrick Steinhardt
dbe46c0feb Makefile: consistently use @PLACEHOLDER@ to substitute
We have a bunch of placeholders in our scripts that we replace at build
time, for example by using sed(1). These placeholders come in three
different formats: @PLACEHOLDER@, @@PLACEHOLDER@@ and ++PLACEHOLDER++.

Apart from being inconsistent, this also creates a bit of a problem with
CMake, which only supports the first syntax in its `configure_file()`
function. To work around that we instead manually replace placeholders
via string operations, which is a hassle and removes safeguards that
CMake has to verify that we didn't forget to replace any placeholders.
Besides that, other build systems like Meson also support the CMake
syntax.

Unify our codebase to consistently use the syntax supported by such
build systems.
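
To make the three spellings concrete, here is a small sketch that rewrites the two other forms into @PLACEHOLDER@. The commit itself changes the source files rather than converting at build time, so the helper `normalize_placeholders` is purely hypothetical, and the assumption that placeholder names consist of uppercase letters, digits and underscores is mine.

```python
import re

# Matches the two other spellings named above:
# @@NAME@@ and ++NAME++.
OTHER_FORMS = re.compile(r"@@([A-Z0-9_]+)@@|\+\+([A-Z0-9_]+)\+\+")


def normalize_placeholders(text: str) -> str:
    """Rewrite @@NAME@@ and ++NAME++ to the @NAME@ form."""
    return OTHER_FORMS.sub(
        lambda m: "@" + (m.group(1) or m.group(2)) + "@", text
    )
```

For example, `normalize_placeholders("#!@@SHELL_PATH@@")` returns `"#!@SHELL_PATH@"`, while text that already uses the single-at form is left unchanged.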

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 07:52:08 +09:00
Patrick Steinhardt
4638e8806e Makefile: use common template for GIT-BUILD-OPTIONS
The "GIT-BUILD-OPTIONS" file is generated by our build systems to
propagate built-in features and paths to our tests. The generation is
done ad-hoc, where both our Makefile and the CMake build instructions
simply echo a bunch of strings into the file. This makes it very hard to
figure out what variables are expected to exist and what format they
have, and the written variables can easily get out of sync between build
systems.

Introduce a new "GIT-BUILD-OPTIONS.in" template to address this issue.
This has multiple advantages:

  - It demonstrates which build options exist in the first place.

  - It can serve as a spot to document the build options.

  - Some build systems complain when not all variables could be
    substituted, alerting us to mismatches. Others don't, but if we
    forgot to substitute such variables we now have a bogus string that
    will likely cause our tests to fail, if they have any meaning in the
    first place.
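
The third advantage can be illustrated with a strict template renderer that refuses to produce output while any @VARIABLE@ is left unsubstituted. This is a sketch of the general technique, not of any particular build system's behavior; `render_template` and the sample variable names in the usage note are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"@([A-Z0-9_]+)@")


def render_template(template: str, values: dict[str, str]) -> str:
    """Substitute every @NAME@ in template, failing on unknown names."""
    missing = sorted(
        {name for name in PLACEHOLDER.findall(template) if name not in values}
    )
    if missing:
        # Complain loudly instead of writing a bogus string to the output.
        raise KeyError("unsubstituted placeholders: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

Calling `render_template("SHELL_PATH='@SHELL_PATH@'\n", {"SHELL_PATH": "/bin/sh"})` yields `SHELL_PATH='/bin/sh'`, whereas omitting the value raises `KeyError`, which is the kind of mismatch alert described above.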

Backfill values that we didn't yet set in our CMake build instructions.
While at it, remove the `SUPPORTS_SIMPLE_IPC` variable that we only set
up in CMake as it isn't used anywhere.

This change requires us to adapt the setup of TEST_OUTPUT_DIRECTORY in
"test-lib.sh" such that it does not get overwritten after sourcing when
it has been set up via the environment. This is the only instance I
could find where we rely on the ordering of variables.
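
The ordering concern can be sketched like this: values read from the generated options file normally win, but a variable that was already set in the environment is preserved. This models the idea in Python only; "test-lib.sh" is a shell script, and the function `load_build_options`, its `protected` parameter and the simple `KEY='value'` parsing are assumptions made for the example.

```python
def load_build_options(lines, environ, protected=("TEST_OUTPUT_DIRECTORY",)):
    """Merge KEY='value' lines into environ, keeping protected env values."""
    # Remember protected variables that the caller set explicitly.
    saved = {key: environ[key] for key in protected if key in environ}
    result = dict(environ)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key:
            result[key] = value.strip("'")
    # Restore the caller's values so sourcing does not overwrite them.
    result.update(saved)
    return result
```

With `TEST_OUTPUT_DIRECTORY` set in `environ`, the value from the options file is ignored; without it, the file's value is used.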

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-07 07:52:08 +09:00
Patrick Steinhardt
01e49941d6 reftable/system: provide thin wrapper for tempfile subsystem
We use the tempfile subsystem to write temporary tables, but given that
we're in the process of converting the reftable library to become
standalone we cannot use this subsystem directly anymore. While we could
in theory convert the code to use mkstemp(3p) instead, we'd lose access
to our infrastructure that automatically prunes tempfiles via atexit(3p)
or signal handlers.

Provide a thin wrapper for the tempfile subsystem instead. Like this,
the compatibility shim is fully self-contained in "reftable/system.c".
Downstream users of the reftable library would have to implement their
own tempfile shims by replacing "system.c" with a custom version.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-19 12:23:10 +09:00
Patrick Steinhardt
5dac35bbde Makefile: let clar header targets depend on their scripts
The targets that generate clar headers depend on their source files, but
not on the script that is actually generating the output. Fix the issue
by adding the missing dependencies.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 09:59:26 +09:00
Patrick Steinhardt
9a91ab9400 t/unit-tests: convert "clar-generate.awk" into a shell script
Convert "clar-generate.awk" into a shell script that invokes awk(1).
This allows us to avoid the shell redirect in the build system, which
may otherwise be a problem with build systems on platforms that use a
different shell.

While at it, wrap the overly long lines in the CMake build instructions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-18 09:59:25 +09:00