In anticipation of implementing 'git backfill', populate the necessary files
with the boilerplate of a new builtin.
RFC TODO: When preparing this for a full implementation, make sure it is
based on the newest standards introduced by [1].
[1] https://lore.kernel.org/git/xmqqjzfq2f0f.fsf@gitster.g/T/#m606036ea2e75a6d6819d6b5c90e729643b0ff7f7
[PATCH 1/3] builtin: add a repository parameter for builtin functions
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Users may want to enable the --path-walk option for 'git pack-objects' by
default, especially underneath commands like 'git push' or 'git repack'.
This should be limited to client repositories, since the --path-walk option
disables bitmap walks, so it would be harmful to enable on Git servers when
serving fetches and clones. There is potential that it may be helpful to
consider when repacking the repository, to take advantage of improved deltas
across historical versions of the same files.
Much like how "pack.useSparse" was introduced and included in
"feature.experimental" before being enabled by default, use the repository
settings infrastructure to make the new "pack.usePathWalk" config enabled by
"feature.experimental" and "feature.manyFiles".
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Since 'git pack-objects' supports a --path-walk option, allow passing it
through in 'git repack'. This presents interesting testing opportunities for
comparing the different repacking strategies against each other.
Add the --path-walk option to the performance tests in p5313.
For the microsoft/fluentui repo [1] checked out at a specific commit [2],
the results are very interesting:
Test                                          this tree
------------------------------------------------------------------
5313.2: thin pack                             0.40(0.47+0.04)
5313.3: thin pack size                        1.2M
5313.4: thin pack with --full-name-hash       0.09(0.10+0.04)
5313.5: thin pack size with --full-name-hash  22.8K
5313.6: thin pack with --path-walk            0.08(0.06+0.02)
5313.7: thin pack size with --path-walk       20.8K
5313.8: big pack                              2.16(8.43+0.23)
5313.9: big pack size                         17.7M
5313.10: big pack with --full-name-hash       1.42(3.06+0.21)
5313.11: big pack size with --full-name-hash  18.0M
5313.12: big pack with --path-walk            2.21(8.39+0.24)
5313.13: big pack size with --path-walk       17.8M
5313.14: repack                               98.05(662.37+2.64)
5313.15: repack size                          449.1K
5313.16: repack with --full-name-hash         33.95(129.44+2.63)
5313.17: repack size with --full-name-hash    182.9K
5313.18: repack with --path-walk              106.21(121.58+0.82)
5313.19: repack size with --path-walk         159.6K
[1] https://github.com/microsoft/fluentui
[2] e70848ebac1cd720875bccaa3026f4a9ed700e08
This repo suffers from having a lot of paths that collide in the name
hash, so examining them in groups by path leads to better deltas. Also,
in this case, the single-threaded implementation is competitive with the
full repack. This is saving time diffing files that have significant
differences from each other.
A similar, but private, repo has even more extremes in the thin packs:
Test                                          this tree
--------------------------------------------------------------
5313.2: thin pack                             2.39(2.91+0.10)
5313.3: thin pack size                        4.5M
5313.4: thin pack with --full-name-hash       0.29(0.47+0.12)
5313.5: thin pack size with --full-name-hash  15.5K
5313.6: thin pack with --path-walk            0.35(0.31+0.04)
5313.7: thin pack size with --path-walk       14.2K
Notice, however, that while the --full-name-hash version is working
quite well in these cases for the thin pack, it does poorly for some
other standard cases, such as this test on the Linux kernel repository:
Test                                          this tree
--------------------------------------------------------------
5313.2: thin pack                             0.01(0.00+0.00)
5313.3: thin pack size                        310
5313.4: thin pack with --full-name-hash       0.00(0.00+0.00)
5313.5: thin pack size with --full-name-hash  1.4K
5313.6: thin pack with --path-walk            0.00(0.00+0.00)
5313.7: thin pack size with --path-walk       310
Here, the --full-name-hash option does much worse than the default name
hash, but the path-walk option does exactly as well.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
In order to more easily compute delta bases among objects that appear at the
exact same path, add a --path-walk option to 'git pack-objects'.
This option will use the path-walk API instead of the object walk given by
the revision machinery. Since objects will be provided in batches
representing a common path, those objects can be tested for delta bases
immediately instead of waiting for a sort of the full object list by
name-hash. This has multiple benefits, including avoiding collisions by
name-hash.
The objects marked as UNINTERESTING are included in these batches, so we
are guaranteeing some locality to find good delta bases.
After the individual passes are done on a per-path basis, the default
name-hash is used to find other opportunistic delta bases that did not
match exactly by the full path name.
RFC TODO: It is important to note that this option is inherently
incompatible with using a bitmap index. This walk probably also does not
work with other advanced features, such as delta islands.
Getting ahead of myself, this option compares well with --full-name-hash
when the packfile is large enough, but also performs at least as well as
the default in all cases that I've seen.
RFC TODO: this should probably be recording the batch locations to another
list so they could be processed in a second phase using threads.
RFC TODO: list some examples of how this outperforms previous pack-objects
strategies. (This is coming in later commits that include performance
test changes.)
Signed-off-by: Derrick Stolee <stolee@gmail.com>
This option causes the path-walk API to act like the sparse tree-walk
algorithm implemented by mark_trees_uninteresting_sparse() in
list-objects.c.
Starting from the commits marked as UNINTERESTING, their root trees and
all objects reachable from those trees are UNINTERESTING, at least as we
walk path-by-path. When we reach a path where all objects associated
with that path are marked UNINTERESTING, then do not continue walking the
children of that path.
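
The pruning rule in the previous paragraph can be sketched in isolation.
This is a minimal illustration and not the path-walk implementation: the
flag bit and the function name are hypothetical, and the per-path objects
are reduced to an array of flag words.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for UNINTERESTING. */
#define TOY_UNINTERESTING (1u << 0)

/*
 * Decide whether to keep walking below a path. We descend only if
 * at least one object recorded at that path is still interesting;
 * a path whose objects are all UNINTERESTING is pruned.
 */
static bool toy_should_descend(const unsigned *flags, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!(flags[i] & TOY_UNINTERESTING))
			return true;
	return false;
}
```

A path with a mix of interesting and UNINTERESTING objects is still
walked, which is why the flag has to be propagated deeply before the
walk starts.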
We need to be careful to pass the UNINTERESTING flag in a deep way on
the UNINTERESTING objects before we start the path-walk, or else the
depth-first search for the path-walk API may accidentally report some
objects as interesting.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
In anticipation of using the path-walk API to analyze tags or include
them in a pack-file, add the ability to walk the tags that were included
in the revision walk.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
We add the ability to filter the object types in the path-walk API so
the callback function is called fewer times.
This adds the ability to ask for the commits in a list, as well. Future
changes will add the ability to visit annotated tags.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Add some tests based on the current behavior, doing interesting checks
for different sets of branches, ranges, and the --boundary option. This
sets a baseline for the behavior and we can extend it as new options are
introduced.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
In anticipation of a few planned applications, introduce the most basic form
of a path-walk API. It currently assumes that there are no UNINTERESTING
objects, and does not include any complicated filters. It calls a function
pointer on groups of tree and blob objects as grouped by path. This only
includes objects the first time they are discovered, so an object that
appears at multiple paths will not be included in two batches.
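
To make the batching behavior concrete, here is a toy model of it. It is
only a sketch under stated assumptions: every name below is hypothetical,
object IDs are plain strings, the input is a flat list of (path, object)
pairs rather than a tree walk, and "first discovered" means "first in
that list".

```c
#include <stddef.h>
#include <string.h>

struct toy_entry {
	const char *path; /* path at which the object was found */
	const char *oid;  /* stand-in for an object ID */
};

typedef void (*toy_batch_fn)(const char *path, const char **oids,
			     size_t nr, void *data);

#define TOY_MAX_BATCH 64

/* True if e[i] repeats an object ID that appears at an earlier index. */
static int toy_seen_before(const struct toy_entry *e, size_t i)
{
	size_t j;

	for (j = 0; j < i; j++)
		if (!strcmp(e[j].oid, e[i].oid))
			return 1;
	return 0;
}

/* True if e[i] is the first entry that mentions its path. */
static int toy_first_at_path(const struct toy_entry *e, size_t i)
{
	size_t j;

	for (j = 0; j < i; j++)
		if (!strcmp(e[j].path, e[i].path))
			return 0;
	return 1;
}

/*
 * Call fn once per path with the objects first discovered at that
 * path. Paths whose objects were all seen earlier produce no batch.
 * Returns the number of batches reported.
 */
static size_t toy_path_walk(const struct toy_entry *e, size_t nr,
			    toy_batch_fn fn, void *data)
{
	size_t i, j, batches = 0;

	for (i = 0; i < nr; i++) {
		const char *oids[TOY_MAX_BATCH];
		size_t n = 0;

		if (!toy_first_at_path(e, i))
			continue;
		for (j = i; j < nr && n < TOY_MAX_BATCH; j++)
			if (!strcmp(e[j].path, e[i].path) &&
			    !toy_seen_before(e, j))
				oids[n++] = e[j].oid;
		if (n) {
			fn(e[i].path, oids, n, data);
			batches++;
		}
	}
	return batches;
}

/* Example callback: add the batch size to a size_t counter. */
static void toy_count_objects(const char *path, const char **oids,
			      size_t nr, void *data)
{
	(void)path;
	(void)oids;
	*(size_t *)data += nr;
}
```

With entries (a.txt, o1), (b.txt, o2), (a.txt, o3), (c.txt, o1), the
callback sees one batch for a.txt and one for b.txt; c.txt produces no
batch because o1 was already reported.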
There are many future adaptations that could be made, but they are left for
future updates when consumers are ready to take advantage of those features.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
This also adds the '--full-name-hash' option introduced in the previous
change and adds newlines to the synopsis.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
The pack_name_hash() method has not been materially changed since it was
introduced in ce0bd64299 (pack-objects: improve path grouping
heuristics., 2006-06-05). The intention here is to group objects by path
name, but also attempt to group similar file types together by making
the most-significant digits of the hash be focused on the final
characters.
Here's the crux of the implementation:
    /*
     * This effectively just creates a sortable number from the
     * last sixteen non-whitespace characters. Last characters
     * count "most", so things that end in ".c" sort together.
     */
    while ((c = *name++) != 0) {
            if (isspace(c))
                    continue;
            hash = (hash >> 2) + (c << 24);
    }
As the comment mentions, this only cares about the last sixteen
non-whitespace characters. This causes some filenames to collide more
than others. Here are some examples that I've seen while investigating
repositories that are growing more than they should be:
* "/CHANGELOG.json" is 15 characters, and is created by the beachball
[1] tool. Only the final character of the parent directory can
differentiate different versions of this file, but also only the two
most-significant digits. If that character is a letter, then this is
always a collision. Similar issues occur with the similar
"/CHANGELOG.md" path, though there is more opportunity for
differences in the parent directory.
* Localization files frequently have common filenames but differentiate
via parent directories. In C#, the name "/strings.resx.lcl" is used
for these localization files and they will all collide in name-hash.
[1] https://github.com/microsoft/beachball
I've come across many other examples where some internal tool uses a
common name across multiple directories and is causing Git to repack
poorly due to name-hash collisions.
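
The collision can be reproduced outside of Git by wrapping the loop
quoted above in a small function. This is a sketch for experimentation
only: the function name is hypothetical and the example paths are made
up, chosen so that only a lowercase parent directory name differs.

```c
#include <ctype.h>
#include <stdint.h>

/*
 * Same loop as the quoted name-hash, wrapped in a standalone
 * function. Each character lands in the top byte and is shifted
 * down by two bits per later character, so only roughly the last
 * sixteen non-whitespace characters influence the result.
 */
static uint32_t toy_name_hash(const char *name)
{
	uint32_t c, hash = 0;

	while ((c = (unsigned char)*name++) != 0) {
		if (isspace((int)c))
			continue;
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}
```

For instance, "packages/a/CHANGELOG.json" and "packages/b/CHANGELOG.json"
get the same value from this function, while "a.c" and "a.h" do not,
because their final characters differ.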
It is clear that the existing name-hash algorithm is optimized for
repositories with short path names, but also is optimized for packing a
single snapshot of a repository, not a repository with many versions of
the same file. In my testing, this has proven out where the name-hash
algorithm does a good job of finding peer files as delta bases when
unable to use a historical version of that exact file.
However, for repositories that have many versions of most files and
directories, it is more important that the objects that appear at the
same path are grouped together.
Create a new pack_full_name_hash() method and a new --full-name-hash
option for 'git pack-objects' to call that method instead. Add a simple
pass-through for 'git repack --full-name-hash' for additional testing in
the context of a full repack, where I expect this will be most
effective.
The hash algorithm is as simple as possible to be reasonably effective:
for each character of the path string, add a multiple of that character
and a large prime number (chosen arbitrarily, but intended to be large
relative to the size of a uint32_t). Then, shift the current hash value
to the right by 5, with overlap. The addition and shift parameters are
standard mechanisms for creating hard-to-predict behaviors in the bits
of the resulting hash.
This is not meant to be cryptographic at all, but uniformly distributed
across the possible hash values. This creates a hash that appears
pseudorandom. There is no ability to consider similar file types as
being close to each other.
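
A minimal sketch of a hash with this shape follows. It is one reading of
the description above and not necessarily the code in this patch: the
function name is hypothetical, the constant is a placeholder for "a
large prime number", and "shift to the right by 5, with overlap" is
interpreted as a 32-bit rotation.

```c
#include <stdint.h>

static uint32_t toy_full_name_hash(const char *name)
{
	/* Placeholder constant; the text only asks for a large prime. */
	const uint32_t bigp = 1234572167U;
	uint32_t c, hash = bigp;

	while ((c = (unsigned char)*name++) != 0) {
		hash += c * bigp;                  /* mix in the character */
		hash = (hash >> 5) | (hash << 27); /* rotate right by 5 */
	}
	return hash;
}
```

Because every character of the path is mixed in and the rotation loses
no bits, two paths of the same length that differ in a single character
always get different values here, including paths such as
"packages/a/CHANGELOG.json" and "packages/b/CHANGELOG.json".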
In a later change, a test-tool will be added so the effectiveness of
this hash can be demonstrated directly.
For now, let's consider how effective this mechanism is when repacking a
repository with and without the --full-name-hash option. Specifically,
let's use 'git repack -adf [--full-name-hash]' as our test.
On the Git repository, we do not expect much difference. All path names
are short. This is backed by our results:
| Stage                 | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone           | 260 MB    | N/A         |
| Standard Repack       | 127 MB    | 106s        |
| With --full-name-hash | 126 MB    | 99s         |
This example demonstrates how there is some natural overhead coming from
the cloned copy because the server is hosting many forks and has not
optimized for exactly this set of reachable objects. But the full repack
has similar characteristics with and without --full-name-hash.
However, we can test this in a repository that uses one of the
problematic naming conventions above. The fluentui [2] repo uses
beachball to generate CHANGELOG.json and CHANGELOG.md files, and these
files have very poor delta characteristics when comparing against
versions across parent directories.
| Stage                 | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone           | 694 MB    | N/A         |
| Standard Repack       | 438 MB    | 728s        |
| With --full-name-hash | 168 MB    | 142s        |
[2] https://github.com/microsoft/fluentui
In this example, we see significant gains in the compressed packfile
size as well as the time taken to compute the packfile.
Using a collection of repositories that use the beachball tool, I was
able to make similar comparisons with dramatic results. While the
fluentui repo is public, the others are private so cannot be shared for
reproduction. The results are so significant that I find it important to
share here:
| Repo     | Standard Repack | With --full-name-hash |
|----------|-----------------|-----------------------|
| fluentui | 438 MB          | 168 MB                |
| Repo B   | 6,255 MB        | 829 MB                |
| Repo C   | 37,737 MB       | 7,125 MB              |
| Repo D   | 130,049 MB      | 6,190 MB              |
Future changes could include making --full-name-hash implied by a config
value or even implied by default during a full repack.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
The short help text given by "git show-index -h" says
    $ git show-index -h
    usage: git show-index [--object-format=<hash-algorithm>]

        --[no-]object-format <hash-algorithm>
                              specify the hash algorithm to use
The command takes a pack .idx file from its standard input. The
user has to _know_ this, as there is no indication from this output.
Give a hint that the data to work on is fed from its standard input.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The preceding two commits introduced special handling of the sideband
channel to neutralize ANSI escape sequences before sending the payload
to the terminal, and `sideband.allowControlCharacters` to override that
behavior.
However, some `pre-receive` hooks that are actively used in practice
want to color their messages and therefore rely on the fact that Git
passes them through to the terminal.
In contrast to other ANSI escape sequences, it is highly unlikely that
coloring sequences can be essential tools in attack vectors that mislead
Git users e.g. by hiding crucial information.
Therefore we can have both: Continue to allow ANSI coloring sequences to
be passed to the terminal, and neutralize all other ANSI escape
sequences.
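
The distinction can be sketched as follows. This is an illustration
under assumptions and not the code in sideband.c: the function names are
hypothetical, a "coloring sequence" is taken to be ESC, "[", any number
of digits and semicolons, and a final "m", and other control characters
are rendered in caret notation (ESC becomes "^[") purely for the sake of
the example.

```c
#include <stddef.h>
#include <string.h>

/* Length of a color sequence at the start of s, or 0 if there is none. */
static size_t toy_color_sequence_len(const char *s, size_t n)
{
	size_t i;

	if (n < 3 || s[0] != '\033' || s[1] != '[')
		return 0;
	for (i = 2; i < n; i++) {
		if (s[i] == 'm')
			return i + 1;
		if (s[i] != ';' && (s[i] < '0' || s[i] > '9'))
			return 0;
	}
	return 0;
}

/*
 * Copy src to dst, keeping color sequences verbatim and replacing
 * other control characters (except TAB and LF) with caret notation.
 * cap must be at least 1; output is truncated if dst is too small.
 * Returns the number of bytes written, excluding the trailing NUL.
 */
static size_t toy_sanitize(const char *src, size_t n, char *dst, size_t cap)
{
	size_t i = 0, o = 0;

	while (i < n) {
		size_t len = toy_color_sequence_len(src + i, n - i);
		unsigned char c = (unsigned char)src[i];

		if (len) {
			if (o + len >= cap)
				break;
			memcpy(dst + o, src + i, len);
			o += len;
			i += len;
		} else if (c < 0x20 && c != '\t' && c != '\n') {
			if (o + 2 >= cap)
				break;
			dst[o++] = '^';
			dst[o++] = (char)(c + 0x40);
			i++;
		} else {
			if (o + 1 >= cap)
				break;
			dst[o++] = (char)c;
			i++;
		}
	}
	dst[o] = '\0';
	return o;
}
```

Under these assumptions a red "error" message keeps its color, while a
clear-screen sequence such as ESC "[2J" arrives as the visible text
"^[[2J".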
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The preceding commit fixed the vulnerability whereby sideband messages
(that are under the control of the remote server) could contain ANSI
escape sequences that would be sent to the terminal verbatim.
However, this fix may not be desirable under all circumstances, e.g.
when remote servers deliberately add coloring to their messages to
increase their urgency.
To help with those use cases, give users a way to opt out of the
protections: `sideband.allowControlCharacters`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Correct verb tense, add missing words, avoid double blank lines,
and rephrase things that don’t read well to me like “Turn this linkage
to relative paths”.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 1bc1e94091 (doc: option value may be separate for valid reasons,
2024-11-25) added a paragraph discussing tilde-expansion of, e.g.,
~/directory/file.
The tilde character has a special meaning to asciidoc tools. In this
particular case, AsciiDoc matches up the two tildes in "e.g.
~/directory/file or ~u/d/f" and sets the text between them using
subscript. In the manpage, where subscripting is not possible, this
renders as "e.g. /directory/file oru/d/f".
These paths are literal values, which our coding guidelines want typeset
as verbatim using backticks. Do that. One effect of this is indeed that
the asciidoc tools stop interpreting tilde and other special characters.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The two-line heading added in 8525e92886 (Document HOME environment
variable, 2024-12-09) uses too many tilde characters, so the heading
isn't detected as such. Both AsciiDoc and Asciidoctor end up
misrendering this in different ways.
Use the correct number of tilde characters to fix this.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The build procedure based on meson learned to generate HTML
documentation pages.
* ps/build-meson-html:
Documentation: wire up sanity checks for Meson
t/Makefile: make "check-meson" work with Dash
meson: install static files for HTML documentation
meson: generate articles
Documentation: refactor "howto-index.sh" for out-of-tree builds
Documentation: refactor "api-index.sh" for out-of-tree builds
meson: generate user manual
Documentation: inline user-manual.conf
meson: generate HTML pages for all man page categories
meson: fix generation of merge tools
meson: properly wire up dependencies for our docs
meson: wire up support for AsciiDoctor
The developer documentation has been updated to give the latest
info on gitk and git-gui maintainer.
* as/gitk-git-gui-repo-update:
Update the official repo of gitk
Wire up sanity checks for Meson to verify that no man pages are missing.
This check is similar to the same check we already have for our tests.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we generate man pages, articles and the user manual with Meson, the
only thing that is still missing in an installation of HTML documents is
a couple of static files. Wire these up to finalize Meson's support for
generating HTML documentation.
Diffing an installation that uses our Makefile with an installation that
uses Meson only surfaces a couple of discrepancies now:
- Meson doesn't install "everyday.html" and "git-remote-helpers.html".
These files are marked as obsolete and don't contain any useful
information anymore: they simply point to their modern equivalents.
- Meson doesn't install "*.txt" files when asking for HTML docs. I'm
not sure why our Makefiles do this in the first place, and it does
seem like the resulting installation is fully functional even
without those files.
Other than that, both layout and file contents are the exact same.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the Meson build system already knows to generate man pages and our
user manual, it does not yet generate the random assortment of articles
that we have. Plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "howto-index.sh" is used to generate an index of our how-to docs. It
receives as input the paths to these documents, which would typically be
relative to the "Documentation/" directory in Makefile-based builds. In
an out-of-tree build though it will get relative paths that may be rooted
somewhere else entirely.
The file paths do end up in the generated index, and the expectation is
that they should always start with "howto/". But for out-of-tree builds
we would populate it with the paths relative to the build directory,
which is wrong.
Fix the issue by using `$(basename "$file")` to generate the path. While
at it, move the script into "howto/" to align it with the location of
the comparable "api-index.sh" script.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "api-index.sh" script generates an index of API-related
documentation. The script does not handle out-of-tree builds and thus
cannot be used easily by Meson.
Refactor it to be independent of locations by both accepting a source
directory where the API docs live as well as a path to an output file.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our documentation contains a user manual that gives people a short
introduction to Git. Our Makefile knows to generate the manual into
three different formats: an HTML page, a PDF and an info page. The Meson
build instructions don't yet generate any of these.
While wiring up all these formats I hit a couple of road blocks with how
we generate our info pages. Even though I eventually resolved these, it
made me question whether anybody actually uses info pages in the first
place. Checking through a couple of downstream consumers I couldn't find
a single user of either the info pages or of our PDF manual in Arch
Linux, Debian, Fedora, Ubuntu, FreeBSD or OpenBSD. So it's rather
safe to assume that there aren't really any users out there, and thus
the added complexity does not seem worth it.
Wire up support for building the user manual in HTML format and
consciously skip over the other two formats. This is basically a form of
silent deprecation: if people out there use the other two formats they
will eventually complain about them missing in Meson, which means we can
wire them up at a later point. If they don't we can phase out these
formats eventually.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When generating our user manual we set up a bit of extra configuration
compared to our normal configuration. This is done by having an extra
"user-manual.conf" file that Asciidoc seems to pull in automatically due
to matching filenames with "user-manual.txt". This dependency is quite
hidden though and thus easy to miss. Furthermore, it seems that Asciidoc
does not know to pull it in for out-of-tree builds where we use relative
paths.
The setup in AsciiDoctor is somewhat different: instead of having two
sets of configuration, we condition the use of manual-specific configs
based on whether the document type is "book". And as we only build our
user manual with that type this is sufficient.
Use the same trick for our user manual by inlining the configuration
into "asciidoc.conf.in" and making it conditional on whether or not
"doctype-book" is defined.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When generating HTML pages for our man pages we only generate them for
category 1 in Meson, which are the pages corresponding to our built-in
commands. I cannot tell why I added this filter though: our Makefile
installs all man pages, so a Meson-based build misses out on many of
them.
Fix this by removing the filter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our buildsystems generate a list of diff and merge tools that ultimately
end up in our documentation. And while Meson does wire up the logic, it
tries to use the TOOL_MODE environment variable to set up the mode. This
is wrong though: the mode is set via an argument that we have fixed to
'diff' mode by accident.
Fix this such that merge tools are properly generated.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A couple of Meson documentation targets use `meson.current_source_dir()`
to resolve inputs. This has the downside that it does not automagically
make Meson track these inputs as a dependency. After all, string
arguments really can be anything, even if they happen to match an actual
filesystem path.
Adapt these build targets to instead use inputs.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While our Makefile supports both Asciidoc and AsciiDoctor, our Meson
build instructions only support the former. Wire up support for the
latter, as well.
Our Makefile always favors Asciidoc, but Meson will automatically figure
out which of the two to use based on whether they are installed or not. To
keep compatibility with our Makefile it favors Asciidoc over Asciidoctor
in case both are available.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Point out:
- current maintainer
- contribution flow is via the mailing list
Signed-off-by: Alexander Shopov <ash@kambanaria.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's wait for git-gui, gitk, and possibly po/ and delay the tagging
of the -rc1. Many people are already offline for the end-of-year
holidays and it is a slow week, and 'master' front has too many new
things graduated from 'next' a bit too early for me to feel
comfortable.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git refs migrate" learned to also migrate the reflog data across
backends.
* kn/reflog-migration:
refs: mark invalid refname message for translation
refs: add support for migrating reflogs
refs: allow multiple reflog entries for the same refname
refs: introduce the `ref_transaction_update_reflog` function
refs: add `committer_info` to `ref_transaction_add_update()`
refs: extract out refname verification in transactions
refs/files: add count field to ref_lock
refs: add `index` field to `struct ref_udpate`
refs: include committer info in `ref_update` struct
A topic to optionally build with meson, which has graduated to
'master' recently, broke Documentation pipeline with asciidoctor
for the normal Makefile build as well as meson-based one, which
have been corrected.
* ma/asciidoctor-build-fixes:
asciidoctor-extensions.rb.in: inject GIT_DATE
asciidoctor-extensions.rb.in: add missing word
asciidoctor-extensions.rb.in: delete existing <refmiscinfo/>
A topic to optionally build with meson, which has graduated to
'master' recently, has regressed the normal Makefile build, which
is being corrected.
* ps/build-hotfix:
meson: add options to override build information
GIT-VERSION-GEN: fix overriding GIT_BUILT_FROM_COMMIT and GIT_DATE
GIT-VERSION-GEN: fix overriding GIT_VERSION
Makefile: introduce template for GIT-VERSION-GEN
Makefile: drop unneeded indirection for GIT-VERSION-GEN outputs
Makefile: stop including "GIT-VERSION-FILE" in docs
"git range-diff" learned to optionally show and compare merge
commits in the ranges being compared, with the --diff-merges
option.
* js/range-diff-diff-merges:
range-diff: introduce the convenience option `--remerge-diff`
range-diff: optionally include merge commits' diffs in the analysis
After a38edab7c8 (Makefile: generate doc versions via GIT-VERSION-GEN,
2024-12-06), we no longer inject GIT_DATE when building with
Asciidoctor.
Replace the <date/> tag in the XML to inject the value of GIT_DATE.
Unlike <refmiscinfo/> as handled in a recent commit, we have no reason
to expect that this tag might be missing, so there's no need for "maybe
remove, then add" and we can just outright replace the one that
Asciidoctor has generated based on the mtime of the source file.
Compared to pre-a38edab7c8, we now end up injecting this also in the
build of Git.3pm, which until now has been using the mtime of Git.pm.
That is arguably even a good change since it results in more
reproducible builds.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit a38edab7c8 (Makefile: generate doc versions via GIT-VERSION-GEN,
2024-12-06) stopped providing an attribute value "Git $(GIT_VERSION)" to
asciidoc/Asciidoctor over the command line. Instead, we now provide the
attribute to asciidoc through a generated asciidoc.conf, where the value
is generated as "Git @GIT_VERSION@".
In the similar mechanism for Asciidoctor, we forgot the "Git" prefix.
Restore it.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After the recent a38edab7c8 (Makefile: generate doc versions via
GIT-VERSION-GEN, 2024-12-06), building with Asciidoctor results in
manpages where the headers no longer contain "Git Manual" and the
footers no longer identify the built Git version.
Before a38edab7c8, we used to just provide a few attributes to
Asciidoctor (and asciidoc). Commit 7a30134358 (asciidoctor-extensions:
provide `<refmiscinfo/>`, 2019-09-16) noted that older versions of
Asciidoctor didn't propagate those attributes into the built XML files,
so we started injecting them ourselves from this script. With newer
versions of Asciidoctor, we'd end up with some harmless duplication
among the tags in the final XML.
Post-a38edab7c8, we don't provide these attributes and Asciidoctor
inserts empty-ish values. After our additions from 7a30134358, we get
<refmiscinfo class="source"> </refmiscinfo>
<refmiscinfo class="manual"> </refmiscinfo>
<refmiscinfo class="source">2.47.1.[...]</refmiscinfo>
<refmiscinfo class="manual">Git Manual</refmiscinfo>
When these are handled, it appears to be first come first served,
meaning that our additions have no effect and we regress as described in
the first paragraph.
Remove existing "source" or "manual" <refmiscinfo/> tags before adding
ours. I considered removing all <refmiscinfo/> to get a nice clean
slate, instead of just those two that we want to replace to be a bit
more precise. I opted for the latter. Maybe one day, Asciidoctor learns
to insert something useful there which `xmlto` can pick up and make good
use of -- let's not interfere.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/build-hotfix:
meson: add options to override build information
GIT-VERSION-GEN: fix overriding GIT_BUILT_FROM_COMMIT and GIT_DATE
GIT-VERSION-GEN: fix overriding GIT_VERSION
Makefile: introduce template for GIT-VERSION-GEN
Makefile: drop unneeded indirection for GIT-VERSION-GEN outputs
Makefile: stop including "GIT-VERSION-FILE" in docs
We inject various different kinds of build information into build
artifacts, like the version string or the commit from which Git was
built. Add options to let users explicitly override this information
with Meson.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GIT-VERSION-GEN tries to derive the version that Git is being built from
via multiple different sources in the following order:
1. A file called "version" in the source tree's root directory, if it
exists.
2. The current commit in case Git is built from a Git repository.
3. Otherwise, we use a fallback version stored in a variable which is
bumped whenever a new Git version is getting tagged.
It used to be possible to override the version by overriding the
`GIT_VERSION` Makefile variable (e.g. `make GIT_VERSION=foo`). This
worked somewhat by chance, only: `GIT-VERSION-GEN` would write the
actual Git version into `GIT-VERSION-FILE`, not the overridden value,
but when including the file into our Makefile we would not override the
`GIT_VERSION` variable because it has already been set by the user. And
because our Makefile used the variable to propagate the version to our
build tools instead of using `GIT-VERSION-FILE` the resulting build
artifacts used the overridden version.
But that subtle mechanism broke with 4838deab65 (Makefile: refactor
GIT-VERSION-GEN to be reusable, 2024-12-06) and subsequent commits
because the version information is not propagated via the Makefile
variable anymore, but instead via the files that `GIT-VERSION-GEN`
started to write. And as the script never knew about the `GIT_VERSION`
environment variable in the first place it uses one of the values listed
above instead of the overridden value.
Fix this issue by making `GIT-VERSION-GEN` handle the case where
`GIT_VERSION` has been set via the environment.
Note that this requires us to introduce a new GIT_VERSION_OVERRIDE
variable that stores a potential user-provided value, either via the
environment or via "config.mak". Ideally we wouldn't need it and could
just continue to use GIT_VERSION for this. But unfortunately, Make
will first include all sub-Makefiles before figuring out whether it
needs to re-make any of them [1]. Consequently, if there already is a
GIT-VERSION-FILE, we would have slurped in its value of GIT_VERSION
before we call GIT-VERSION-GEN, and because GIT-VERSION-GEN now uses
that value as an override it would mean that the first generated value
for GIT_VERSION will remain unchanged.
Furthermore we have to move the include for "GIT-VERSION-FILE" after the
includes for "config.mak" and related so that GIT_VERSION_OVERRIDE can
be set to the value provided by "config.mak".
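
The resulting order of precedence can be modeled as a simple "first
non-empty candidate wins" lookup. This is only a sketch of the behavior
described above, written in C for illustration; GIT-VERSION-GEN itself is
a shell script, the function and parameter names are hypothetical, and
checking the override before the other sources is an assumption based on
this description.

```c
#include <stddef.h>

/*
 * Pick the version string: a user-provided override first, then the
 * "version" file, then the version derived from the current commit,
 * and finally the built-in fallback. NULL or "" means "not available".
 */
static const char *toy_pick_version(const char *override,
				    const char *version_file,
				    const char *from_commit,
				    const char *fallback)
{
	const char *candidates[4];
	size_t i;

	candidates[0] = override;
	candidates[1] = version_file;
	candidates[2] = from_commit;
	candidates[3] = fallback;

	for (i = 0; i < 4; i++)
		if (candidates[i] && *candidates[i])
			return candidates[i];
	return "";
}
```

With an override of "foo" the other three sources are ignored, which
matches what `make GIT_VERSION=foo` is expected to do again after this
fix.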
[1]: https://www.gnu.org/software/make/manual/html_node/Remaking-Makefiles.html
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new template to call GIT-VERSION-GEN. This will allow us to
iterate on how exactly the script is called in subsequent commits
without having to adapt all call sites every time.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>