The `timestamp_t` type is declared as `uintmax_t` and thus typically has
64 bits of precision. Usually, the full precision of such dates is not
required: it would be comforting to know that Git is still around in
millions of years, but all in all the chance is rather low.
We abuse this fact in the commit-graph: instead of storing the full 64
bits of precision, committer dates only store 34 bits. This is still
plenty of headroom, as it means that we can represent dates until the
year 2514. Commits dated beyond that year simply get a date whose
remaining bits are masked off.
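In other words, the on-disk committer date is the low 34 bits of the
full timestamp. A minimal sketch of the masking (constant and function
names are illustrative, not necessarily those used in git.git):

    #include <stdint.h>

    /* The commit-graph stores committer dates in 34 bits. */
    #define GRAPH_DATE_BITS 34
    #define GRAPH_DATE_MASK ((((uint64_t)1) << GRAPH_DATE_BITS) - 1)

    static uint64_t on_disk_committer_date(uint64_t committer_date)
    {
        /* Dates beyond the year 2514 silently lose their upper bits. */
        return committer_date & GRAPH_DATE_MASK;
    }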
The result of this is somewhat curious: the committer date will be
different depending on whether a commit gets parsed via the commit-graph
or via the object database. This isn't really too much of an issue in
general though, as we don't typically use the date parsed from the
commit-graph in user-facing output.
But with 024b4c9697 (commit: make `repo_parse_commit_no_graph()` more
robust, 2026-02-16) it started to become a problem when writing the
commit-graph itself. This commit changed `repo_parse_commit_no_graph()`
so that we re-parse the commit via the object database in case it was
already parsed beforehand via the commit-graph.
The consequence is that we may now act with two different commit dates
at different stages:
- Initially, we use the 34-bit precision timestamp when writing the
generation data chunk. We thus correctly compute the offsets
relative to the on-disk timestamp here.
- Later, when writing the overflow data, we may end up with the
full-precision timestamp. When the date is larger than 34 bits,
computing the offset against it underflows.
This causes a mismatch in the number of generation data overflow records
we want to write, and that ultimately causes Git to die.
Introduce a new helper function that computes the generation offset for
a commit while correctly masking the date to 34 bits. This makes the
previously-implicit assumptions about the commit date precision explicit
and thus hopefully less fragile going forward.
Adapt sites that compute the offset to use the function.
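A sketch of what such a helper might look like, reusing the mask from
the sketch above (the actual function name and signature in git.git may
differ):

    static uint64_t generation_data_offset(uint64_t corrected_commit_date,
                                           uint64_t committer_date)
    {
        /*
         * Compute the offset against the masked, on-disk form of the
         * committer date, so that the generation data chunk and the
         * overflow chunk agree on the same base, no matter whether the
         * commit was parsed via the commit-graph or via the object
         * database.
         */
        return corrected_commit_date - (committer_date & GRAPH_DATE_MASK);
    }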
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commit we refactored `lookup_commit_reference_gently()`
so that it doesn't parse non-commit objects anymore. This has led to a
speedup when git-receive-pack(1) accepts a shallow push into a repo
with lots of refs that point to blobs or trees.
But while this case is now faster, we still have the issue that
accepting pushes with lots of "normal" refs that point to commits is
still slow. This is mostly because we look up the commits via the
object database, and that is rather costly.
Adapt the code to use `repo_parse_commit_gently()` instead of
`parse_object()` to parse the resulting commit object. This function
knows to use the commit-graph to fill in the object, which is way more
cost-efficient.
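The change is roughly of the following shape (a sketch, not the exact
call site; error handling is elided):

    struct commit *commit = lookup_commit(the_repository, &oid);

    /*
     * repo_parse_commit_gently() fills in the commit from the
     * commit-graph when available, instead of inflating and hashing
     * the object from the object database as parse_object() would.
     */
    if (!commit || repo_parse_commit_gently(the_repository, commit, 1) < 0)
        return -1; /* not a commit, or parsing failed */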
This leads to another significant speedup when accepting shallow pushes.
The following benchmark pushes a single object from a shallow clone
into a repository with 600,000 references that all point to commits:
    Benchmark 1: git-receive-pack (rev = HEAD~)
      Time (mean ± σ):     9.179 s ±  0.031 s    [User: 8.858 s, System: 0.528 s]
      Range (min … max):   9.154 s …  9.213 s    3 runs

    Benchmark 2: git-receive-pack (rev = HEAD)
      Time (mean ± σ):     2.337 s ±  0.032 s    [User: 2.331 s, System: 0.234 s]
      Range (min … max):   2.308 s …  2.371 s    3 runs

    Summary
      git-receive-pack . </tmp/input (rev = HEAD) ran
        3.93 ± 0.05 times faster than git-receive-pack (rev = HEAD~)
Also, this again leads to a significant reduction in memory allocations.
Before this change:
    HEAP SUMMARY:
        in use at exit: 17,524,978 bytes in 22,393 blocks
      total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated
And after this change:
    HEAP SUMMARY:
        in use at exit: 11,534,036 bytes in 12,406 blocks
      total heap usage: 13,284 allocs, 878 frees, 15,521,451 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next commit we will start to parse more commits via the
commit-graph. This change will lead to a segfault though, because we try
to access the tree of a commit via `repo_get_commit_tree()`, but:
- The commit has been parsed via the commit-graph, and thus its
`maybe_tree` field is not yet populated.
- We cannot use the commit-graph to populate the commit's tree because
we're in the process of writing the commit-graph.
The consequence is that we'll get a `NULL` pointer for the tree in
`write_graph_chunk_data()`.
In theory we are already mindful of this situation, as we explicitly use
`repo_parse_commit_no_graph()` to parse the commit without the help of
the commit-graph. But that doesn't do the trick as the commit is already
marked as parsed, so the function will not re-populate it. And as the
commit-graph has been closed, neither will `get_commit_tree_oid()` be
able to load the tree for us.
It seems like this issue can only be hit under artificial circumstances:
the error was hit via `git_test_write_commit_graph_or_die()`, which is
run by git-commit(1) and git-merge(1) in case `GIT_TEST_COMMIT_GRAPH=1`:
    $ GIT_TEST_COMMIT_GRAPH=1 meson test t7507-commit-verbose \
        --test-args=-ix -i
    ...
    ++ git -c commit.verbose=true commit --amend
    hint: Waiting for your editor to close the file...
    ./test-lib.sh: line 1012: 55895 Segmentation fault      (core dumped) git -c commit.verbose=true commit --amend
To the best of my knowledge, this is the only case where we end up
writing a commit-graph in the same process that might have already
consulted the commit-graph to look up arbitrary objects. But regardless
of that, this feels like a bigger accident that is just waiting to
happen.
Make the code more robust by extending `repo_parse_commit_no_graph()`
to unparse the commit first in case we detect that it was parsed via
the commit-graph. This ensures that we re-read the object without the
graph, and thus populate `maybe_tree` properly.
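A minimal sketch of the idea; `parsed_via_commit_graph()` and
`unparse_commit()` are illustrative stand-ins for whatever helpers
git.git actually uses:

    int repo_parse_commit_no_graph(struct repository *r, struct commit *commit)
    {
        /*
         * A commit filled in from the commit-graph is marked as parsed
         * but has no maybe_tree. Reset its parsed state so that the
         * object gets re-read from the object database below.
         */
        if (commit->object.parsed && parsed_via_commit_graph(r, commit))
            unparse_commit(r, commit);

        return repo_parse_commit_internal(r, commit, 0, 0);
    }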
This fix shouldn't have any performance consequences: the function is
only ever called in the "commit-graph.c" code, and we'll only re-parse
the commit at most once.
Add an exclusion to our Coccinelle rules so that it doesn't complain
about us accessing `maybe_tree` directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `lookup_commit_reference_gently()` can be used to look up
a committish by object ID. As such, the function knows to peel tag
objects, for example, so that we eventually end up with a commit.
The function is used quite a lot throughout our tree. One such user is
"shallow.c" via `assign_shallow_commits_to_refs()`. The intent of this
function is to figure out whether a shallow push is missing any objects
that are required to satisfy the ref updates, and if so, which of the
ref updates is missing objects.
This is done by painting the tree with `UNINTERESTING`. We start
painting by calling `refs_for_each_ref()` so that we can mark all
existing referenced objects as the boundary of objects that we already
have, and which are supposed to be fully connected. The reference tips
are then parsed via `lookup_commit_reference_gently()`, and the
resulting commits are marked as uninteresting.
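In outline, the boundary marking looks like this (a sketch; the
callback signature is approximate):

    static int mark_uninteresting(const char *refname, const struct object_id *oid,
                                  int flags, void *cb_data)
    {
        struct commit *commit =
            lookup_commit_reference_gently(the_repository, oid, 1);

        if (commit)
            commit->object.flags |= UNINTERESTING;
        return 0;
    }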
But references may not necessarily point to a committish, and if many
of them don't, this step takes a lot of time. This is mostly due to
the way that `lookup_commit_reference_gently()` is implemented: before
we learn about the type of the object, we already call `parse_object()`
on the object ID. This has two consequences:
- We parse all objects, including trees and blobs, even though we
don't even need their contents.
- More importantly though, `parse_object()` will cause us to check
whether the object ID matches its contents.
Combined, this means that we deflate and hash every non-committish
object, which of course ends up being both CPU- and memory-intensive.
Improve the logic so that we first use `peel_object()`. This function
won't parse the object for us, and thus it allows us to learn about the
object's type before we parse and return it.
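The reworked flow is roughly as follows. The sketch stands in for the
actual `peel_object()`-based implementation by checking the type via
`oid_object_info()`, which also reports the type without inflating or
hashing the full contents; tag peeling is elided for brevity:

    static struct commit *lookup_commit_cheaply(struct repository *r,
                                                const struct object_id *oid)
    {
        struct commit *commit;

        /* Learn the type from the object header without parsing. */
        if (oid_object_info(r, oid, NULL) != OBJ_COMMIT)
            return NULL;

        /* Only now do we parse, and only for actual commits. */
        commit = lookup_commit(r, oid);
        if (!commit || repo_parse_commit_gently(r, commit, 1) < 0)
            return NULL;
        return commit;
    }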
The following benchmark pushes a single object from a shallow clone into
a repository that has 100,000 refs. These refs were created by listing
all objects via `git rev-list --objects --all` and creating refs for
a subset of them, so lots of those refs will cover non-commit objects.
    Benchmark 1: git-receive-pack (rev = HEAD~)
      Time (mean ± σ):     62.571 s ±  0.413 s    [User: 58.331 s, System: 4.053 s]
      Range (min … max):   62.191 s … 63.010 s    3 runs

    Benchmark 2: git-receive-pack (rev = HEAD)
      Time (mean ± σ):     38.339 s ±  0.192 s    [User: 36.220 s, System: 1.992 s]
      Range (min … max):   38.176 s … 38.551 s    3 runs

    Summary
      git-receive-pack . </tmp/input (rev = HEAD) ran
        1.63 ± 0.01 times faster than git-receive-pack . </tmp/input (rev = HEAD~)
This leads to a sizeable speedup as we now skip reading and parsing
non-commit objects. Before this change we spent around 40% of the time
in `assign_shallow_commits_to_refs()`, after the change we only spend
around 1.2% of the time in there. Almost the entire remainder of the
time is spent in git-rev-list(1) to perform the connectivity checks.
Besides the speedup, this also leads to a massive reduction in
allocations. Before:
    HEAP SUMMARY:
        in use at exit: 352,480,441 bytes in 97,185 blocks
      total heap usage: 2,793,820 allocs, 2,696,635 frees, 67,271,456,983 bytes allocated
And after:
    HEAP SUMMARY:
        in use at exit: 17,524,978 bytes in 22,393 blocks
      total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated
Note that when all references refer to commits, performance stays
roughly the same, as expected. The following benchmark was executed
with 600,000 refs that all point to commits:
    Benchmark 1: git-receive-pack (rev = HEAD~)
      Time (mean ± σ):     9.101 s ±  0.006 s    [User: 8.800 s, System: 0.520 s]
      Range (min … max):   9.095 s …  9.106 s    3 runs

    Benchmark 2: git-receive-pack (rev = HEAD)
      Time (mean ± σ):     9.128 s ±  0.094 s    [User: 8.820 s, System: 0.522 s]
      Range (min … max):   9.019 s …  9.188 s    3 runs

    Summary
      git-receive-pack (rev = HEAD~) ran
        1.00 ± 0.01 times faster than git-receive-pack (rev = HEAD)
This will be improved in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'jx/zh_CN' of github.com:jiangxin/git:
l10n: zh_CN: standardize glossary terms
l10n: zh_CN: updated translation for 2.53
l10n: zh_CN: fix inconsistent use of standard vs. wide colons
Add preferred Chinese terminology notes and align existing translations
to the updated glossary. AI-assisted review was used to check and
improve legacy translations.
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Replace mixed usage of standard (ASCII) colons ':' with full-width
(wide) colons '：' in Chinese translations to ensure typographic
consistency, as reported by CAESIUS-TIM [1].
Full-width punctuation is preferred in Chinese localization for better
readability and adherence to typesetting conventions.
[1]: https://github.com/git-l10n/git-po/issues/884
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
* 'master' of https://github.com/j6t/git-gui:
git-gui: mark *.po files at any directory level as UTF-8
git-gui i18n: Update Bulgarian translation (558t)
git-gui i18n: Update Bulgarian translation (557t)
When a commit that changes a file in po/glossary is viewed in Gitk, the
patch text shows mojibake instead of correctly decoded UTF-8 text.
Gitk retrieves the encoding attribute to decide how to treat the bytes
that make up the patch text. There is an attribute definition declaring
that all files are US-ASCII, and a later attribute definition overrides
this. But the override, which specifies UTF-8, applies only to *.po
files in the directory po/ itself and does not apply to subdirectories.
Widen the pattern to apply to all directory levels.
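The fix boils down to widening a pattern in git-gui's .gitattributes,
along these lines (illustrative, not the actual hunk; a gitattributes
pattern without a slash matches at any directory level):

    -po/*.po    encoding=UTF-8
    +*.po       encoding=UTF-8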
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
- Translate new string (558t)
- Add graves for disambiguation
- Improve glossary translation (96t) and synchronize with git
Signed-off-by: Alexander Shopov <ash@kambanaria.org>
Upstream symbolic link support on Windows from Git-for-Windows.
* js/symlink-windows:
mingw: special-case index entries for symlinks with buggy size
mingw: emulate `stat()` a little more faithfully
mingw: try to create symlinks without elevated permissions
mingw: add support for symlinks to directories
mingw: implement basic `symlink()` functionality (file symlinks only)
mingw: implement `readlink()`
mingw: allow `mingw_chdir()` to change to symlink-resolved directories
mingw: support renaming symlinks
mingw: handle symlinks to directories in `mingw_unlink()`
mingw: add symlink-specific error codes
mingw: change default of `core.symlinks` to false
mingw: factor out the retry logic
mingw: compute the correct size for symlinks in `mingw_lstat()`
mingw: teach dirent about symlinks
mingw: let `mingw_lstat()` error early upon problems with reparse points
mingw: drop the separate `do_lstat()` function
mingw: implement `stat()` with symlink support
mingw: don't call `GetFileAttributes()` twice in `mingw_lstat()`
Dscho observed that SVN tests are taking too much time in CI leak
checking tasks, but most time is spent not in our code but in libsvn
code (which happens to be written in Perl), whose leaks are of little
value for us to discover. Skip SVN, P4, and CVS tests in the leak
checking tasks.
* js/ci-leak-skip-svn:
ci: skip CVS and P4 tests in leaks job, too
ci(*-leaks): skip the git-svn tests to save time
"git bugreport" and "git version --build-options" learned to
include use of 'gettext' feature, to make it easier to diagnose
problems around l10n.
* jx/build-options-gettext:
help: report on whether or not gettext is enabled
Remove implicit reliance on the_repository global in the APIs
around tree objects and make it explicit which repository to work
in.
* rs/tree-wo-the-repository:
cocci: remove obsolete the_repository rules
cocci: convert parse_tree functions to repo_ variants
tree: stop using the_repository
tree: use repo_parse_tree()
path-walk: use repo_parse_tree_gently()
pack-bitmap-write: use repo_parse_tree()
delta-islands: use repo_parse_tree()
bloom: use repo_parse_tree()
add-interactive: use repo_parse_tree_indirect()
tree: add repo_parse_tree*()
environment: move access to core.maxTreeDepth into repo settings
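Illustratively, call sites move from the implicit-global form to the
explicit-repository form along these lines (a sketch, not an actual
hunk from the series):

    -       struct tree *tree = parse_tree_indirect(&oid);
    +       struct tree *tree = repo_parse_tree_indirect(repo, &oid);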
The logic that avoids reusing MIDX files with a wrong checksum was
broken, which has been corrected.
* tb/midx-write-corrupt-checksum-fix:
midx-write.c: assume checksum-invalid MIDXs require an update
t/t5319-multi-pack-index.sh: drop early 'test_done'