Commit Graph

878 Commits

Author SHA1 Message Date
Johannes Schindelin
22d90275f1 Introduce helper to create symlinks that knows about index_state
On Windows, symbolic links actually have a type depending on the target:
it can be a file or a directory.

In certain circumstances, this poses problems, e.g. when a symbolic link
is supposed to point into a submodule that is not checked out, so there
is no way for Git to auto-detect the type.

To help with that, we will add support over the course of the next
commits to specify that symlink type via the Git attributes. This
requires an index_state, though, something that Git for Windows'
`symlink()` replacement cannot know about because the function signature
is defined by the POSIX standard and not ours to change.

So let's introduce a helper function to create symbolic links that
*does* know about the index_state.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-04-11 06:40:01 +00:00
Johannes Schindelin
4fbbb08690 reftable: do make sure to use custom allocators
The reftable library goes out of its way to use its own set of allocator
functions that can be configured using `reftable_set_alloc()`. However,
Git does not configure this.

That is not typically a problem, except when Git uses a custom allocator
via some definitions in `git-compat-util.h`, as is the case in Git for
Windows (which switched away from the long-unmaintained nedmalloc to
mimalloc).

Then, it is quite possible that Git assigns a `strbuf` (allocated via
the custom allocator) to, say, the `refname` field of a
`reftable_log_record` in `write_transaction_table()`, and later on asks
the reftable library function `reftable_log_record_release()` to release
it, but that function was compiled without using `git-compat-util.h` and
hence calls regular `free()` (i.e. _not_ the custom allocator's own
function).

This has been a problem for a long time and it was a matter of some sort
of "luck" that 1) reftables are not commonly used on Windows, and 2)
mimalloc can often ignore gracefully when it is asked to release memory
that it has not allocated.

However, a recent update to `seen` brought this problem to the
forefront, letting t1460 fail in Git for Windows, with symptoms much in
the same way as the problem I had to address in d02c37c3e6
(t-reftable-basics: allow for `malloc` to be `#define`d, 2025-01-08)
where exit code 127 was also produced in lieu of
`STATUS_HEAP_CORRUPTION` (C0000374) because exit codes are only 7 bits
wide.

It was not possible to figure out what change in particular caused these
new failures within a reasonable time frame, as there are too many
changes in `seen` that conflict with Git for Windows' patches, I had to
stop the investigation after spending four hours on it fruitlessly.

To verify that this patch fixes the issue, I avoided using mimalloc and
temporarily patched in a "custom allocator" that would more reliably
point out problems, like this:

  diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
  index 68f38291f84c..9421d630b9f5 100644
  --- a/refs/reftable-backend.c
  +++ b/refs/reftable-backend.c
  @@ -353,6 +353,69 @@ static int reftable_be_fsync(int fd)
   	return fsync_component(FSYNC_COMPONENT_REFERENCE, fd);
   }

  +#define DEBUG_REFTABLE_ALLOC
  +#ifdef DEBUG_REFTABLE_ALLOC
  +#include "khash.h"
  +
  +static inline khint_t __ac_X31_hash_ptr(void *ptr)
  +{
  +	union {
  +		void *ptr;
  +		char s[sizeof(void *)];
  +	} u;
  +	size_t i;
  +	khint_t h;
  +
  +	u.ptr = ptr;
  +	h = (khint_t)*u.s;
  +	for (i = 0; i < sizeof(void *); i++)
  +		h = (h << 5) - h + (khint_t)u.s[i];
  +	return h;
  +}
  +
  +#define kh_ptr_hash_func(key) __ac_X31_hash_ptr(key)
  +#define kh_ptr_hash_equal(a, b) ((a) == (b))
  +
  +KHASH_INIT(ptr, void *, int, 0, kh_ptr_hash_func, kh_ptr_hash_equal)
  +
  +static kh_ptr_t *my_malloced;
  +
  +static void *my_malloc(size_t sz)
  +{
  +	int dummy;
  +	void *ptr = malloc(sz);
  +	if (ptr)
  +		kh_put_ptr(my_malloced, ptr, &dummy);
  +	return ptr;
  +}
  +
  +static void *my_realloc(void *ptr, size_t sz)
  +{
  +	int dummy;
  +	if (ptr) {
  +		khiter_t pos = kh_get_ptr(my_malloced, ptr);
  +		if (pos >= kh_end(my_malloced))
  +			die("Was not my_malloc()ed: %p", ptr);
  +		kh_del_ptr(my_malloced, pos);
  +	}
  +	ptr = realloc(ptr, sz);
  +	if (ptr)
  +		kh_put_ptr(my_malloced, ptr, &dummy);
  +	return ptr;
  +}
  +
  +static void my_free(void *ptr)
  +{
  +	if (ptr) {
  +		khiter_t pos = kh_get_ptr(my_malloced, ptr);
  +		if (pos >= kh_end(my_malloced))
  +			die("Was not my_malloc()ed: %p", ptr);
  +		kh_del_ptr(my_malloced, pos);
  +	}
  +	free(ptr);
  +}
  +#endif
  +
   static struct ref_store *reftable_be_init(struct repository *repo,
   					  const char *gitdir,
   					  unsigned int store_flags)
  @@ -362,6 +425,11 @@ static struct ref_store *reftable_be_init(struct repository *repo,
   	int is_worktree;
   	mode_t mask;

  +#ifdef DEBUG_REFTABLE_ALLOC
  +	my_malloced = kh_init_ptr();
  +	reftable_set_alloc(my_malloc, my_realloc, my_free);
  +#endif
  +
   	mask = umask(0);
   	umask(mask);

I briefly considered contributing this "custom allocator" patch, too,
but it is unwieldy (for example, it would not work at all when compiling
with mimalloc support) and it would only waste space (or even time, if a
compile flag was introduced and exercised as part of the CI builds).
Given that it is highly unlikely that Git will lose the new
`reftable_set_alloc()` call by mistake, I rejected that idea as simply
too wasteful.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-04-11 06:39:58 +00:00
Junio C Hamano
bb1d626802 Merge branch 'sp/refs-reduce-the-repository' into next
Code clean-up to use the right instance of a repository instance in
calls inside refs subsystem.

* sp/refs-reduce-the-repository:
  refs/reftable-backend: drop uses of the_repository
  refs: remove the_hash_algo global state
  refs: add struct repository parameter in get_files_ref_lock_timeout_ms()
2026-04-09 11:24:04 -07:00
Shreyansh Paliwal
57c590feb9 refs/reftable-backend: drop uses of the_repository
reftable_be_init() and reftable_be_create_on_disk() use the_repository even
though a repository instance is already available, either directly or via
struct ref_store.

Replace these uses with the appropriate local repository instance (repo or
ref_store->repo) to avoid relying on global state.

Note that USE_THE_REPOSITORY_VARIABLE cannot be removed yet, as
is_bare_repository() is still there in the file.

Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-04-08 09:58:10 -07:00
Shreyansh Paliwal
b886f0b5dc refs: add struct repository parameter in get_files_ref_lock_timeout_ms()
get_files_ref_lock_timeout_ms() calls repo_config_get_int() using
the_repository, as no repository instance is available in its scope. Add a
struct repository parameter and use it instead of the_repository.

Update all callers accordingly. In files-backend.c, lock_raw_ref() can
obtain repository instance from the struct ref_transaction via
transaction->ref_store->repo and pass it down. For create_reflock(), which
is used as a callback, introduce a small wrapper struct to pass both struct
lock_file and struct repository through the callback data.

This reduces reliance on the_repository global, though the function
still uses static variables and is not yet fully repository-scoped.
This can be addressed in a follow-up change.

Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-04-08 09:58:09 -07:00
Junio C Hamano
04adce37b9 Merge branch 'ps/reftable-portability' into next
Update reftable library part with what is used in libgit2 to improve
portability to different target codebases and platforms.

* ps/reftable-portability:
  reftable/system: add abstraction to mmap files
  reftable/system: add abstraction to retrieve time in milliseconds
  reftable/fsck: use REFTABLE_UNUSED instead of UNUSED
  reftable/stack: provide fsync(3p) via system header
  reftable: introduce "reftable-system.h" header
2026-04-02 13:30:45 -07:00
Junio C Hamano
b060d988f4 Merge branch 'jk/c23-const-preserving-fixes-more' into next
Further work to adjust the codebase for C23 that changes functions
like strchr() that discarded constness when they return a pointer into
a const string to preserve constness.

* jk/c23-const-preserving-fixes-more:
  refs/files-backend: drop const to fix strchr() warning
  http: drop const to fix strstr() warning
  range-diff: drop const to fix strstr() warnings
  pkt-line: make packet_reader.line non-const
  skip_prefix(): check const match between in and out params
  pseudo-merge: fix disk reads from find_pseudo_merge()
  find_last_dir_sep(): convert inline function to macro
  run-command: explicitly cast away constness when assigning to void
  pager: explicitly cast away strchr() constness
  transport-helper: drop const to fix strchr() warnings
  http: add const to fix strchr() warnings
  convert: add const to fix strchr() warnings
2026-04-02 13:30:43 -07:00
Patrick Steinhardt
b45ea595e6 reftable/stack: provide fsync(3p) via system header
Users of the reftable library are expected to provide their own function
callback in cases they want to sync(3p) data to disk via the reftable
write options. But if no such function was provided we end up calling
fsync(3p) directly, which may not even be available on some systems.

While dropping the explicit call to fsync(3p) would work, it would lead
to an unsafe default behaviour where a project may have forgotten to set
up the callback function, and that could lead to potential data loss. So
this is not a great solution.

Instead, drop the callback function and make it mandatory for the
project to define fsync(3p). In the case of Git, we can then easily
inject our custom implementation via the "reftable-system.h" header so
that we continue to use `fsync_component()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-04-02 10:45:43 -07:00
Jeff King
f1b8a4d108 refs/files-backend: drop const to fix strchr() warning
In show_one_reflog_ent(), we're fed a writable strbuf buffer, which we
parse into the various reflog components. We write a NUL over email_end
to tie off one of the fields, and thus email_end must be non-const.

But with a C23 implementation of libc, strchr() will now complain when
assigning the result to a non-const pointer from a const one. So we can
fix this by making the source pointer non-const.

But there's a catch. We derive that source pointer by parsing the line
with parse_oid_hex_algop(), which requires a const pointer for its
out-parameter. We can work around that by teaching it to use our
CONST_OUTPARAM() trick, just like skip_prefix().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-04-01 22:08:53 -07:00
Junio C Hamano
0a39ec283c Merge branch 'vp/http-rate-limit-retries'
The HTTP transport learned to react to "429 Too Many Requests".

* vp/http-rate-limit-retries:
  http: add support for HTTP 429 rate limit retries
  strbuf_attach: fix call sites to pass correct alloc
  strbuf: pass correct alloc to strbuf_attach() in strbuf_reencode()
2026-04-01 10:28:18 -07:00
Junio C Hamano
9cb4ec5fad Merge branch 'vp/http-rate-limit-retries' into next
The HTTP transport learned to react to "429 Too Many Requests".

* vp/http-rate-limit-retries:
  http: add support for HTTP 429 rate limit retries
  strbuf_attach: fix call sites to pass correct alloc
  strbuf: pass correct alloc to strbuf_attach() in strbuf_reencode()
2026-03-24 14:24:06 -07:00
Vaidas Pilkauskas
a4fddb01c5 strbuf_attach: fix call sites to pass correct alloc
strbuf_attach(sb, buf, len, alloc) requires alloc > len (the buffer
must have at least len+1 bytes to hold the NUL). Several call sites
passed alloc == len, relying on strbuf_grow(sb, 0) inside strbuf_attach
to reallocate. Fix these in mailinfo, am, refs/files-backend,
fast-import, and trailer by passing len+1 when the buffer is a
NUL-terminated string (or from strbuf_detach).

Signed-off-by: Vaidas Pilkauskas <vaidas.pilkauskas@shopify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-17 09:14:19 -07:00
Junio C Hamano
d445aecfb0 Merge branch 'ps/refs-for-each'
Code refactoring around refs-for-each-* API functions.

* ps/refs-for-each:
  refs: replace `refs_for_each_fullref_in()`
  refs: replace `refs_for_each_namespaced_ref()`
  refs: replace `refs_for_each_glob_ref()`
  refs: replace `refs_for_each_glob_ref_in()`
  refs: replace `refs_for_each_rawref_in()`
  refs: replace `refs_for_each_rawref()`
  refs: replace `refs_for_each_ref_in()`
  refs: improve verification for-each-ref options
  refs: generalize `refs_for_each_fullref_in_prefixes()`
  refs: generalize `refs_for_each_namespaced_ref()`
  refs: speed up `refs_for_each_glob_ref_in()`
  refs: introduce `refs_for_each_ref_ext`
  refs: rename `each_ref_fn`
  refs: rename `do_for_each_ref_flags`
  refs: move `do_for_each_ref_flags` further up
  refs: move `refs_head_ref_namespaced()`
  refs: remove unused `refs_for_each_include_root_ref()`
2026-03-09 14:36:55 -07:00
Junio C Hamano
1d0a2acb78 Merge branch 'kn/ref-location'
Allow the directory in which reference backends store their data to
be specified.

* kn/ref-location:
  refs: add GIT_REFERENCE_BACKEND to specify reference backend
  refs: allow reference location in refstorage config
  refs: receive and use the reference storage payload
  refs: move out stub modification to generic layer
  refs: extract out `refs_create_refdir_stubs()`
  setup: don't modify repo in `create_reference_database()`
2026-03-04 10:52:59 -08:00
Junio C Hamano
ec9e0a36ff Merge branch 'ps/refs-for-each' into next
Code refactoring around refs-for-each-* API functions.

* ps/refs-for-each:
  refs: replace `refs_for_each_fullref_in()`
  refs: replace `refs_for_each_namespaced_ref()`
  refs: replace `refs_for_each_glob_ref()`
  refs: replace `refs_for_each_glob_ref_in()`
  refs: replace `refs_for_each_rawref_in()`
  refs: replace `refs_for_each_rawref()`
  refs: replace `refs_for_each_ref_in()`
  refs: improve verification for-each-ref options
  refs: generalize `refs_for_each_fullref_in_prefixes()`
  refs: generalize `refs_for_each_namespaced_ref()`
  refs: speed up `refs_for_each_glob_ref_in()`
  refs: introduce `refs_for_each_ref_ext`
  refs: rename `each_ref_fn`
  refs: rename `do_for_each_ref_flags`
  refs: move `do_for_each_ref_flags` further up
  refs: move `refs_head_ref_namespaced()`
  refs: remove unused `refs_for_each_include_root_ref()`
2026-02-27 15:16:31 -08:00
Junio C Hamano
e87adbdb69 Merge branch 'kn/ref-location' into next
Allow the directory in which reference backends store their data to
be specified.

* kn/ref-location:
  refs: add GIT_REFERENCE_BACKEND to specify reference backend
  refs: allow reference location in refstorage config
  refs: receive and use the reference storage payload
  refs: move out stub modification to generic layer
  refs: extract out `refs_create_refdir_stubs()`
  setup: don't modify repo in `create_reference_database()`
2026-02-26 10:06:14 -08:00
Karthik Nayak
d74aacd7c4 refs: receive and use the reference storage payload
An upcoming commit will add support for providing an URI via the
'extensions.refStorage' config. The URI will contain the reference
backend and a corresponding payload. The payload can be then used for
providing an alternate locations for the reference backend.

To prepare for this, modify the existing backends to accept such an
argument when initializing via the 'init()' function. Both the files
and reftable backends will parse the information to be filesystem paths
to store references. Given that no callers pass any payload yet this is
essentially a no-op change for now.

To enable this, provide a 'refs_compute_filesystem_location()' function
which will parse the current 'gitdir' and the 'payload' to provide the
final reference directory and common reference directory (if working in
a linked worktree).

The documentation and tests will be added alongside the extension of the
config variable.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-25 09:27:12 -08:00
Karthik Nayak
2a32ac429e refs: move out stub modification to generic layer
When creating the reftable reference backend on disk, we create stubs to
ensure that the directory can be recognized as a Git repository. This is
done by calling `refs_create_refdir_stubs()`. Move this to the generic
layer as this is needed for all backends excluding from the files
backends. In an upcoming commit where we introduce alternate reference
backend locations, we'll have to also create stubs in the $GIT_DIR
irrespective of the backend being used. This commit builds the base to
add that logic.

Similarly, move the logic for deletion of stubs to the generic layer.
The files backend recursively calls the remove function of the
'packed-backend', here skip calling the generic function since that
would try to delete stubs.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-25 09:27:12 -08:00
Karthik Nayak
4ffbb02ee4 refs: extract out refs_create_refdir_stubs()
For Git to recognize a directory as a Git directory, it requires the
directory to contain:

  1. 'HEAD' file
  2. 'objects/' directory
  3. 'refs/' directory

Here, #1 and #3 are part of the reference storage mechanism,
specifically the files backend. Since then, newer backends such as the
reftable backend have moved to using their own path ('reftable/') for
storing references. But to ensure Git still recognizes the directory as
a Git directory, we create stubs.

There are two locations where we create stubs:

- In 'refs/reftable-backend.c' when creating the reftable backend.
- In 'clone.c' before spawning transport helpers.

In a following commit, we'll add another instance. So instead of
repeating the code, let's extract out this code to
`refs_create_refdir_stubs()` and use it.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-25 09:27:12 -08:00
Patrick Steinhardt
67034081ad refs: replace refs_for_each_rawref()
Replace calls to `refs_for_each_rawref()` with the newly introduced
`refs_for_each_ref_ext()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-23 13:21:18 -08:00
Patrick Steinhardt
635f08b739 refs: rename each_ref_fn
Similar to the preceding commit, rename `each_ref_fn` to better match
our current best practices around how we name things.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-23 13:21:18 -08:00
Patrick Steinhardt
8f0720a5a7 refs: rename do_for_each_ref_flags
The enum `do_for_each_ref_flags` and its individual values don't match
to our current best practices when it comes to naming things. Rename it
to `refs_for_each_flag`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-23 13:21:18 -08:00
Junio C Hamano
6176ee2349 Merge branch 'kn/ref-batch-output-error-reporting-fix'
A handful of code paths that started using batched ref update API
(after Git 2.51 or so) lost detailed error output, which have been
corrected.

* kn/ref-batch-output-error-reporting-fix:
  fetch: delay user information post committing of transaction
  receive-pack: utilize rejected ref error details
  fetch: utilize rejected ref error details
  update-ref: utilize rejected error details if available
  refs: add rejection detail to the callback function
  refs: skip to next ref when current ref is rejected
2026-02-09 12:09:10 -08:00
Junio C Hamano
fe8044c396 Merge branch 'kn/ref-batch-output-error-reporting-fix' into next
A handful of code paths that started using batched ref update API
(after Git 2.51 or so) lost detailed error output, which have been
corrected.

* kn/ref-batch-output-error-reporting-fix:
  fetch: delay user information post committing of transaction
  receive-pack: utilize rejected ref error details
  fetch: utilize rejected ref error details
  update-ref: utilize rejected error details if available
  refs: add rejection detail to the callback function
  refs: skip to next ref when current ref is rejected
2026-01-30 09:59:25 -08:00
Karthik Nayak
b52a28b03e refs: skip to next ref when current ref is rejected
In `refs_verify_refnames_available()` we have two nested loops: the
outer loop iterates over all references to check, while the inner loop
checks for filesystem conflicts for a given ref by breaking down its
path.

With batched updates, when we detect a filesystem conflict, we mark the
update as rejected and execute 'continue'. However, this only skips to
the next iteration of the inner loop, not the outer loop as intended.
This causes the same reference to be repeatedly rejected. Fix this by
using a goto statement to skip to the next reference in the outer loop.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-25 22:27:33 -08:00
Patrick Steinhardt
06d6ead762 refs/reftable: introduce generic checks for refs
In a preceding commit we have extracted generic checks for both direct
and symbolic refs that apply for all backends. Wire up those checks for
the "reftable" backend.

Note that this is done by iterating through all refs manually with the
low-level reftable ref iterator. We explicitly don't want to use the
higher-level iterator that is exposed to users of the reftable backend
as that iterator may swallow for example broken refs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:55:41 -08:00
Patrick Steinhardt
9341740bea refs/reftable: fix consistency checks with worktrees
The ref consistency checks are driven via `cmd_refs_verify()`. That
function loops through all worktrees (including the main worktree) and
then checks the ref store for each of them individually. It follows that
the backend is expected to only verify refs that belong to the specified
worktree.

While the "files" backend handles this correctly, the "reftable" backend
doesn't. In fact, it completely ignores the passed worktree and instead
verifies refs of _all_ worktrees. The consequence is that we'll end up
every ref store N times, where N is the number of worktrees.

Or rather, that would be the case if we actually iterated through the
worktree reftable stacks correctly. But we use `strmap_for_each_entry()`
to iterate through the stacks, but the map is in fact not even properly
populated. So instead of checking stacks N^2 times, we actually only end
up checking the reftable stack of the main worktree.

Fix this bug by only verifying the stack of the passed-in worktree and
constructing the backends via `backend_for_worktree()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:55:41 -08:00
Patrick Steinhardt
78384e2467 refs/reftable: extract function to retrieve backend for worktree
Pull out the logic to retrieve a backend for a given worktree. This
function will be used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:55:41 -08:00
Patrick Steinhardt
ab67f0a436 refs/reftable: adapt includes to become consistent
Adapt the includes to be sorted and to use include paths that are
relative to the "refs/" directory.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:55:41 -08:00
Patrick Steinhardt
ae38c3a359 refs/files: introduce function to perform normal ref checks
In a subsequent commit we'll introduce new generic checks for direct
refs. These checks will be independent of the actual backend.

Introduce a new function `refs_fsck_ref()` that will be used for this
purpose. At the current point in time it's still empty, but it will get
populated in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:55:41 -08:00
Patrick Steinhardt
dcecffb616 refs/files: extract generic symref target checks
The consistency checks for the "files" backend contain a couple of
verifications for symrefs that verify generic properties of the target
reference. These properties need to hold for every backend, no matter
whether it's using the "files" or "reftable" backend.

Reimplementing these checks for every single backend doesn't really make
sense. Extract it into a generic `refs_fsck_symref()` function that can
be used by other backends, as well. The "reftable" backend will be wired
up in a subsequent commit.

While at it, improve the consistency checks so that we don't complain
about refs pointing to a non-ref target in case the target refname
format does not verify. Otherwise it's very likely that we'll generate
both error messages, which feels somewhat redundant in this case.

Note that the function has a couple of `UNUSED` parameters. These will
become referenced in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:55:40 -08:00
Patrick Steinhardt
7b8c36a2a7 refs/files: perform consistency checks for root refs
While the "files" backend already knows to perform consistency checks
for the "refs/" hierarchy, it doesn't verify any of its root refs. Plug
this omission.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:55:40 -08:00
Patrick Steinhardt
9ebccf744a refs/files: improve error handling when verifying symrefs
The error handling when verifying symbolic refs is a bit on the wild
side:

  - `fsck_report_ref()` can be told to ignore specific errors. If an
    error has been ignored and a previous check raised an unignored
    error, then assigning `ret = fsck_report_ref()` will cause us to
    swallow the previous error.

  - When the target reference is not valid we bail out early without
    checking for other errors.

Fix both of these issues by consistently or'ing the return value and not
bailing out early.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:55:40 -08:00
Patrick Steinhardt
0ff9cf40b2 refs/files: extract function to check single ref
When checking the consistency of references we create a directory
iterator and then verify each single reference in a loop. The logic to
perform the actual checks is embedded into that loop, which makes it
hard to reuse. But In a subsequent commit we're about to introduce a
second path that wants to verify references.

Prepare for this by extracting the logic to check a single reference
into a standalone function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:55:40 -08:00
Patrick Steinhardt
2fe33ae20f refs/files: remove useless indirection
The function `files_fsck_refs()` only has a single callsite and forwards
all of its arguments as-is, so it's basically a useless indirection.
Inline the function call.

While at it, also remove the bitwise or that we have for return values.
We don't really want to or them at all, but rather just want to return
an error in case either of the functions has failed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:55:40 -08:00
Patrick Steinhardt
5a74903e62 refs/files: remove refs_check_dir parameter
The parameter `refs_check_dir` determines which directory we want to
check references for. But as we always want to check the complete
refs hierarchy, this parameter is always set to "refs".

Drop the parameter and hardcode it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:55:40 -08:00
Patrick Steinhardt
e615643d2e refs/files: move fsck functions into global scope
When performing consistency checks we pass the functions that perform
the verification down the calling stack. This is somewhat unnecessary
though, as the set of functions doesn't ever change.

Simplify the code by moving the array into global scope and remove the
parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:55:40 -08:00
Patrick Steinhardt
df971a7c42 refs/files: simplify iterating through root refs
When iterating through root refs we first need to determine the
directory in which the refs live. This is done by retrieving the root of
the loose refs via `refs->loose->root->name`, and putting it through
`files_ref_path()` to derive the final path.

This is somewhat redundant though: the root name of the loose files
cache is always going to be the empty string. As such, we always end up
passing that empty string to `files_ref_path()` as the ref hierarchy we
want to start. And this actually makes sense: `files_ref_path()` already
computes the location of the root directory, so of course we need to
pass the empty string for the ref hierarchy itself. So going via the
loose ref cache to figure out that the root of a ref hierarchy is empty
is only causing confusion.

But next to the added confusion, it can also lead to a segfault. The
loose ref cache is populated lazily, so it may not always be set. It
seems to be sheer luck that this is a condition we do not currently hit.
The right thing to do would be to call `get_loose_ref_cache()`, which
knows to populate the cache if required.

Simplify the code and fix the potential segfault by simply removing the
indirection via the loose ref cache completely.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 06:55:40 -08:00
Junio C Hamano
2365d4f612 Merge branch 'gf/maintenance-is-needed-fix'
Brown-paper-bag fix to a recently graduated
'kn/maintenance-is-needed' topic.

* gf/maintenance-is-needed-fix:
  refs: dereference the value of the required pointer
2025-12-30 12:58:20 +09:00
Greg Funni
46d0ee2d69 refs: dereference the value of the required pointer
Currently, this always prints yes because required is non-null.

This is the wrong behavior. The boolean must be
dereferenced.

Signed-off-by: Greg Funni <gfunni234@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-12-19 12:55:38 +09:00
Junio C Hamano
c62d2d3810 Merge branch 'kn/maintenance-is-needed'
"git maintenance" command learned "is-needed" subcommand to tell if
it is necessary to perform various maintenance tasks.

* kn/maintenance-is-needed:
  maintenance: add 'is-needed' subcommand
  maintenance: add checking logic in `pack_refs_condition()`
  refs: add a `optimize_required` field to `struct ref_storage_be`
  reftable/stack: add function to check if optimization is required
  reftable/stack: return stack segments directly
2025-11-21 09:14:17 -08:00
Junio C Hamano
ee27005905 Merge branch 'ps/ref-peeled-tags-fixes'
Another fix-up to "peeled-tags" topic.

* ps/ref-peeled-tags-fixes:
  object: fix performance regression when peeling tags
2025-11-19 10:55:42 -08:00
Junio C Hamano
7ccfc262d7 Merge branch 'kn/refs-optim-cleanup'
Code clean-up.

* kn/refs-optim-cleanup:
  t/pack-refs-tests: move the 'test_done' to callees
  refs: rename 'pack_refs_opts' to 'refs_optimize_opts'
  refs: move to using the '.optimize' functions
2025-11-19 10:55:40 -08:00
Junio C Hamano
13134cecb0 Merge branch 'ps/ref-peeled-tags'
Some ref backend storage can hold not just the object name of an
annotated tag, but the object name of the object the tag points at.
The code to handle this information has been streamlined.

* ps/ref-peeled-tags:
  t7004: do not chdir around in the main process
  ref-filter: fix stale parsed objects
  ref-filter: parse objects on demand
  ref-filter: detect broken tags when dereferencing them
  refs: don't store peeled object IDs for invalid tags
  object: add flag to `peel_object()` to verify object type
  refs: drop infrastructure to peel via iterators
  refs: drop `current_ref_iter` hack
  builtin/show-ref: convert to use `reference_get_peeled_oid()`
  ref-filter: propagate peeled object ID
  upload-pack: convert to use `reference_get_peeled_oid()`
  refs: expose peeled object ID via the iterator
  refs: refactor reference status flags
  refs: fully reset `struct ref_iterator::ref` on iteration
  refs: introduce `.ref` field for the base iterator
  refs: introduce wrapper struct for `each_ref_fn`
2025-11-19 10:55:39 -08:00
Karthik Nayak
f6c5ca387a refs: add a optimize_required field to struct ref_storage_be
To allow users of the refs namespace to check if the reference backend
requires optimization, add a new field `optimize_required` field to
`struct ref_storage_be`. This field is of type `optimize_required_fn`
which is also introduced in this commit.

Modify the debug, files, packed and reftable backend to implement this
field. A following commit will expose this via 'git pack-refs' and 'git
refs optimize'.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-10 09:28:48 -08:00
Junio C Hamano
f58ea683b5 Merge branch 'pk/reflog-migrate-message-fix'
Message fix.

* pk/reflog-migrate-message-fix:
  refs: add missing space in messages
2025-11-06 14:52:57 -08:00
Patrick Steinhardt
7048e74609 object: fix performance regression when peeling tags
Our Bencher dashboards [1] have recently alerted us about a bunch of
performance regressions when writing references, specifically with the
reftable backend. There is a 3x regression when writing many refs with
preexisting refs in the reftable format, and a 10x regression when
migrating refs between backends in either of the formats.

Bisecting the issue lands us at 6ec4c0b45b (refs: don't store peeled
object IDs for invalid tags, 2025-10-23). The gist of the commit is that
we may end up storing peeled objects in both reftables and packed-refs
for corrupted tags, where the claimed tagged object type is different
than the actual tagged object type. This will then cause us to create
the `struct object *` with a wrong type, as well, and obviously nothing
good comes out of that.

The fix for this issue was to introduce a new flag to `peel_object()`
that causes us to verify the tagged object's type before writing it into
the refdb -- if the tag is corrupt, we skip writing the peeled value.
To verify whether the peeled value is correct we have to look up the
object type via the ODB and compare the actual type with the claimed
type, and that additional object lookup is costly.

This also explains why we see the regression only when writing refs with
the reftable backend, but we see the regression with both backends when
migrating refs:

  - The reftable backend knows to store peeled values in the new table
    immediately, so it has to try and peel each ref it's about to write
    to the transaction. So the performance regression is visible for all
    writes.

  - The files backend only stores peeled values when writing the
    packed-refs file, so it wouldn't hit the performance regression for
    normal writes. But on ref migrations we know to write all new values
    into the packed-refs file immediately, and that's why we see the
    regression for both backends there.

Taking a step back though reveals an oddity in the new verification
logic: we not only verify the _tagged_ object's type, but we also verify
the type of the tag itself. But this isn't really needed, as we wouldn't
hit the bug in such a case anyway, as we only hit the issue with corrupt
tags claiming an invalid type for the tagged object.

The consequence of this is that we now started to look up the target
object of every single reference we're about to write, regardless of
whether it even is a tag or not. And that is of course quite costly.

Fix the issue by only verifying the type of the tagged objects. This
means that we of course still have a performance hit for actual tags.
But this only happens for writes anyway, and I'd claim it's preferable
to not store corrupted data in the refdb than to be fast here. Rename
the flag accordingly to clarify that we only verify the tagged object's
type.

This fix brings performance back to previous levels:

    Benchmark 1: baseline
      Time (mean ± σ):      46.0 ms ±   0.4 ms    [User: 40.0 ms, System: 5.7 ms]
      Range (min … max):    45.0 ms …  47.1 ms    54 runs

    Benchmark 2: regression
      Time (mean ± σ):     140.2 ms ±   1.3 ms    [User: 77.5 ms, System: 60.5 ms]
      Range (min … max):   138.0 ms … 142.7 ms    20 runs

    Benchmark 3: fix
      Time (mean ± σ):      46.2 ms ±   0.4 ms    [User: 40.2 ms, System: 5.7 ms]
      Range (min … max):    45.0 ms …  47.3 ms    55 runs

    Summary
      update-ref: baseline
        1.00 ± 0.01 times faster than fix
        3.05 ± 0.04 times faster than regression

[1]: https://bencher.dev/perf/git/plots

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06 10:54:34 -08:00
Junio C Hamano
994869e2b5 Merge branch 'ps/ref-peeled-tags' into ps/ref-peeled-tags-fixes
* ps/ref-peeled-tags:
  t7004: do not chdir around in the main process
  ref-filter: fix stale parsed objects
  ref-filter: parse objects on demand
  ref-filter: detect broken tags when dereferencing them
  refs: don't store peeled object IDs for invalid tags
  object: add flag to `peel_object()` to verify object type
  refs: drop infrastructure to peel via iterators
  refs: drop `current_ref_iter` hack
  builtin/show-ref: convert to use `reference_get_peeled_oid()`
  ref-filter: propagate peeled object ID
  upload-pack: convert to use `reference_get_peeled_oid()`
  refs: expose peeled object ID via the iterator
  refs: refactor reference status flags
  refs: fully reset `struct ref_iterator::ref` on iteration
  refs: introduce `.ref` field for the base iterator
  refs: introduce wrapper struct for `each_ref_fn`
2025-11-06 10:54:28 -08:00
Peter Krefting
d9988b063f refs: add missing space in messages
Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-05 15:04:26 -08:00
Junio C Hamano
517964205c Merge branch 'xr/ref-debug-remove-on-disk'
The "debug" ref-backend was missing a method implementation, which
has been corrected.

* xr/ref-debug-remove-on-disk:
  refs: add missing remove_on_disk implementation for debug backend
2025-11-04 07:48:08 -08:00