When we have an "onbranch" condition we need to ask the reference
database whether HEAD currently points at the configured branch. This
unfortunately creates a chicken-and-egg problem:
- The reference database needs to read the configuration so that it
can configure itself.
- The configuration needs to construct a reference database to fully
parse all of its conditionals.
The way we handle this is by simply excluding "onbranch" conditionals
when we haven't yet configured the reference database.
The mechanism for this is broken though: to verify whether or not we
have configured the reference database we check whether its format is
set to `REF_STORAGE_UNKNOWN` in `include_by_branch()`. But typically,
the format _is_ already known at that time because we set it up during
repository discovery in "setup.c".
The consequence is that we have recursion:
1. We call `get_main_ref_store()`.
2. We don't yet have a reference store, so we call `ref_store_init()`.
3. We parse the configuration required for the reference store.
4. We eventually end up in `include_by_branch()`.
5. We have already configured the reference storage format, so we end
up calling `get_main_ref_store()` again.
We still haven't finished (1) though, so `get_main_ref_store()` will now
call `ref_store_init()` a second time. The end result is that we have
constructed the same reference store twice.
Of course, as both reference stores would be assigned to `refs_private`,
we leak one of those two instances. This never surfaced as an actual
leak though because the pointer is kept alive by the "chdir_notify"
subsystem.
For now, we can fix the issue by explicitly unsetting the reference
storage format before constructing it. This makes the mentioned check
trigger as expected, and consequently we won't end up constructing a
second reference database at all. Ultimately, this means that we
consistently stop evaluating "onbranch" conditions when constructing the
main reference database.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we release worktree and submodule reference databases when
clearing a repository, we don't ever release the main reference
database. This memory leak went unnoticed because its pointer is
kept alive by the "chdir_notify" subsystem.
Fix the memory leak.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the preceding commit we've removed all callers of
`chdir_notify_reparent()`, so the function is unused now. Drop it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating reference stores we register them with the "chdir_notify"
subsystem. This is required because some of the paths we track may be
relative paths, so we have to reparent them in case the current working
directory changes.
But while we register the reference stores, we never unregister them.
This can have multiple outcomes:
- For a repository's main reference database we essentially keep the
pointer alive. We never free that database, either, and our leak
checker doesn't notice because it's still registered.
- For submodule and worktree reference databases we do eventually free
them in `repo_clear()`, so we may keep pointers to free'd memory
registered. We never notice though as we don't tend to chdir around
in the middle of the process.
We never noticed either of these symptoms, but they are obviously bad.
Partially fix those issues by unregistering the reference stores when
releasing them. The leak of the main reference database will be fixed in
a subsequent commit.
Note that this requires us to use `chdir_notify_register()` instead of
`chdir_notify_parent()`, as there is no infrastructure to unregister the
latter. It ultimately doesn't matter much though: in a subsequent commit
we'll drop this infrastructure completely. We merely require this step
here so that we can fix the memory leaks ahead of time.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When discovering a repository we eventually also apply the
"GIT_REFERENCE_BACKEND" environment variable to the repository. There's
two problems with that:
- We do this unconditionally, which is rather pointless: we really
only have to configure the repository when we have found one.
- We have already applied the repository format at that point in time,
so we need to manually reapply it.
Move the logic around so that we only apply the environment variable
when a repository was discovered. This also allows us to drop the
explcit call to `repo_set_ref_storage_format()` because we now adjust
the format before we apply it via `apply_repository_format()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When discovering the repository in "setup.c" we apply the final
repository format multiple times:
- Once via `repository_format_configure()`, where we configure the
repository format for both `struct repository_format` and `struct
repository`.
- And once via `apply_repository_format()`, where we then apply the
`struct repository_format` to the `struct repository` again.
As the format will be applied to the repository when applying the format
it's thus somewhat unnecessary to also apply it to the repository when
adapting the discovered format. The only reason we have to do this is
because we call `repository_format_configure()` after we have already
applied it.
Refactor the code so that we first configure the repository format
before applying it to the repository so that we can stop setting the
hash and reference storage format multiple times.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two callsites of `check_and_apply_repository_format()`. In a
subsequent commit we'll want to adapt one of those callsites to change
the order in which we read and apply the repository format, at which
point the helper function will not really be a good fit for us anymore.
Inline the function to both of the callsites.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/setup-centralize-odb-creation:
setup: construct object database in `apply_repository_format()`
repository: stop reading loose object map twice on repo init
setup: stop initializing object database without repository
setup: stop creating the object database in `setup_git_env()`
repository: stop initializing the object database in `repo_set_gitdir()`
setup: deduplicate logic to apply repository format
setup: drop `setup_git_env()`
t0001: plug test gaps for git-init(1) with GIT_OBJECT_DIRECTORY
The documentation for `push.default = simple` has been clarified to
better explain its behavior, making it clear that it pushes the
current branch to a same-named branch on the remote, and detailing
the upstream requirements for centralized workflows.
* ib/doc-push-default-simple:
doc: clarify push.default=simple behavior
The 'git-jump' command (in contrib/) has been taught to automatically
pick a mode (merge, diff, or ws) when invoked without arguments.
* gh/jump-auto-mode:
git-jump: pick a mode automatically when invoked without arguments
Formatting object name in full hexadecimal form has been optimized
by using a new strbuf_add_oid_hex() helper function.
* rs/strbuf-add-oid-hex:
hex: add and use strbuf_add_oid_hex()
Adding a decimal integer with strbuf_addf("%u") appears commonly;
they have been optimized by using a custom formatter.
* rs/strbuf-add-uint:
ls-tree: use strbuf_add_uint()
ls-files: use strbuf_add_uint()
cat-file: use strbuf_add_uint()
strbuf: add strbuf_add_uint()
"git push" learned to take a "remote group" name to push to, which
causes pushes to multiple places, just like "git fetch" would do.
* ua/push-remote-group:
push: support pushing to a remote group
remote: move remote group resolution to remote.c
remote: fix sign-compare warnings in push_cas_option
The "promisor.quiet" configuration variable was not used from
relevant submodules when commands like "grep --recurse-submodules"
triggered a lazy fetch, which has been corrected.
* th/promisor-quiet-per-repo:
promisor-remote: fix promisor.quiet to use the correct repository
Reachability bitmap generation has been significantly optimized. By
reordering tree traversal, caching object positions, and refining how
pseudo-merge bitmaps are constructed, the performance of "git repack
--write-midx-bitmaps" is improved, especially for large repositories
and when using pseudo-merges.
* tb/bitmap-build-performance:
pack-bitmap: build pseudo-merge bitmaps after regular bitmaps
pack-bitmap: remember pseudo-merge parents
pack-bitmap: sort bitmaps before XORing
pack-bitmap: cache object positions during fill
pack-bitmap: consolidate `find_object_pos()` success path
pack-bitmap: reuse stored selected bitmaps
pack-bitmap: check subtree bits before recursing
pack-bitmap: pass object position to `fill_bitmap_tree()`
A batch of documentation pages has been updated to use the modern
synopsis style.
* ja/doc-synopsis-style-again:
doc: convert git-imap-send synopsis and options to new style
doc: convert git-apply synopsis and options to new style
doc: convert git-am synopsis and options to new style
doc: convert git-grep synopsis and options to new style
doc: git bisect: clarify the usage of the synopsis vs actual command
doc: convert git-bisect to synopsis style
The check for non-stale commits in the priority queue used by
`paint_down_to_common` and `ahead_behind` has been optimized by
replacing an O(N) scan with an O(1) counter, yielding performance
improvements in repositories with wide histories.
* kk/commit-reach-optim:
commit-reach: replace queue_has_nonstale() scan with O(1) tracking
commit-reach: deduplicate queue entries in paint_down_to_common
object.h: fix stale entries in object flag allocation table
"git stash -p" has been optimized by reusing cached index
entries in its temporary index, avoiding unnecessary lstat()
calls on unchanged files.
* aj/stash-patch-optimize-temporary-index:
stash: reuse cached index entries in --patch temporary index
'git restore --staged' has been optimized to avoid unnecessarily expanding
the sparse index when operating on paths within the sparse checkout
definition, by handling sparse directory entries at the tree level.
* ds/restore-sparse-index:
restore: avoid sparse index expansion
t1092: test 'git restore' with sparse index
The GIT_WORK_TREE variable prepared to invoke the push-to-checkout
hook was leaking into the environment even when there was no hook
used and broke the default push-to-deploy (i.e., let "git checkout"
update the working tree only when the working tree is clean).
* ar/receive-pack-worktree-env:
receive-pack: fix updateInstead with core.worktree
With the preceding changes we now always construct the repository's
object database before applying the repository format. Remove this
duplication by constructing it in `apply_repository_format()` instead.
Note that we create the object database _after_ having set up the
repository's hash algorithm, but _before_ setting the compat hash
algorithm. This is intentional:
- Constructing the object database may require knowledge of its
intended object format.
- Setting up the compatibility hash requires the object database to be
initialized already, because we immediately read the loose object
map.
The first point is sensible, the second maybe a little less so. Ideally,
it should be the responsibility of the object database itself to
initialize any data structures required for the compatibility hash. But
this would require further changes, so this is kept as-is for now.
Further note that this requires us to move handling of the environment
variables GIT_OBJECT_DIRECTORY and GIT_ALTERNATE_OBJECT_DIRECTORIES into
the repository format, as well. This allows the caller more flexibility
around whether or not those environment variables are being honored, as
we want to respect them in "setup.c", but not in "repository.c".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When initializing a repository via `repo_init()` we end up reading the
loose object map twice:
- `apply_repository_format()` calls `repo_set_compat_hash_algo()`,
which in turn calls `repo_read_loose_object_map()` if we have a
compatibility hash configured.
- `repo_init()` calls `repo_read_loose_object_map()` directly a second
time.
Drop the second read of the loose object map in `repo_init()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `setup_git_directory_gently()` is responsible for
discovering and setting up a Git repository based on various environment
variables and the current working directory. The result is thus a fully
usable Git repository.
One oddity of this function is that we may set up the object database
even in the case where we don't have a repository, namely in the case
where the `GIT_DIR_EXPLICIT` environment variable is set but points to a
non-existent repository. If so, we call `setup_git_env_internal()` with
the value of the environment variable so that the repository's Git
directory is configured, even if it points to a non-existent directory.
Historically though, this function didn't only configure the repository,
but also initialized the object database. We retained this behaviour
from a preceding commit, even though it really doesn't make much sense
in the first place -- there is no repository, so we don't have an object
database either. There seemingly isn't much of a reason to construct the
object database, as we typically won't try to read objects when we don't
have an object database.
There's one exception though: git-index-pack(1) may run outside of a
repository, which can be used to perform consistency checks for a
packfile. The code path is _almost_ working: we already know to call
`parse_object_buffer()`, which can read objects without an object
database being available. And that works for all object types except for
commits, because `parse_commit_buffer()` calls `parse_commit_graph()`,
and that function doesn't handle the case where we don't have an object
database.
Fix this instance to check for the object database instead of checking
for the Git directory having been initialized. With this fixed, we can
now stop constructing an object database completely.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commit we have stopped creating the object database in
`repo_set_gitdir()`. But the logic is still somewhat confusing as we
still end up creating it conditionally in `setup_git_dir()`, which is
called multiple times.
Drop the conditional logic and instead create the object database in all
places where we have discovered and configured a repository.
This leads to even more duplication than we already had in the preceding
commit, but an alert reader may notice that we now (almost) always call
`odb_new()` directly before having called `apply_repository_format()`.
The only exception to this is `setup_git_directory_gently()`, where we
also call the function when _not_ applying the repository format. This
will be fixed in the next commit, and once that's done we can then unify
creation of the object database into `apply_repository_format()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `repo_set_gitdir()` obviously sets the Git directory for a
given repository. Less obviously though, the function also configures a
couple of auxiliary settings.
One such thing is that we create the object database in this function.
This logic only happens conditionally though, as `set_git_dir()` may be
called multiple times during repository setup, and we don't want to
create the object database multiple times. This is somewhat tangled and
hard to follow.
Remove the logic from `repo_set_gitdir()` and instead initialize the
object database outside of it. This leads to some duplication right now,
but that duplication will be removed in a subsequent step where we will
start initializing the object database as part of applying the repo's
format.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After having discovered the repository format we then apply it to the
repository so that it knows to use the proper repository extensions. The
logic to apply the format is duplicated across three callsites, which
makes it rather painfull to add new extensions.
Introduce a new function `apply_repository_format()` that takes a repo
and applies a given format to it and adapt all callsites to use it.
This function is also the new caller of `verify_repository_format()` so
that we can ensure that we never apply an invalid repository format.
The verification we have in `read_and_verify_repository_format()` is
thus redundant now and dropped.
Rename `read_and_verify_repository_format()` accordingly. While at it,
also rename `check_repository_format()` to clarify that it doesn't only
_check_ the format, but that it also applies it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `setup_git_env()` function is a trivial wrapper around
`setup_git_env_internal()` and has a single call site only. Drop the
function.
While at it, drop stale documentation in "environment.h" that points to
this function, even though it hasn't been exposed to callers outside of
"setup.c" since 43ad1047a9 (setup: stop using `the_repository` in
`setup_git_env()`, 2026-03-27) anymore.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In subsequent commits we'll rework how we set up the repository. This is
a somewhat intricate and thus fragile sequence; there's many things that
can go subtly wrong, and there are lots of interesting interactions that
one can discover.
One such discovered edge case was the interaction between git-init(1)
and the "GIT_OBJECT_DIRECTORY" environment variable. When set, the
behaviour is that the object directory should be created at the path
that the variable points to. This behaviour is documented as such in
its man page:
If the object storage directory is specified via the
GIT_OBJECT_DIRECTORY environment variable then the sha1 directories
are created underneath; otherwise, the default $GIT_DIR/objects
directory is used.
Curiously enough though we don't seem to have any tests that exercise
this directly, and thus a subsequent commit inadvertently would have
broken this expectation.
Plug this test gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git pack-objects --path-walk" traversal has been integrated
with several object filters, including blobless and sparse filters.
* ds/path-walk-filters:
path-walk: support `combine` filter
path-walk: support `object:type` filter
path-walk: support `tree:0` filter
t6601: tag otherwise-unreachable trees
pack-objects: support sparse:oid filter with path-walk
path-walk: add pl_sparse_trees to control tree pruning
path-walk: support blob size limit filter
backfill: die on incompatible filter options
path-walk: support blobless filter
path-walk: always emit directly-requested objects
t/perf: add pack-objects filter and path-walk benchmark
pack-objects: pass --objects with --path-walk
t5620: make test work with path-walk var
"Friday noon" asked in the morning on Sunday was parsed to be one
day before the specified time, which has been corrected.
* ta/approxidate-noon-fix:
approxidate: use deferred mday adjustments for "specials"
approxidate: make "specials" respect fixed day-of-month
t0006: add support for approxidate test date adjustment
approxidate: make "today" wrap to midnight
The "name" argument in git_connect() and related functions has been
converted to a "service" enum to improve type safety and clarify its
purpose.
* jk/connect-service-enum:
transport-helper: fix typo in BUG() message
connect: use "service" enum for "name" argument
"git cat-file --batch" learns an in-line command "mailmap"
that lets the user toggle use of mailmap.
* sa/cat-file-batch-mailmap-switch:
cat-file: add mailmap subcommand to --batch-command
The logic to lazy-load trees from the commit-graph has been made
more robust by falling back to reading the commit object when
the commit-graph is no longer available.
* jk/commit-graph-lazy-load-fallback:
commit: fall back to full read when maybe_tree is NULL
The fsmonitor daemon has been implemented for Linux.
* pt/fsmonitor-linux:
fsmonitor: convert shown khash to strset in do_handle_client
fsmonitor: add tests for Linux
fsmonitor: add timeout to daemon stop command
fsmonitor: close inherited file descriptors and detach in daemon
run-command: add close_fd_above_stderr option
fsmonitor: implement filesystem change listener for Linux
fsmonitor: rename fsm-settings-darwin.c to fsm-settings-unix.c
fsmonitor: rename fsm-ipc-darwin.c to fsm-ipc-unix.c
fsmonitor: use pthread_cond_timedwait for cookie wait
compat/win32: add pthread_cond_timedwait
fsmonitor: fix hashmap memory leak in fsmonitor_run_daemon
fsmonitor: fix khash memory leak in do_handle_client
t9210, t9211: disable GIT_TEST_SPLIT_INDEX for scalar clone tests
The graph output from commands like "git log --graph" can now be
limited to a specified number of lanes, preventing overly wide output
in repositories with many branches.
* ps/graph-lane-limit:
graph: add truncation mark to capped lanes
graph: add --graph-lane-limit option
graph: limit the graph width to a hard-coded max
"git bisect" now uses the selected terms (e.g., old/new) more
consistently in its output.
* jr/bisect-custom-terms-in-output:
rev-parse: use selected alternate terms to look up refs
bisect: print bisect terms in single quotes
bisect: use selected alternate terms in status output
Revision traversal optimization.
* kk/tips-reachable-from-bases-optim:
t6600: add tests for duplicate tips in tips_reachable_from_bases()
commit-reach: use object flags for tips_reachable_from_bases()
These functions were deprecated in a series of commits merged in
52882024 (Merge branch 'ps/commit-list-functions-renamed', 2026-02-13).
The compatibility was for in-flight topics at the time.
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>