Many uses of the_repository has been updated to use a more
appropriate struct repository instance in setup.c codepath.
* ps/setup-wo-the-repository:
setup: stop using `the_repository` in `init_db()`
setup: stop using `the_repository` in `create_reference_database()`
setup: stop using `the_repository` in `initialize_repository_version()`
setup: stop using `the_repository` in `check_repository_format()`
setup: stop using `the_repository` in `upgrade_repository_format()`
setup: stop using `the_repository` in `setup_git_directory()`
setup: stop using `the_repository` in `setup_git_directory_gently()`
setup: stop using `the_repository` in `setup_git_env()`
setup: stop using `the_repository` in `set_git_work_tree()`
setup: stop using `the_repository` in `setup_work_tree()`
setup: stop using `the_repository` in `enter_repo()`
setup: stop using `the_repository` in `verify_non_filename()`
setup: stop using `the_repository` in `verify_filename()`
setup: stop using `the_repository` in `path_inside_repo()`
setup: stop using `the_repository` in `prefix_path()`
setup: stop using `the_repository` in `is_inside_work_tree()`
setup: stop using `the_repository` in `is_inside_git_dir()`
setup: replace use of `the_repository` in static functions
The repacking code has been refactored and compaction of MIDX layers
have been implemented, and incremental strategy that does not require
all-into-one repacking has been introduced.
* tb/incremental-midx-part-3.3:
repack: allow `--write-midx=incremental` without `--geometric`
repack: introduce `--write-midx=incremental`
repack: implement incremental MIDX repacking
packfile: ensure `close_pack_revindex()` frees in-memory revindex
builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK`
repack-geometry: prepare for incremental MIDX repacking
repack-midx: extract `repack_fill_midx_stdin_packs()`
repack-midx: factor out `repack_prepare_midx_command()`
midx: expose `midx_layer_contains_pack()`
repack: track the ODB source via existing_packs
midx: support custom `--base` for incremental MIDX writes
midx: introduce `--no-write-chain-file` for incremental MIDX writes
midx: use `strvec` for `keep_hashes`
midx: build `keep_hashes` array in order
midx: use `strset` for retained MIDX files
midx-write: handle noop writes when converting incremental chains
The "name" argument in git_connect() and related functions has been
converted to a "service" enum to improve type safety and clarify its
purpose.
* jk/connect-service-enum:
connect: use "service" enum for "name" argument
The negotiation tip options in "git fetch" have been reworked to
allow requiring certain refs to be sent as "have" lines, and to
restrict negotiation to a specific set of refs.
* ds/fetch-negotiation-options:
send-pack: pass negotiation config in push
remote: add remote.*.negotiationInclude config
fetch: add --negotiation-include option for negotiation
negotiator: add have_sent() interface
remote: add remote.*.negotiationRestrict config
transport: rename negotiation_tips
fetch: add --negotiation-restrict option
t5516: fix test order flakiness
Add a new 'remote.<name>.negotiationInclude' multi-valued config option that
provides default values for --negotiation-include when no
--negotiation-include arguments are specified over the command line. This
is a mirror of how 'remote.<name>.negotiationRestrict' specifies defaults
for the --negotiation-restrict arguments.
Each value is either an exact ref name or a glob pattern whose tips should
always be sent as 'have' lines during negotiation. The config values are
resolved through the same resolve_negotiation_include() codepath as the CLI
options.
This option is additive with the normal negotiation process: the negotiation
algorithm still runs and advertises its own selected commits, but the refs
matching the config are sent unconditionally on top of those heuristically
selected commits.
Similar to the negotiationRestrict config, an empty value resets the value
list to allow ignoring earlier config values, such as those that might be
set in system or global config.
Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new --negotiation-include option to 'git fetch', which ensures
that certain ref tips are always sent as 'have' lines during fetch
negotiation, regardless of what the negotiation algorithm selects.
This is useful when the repository has a large number of references, so
the normal negotiation algorithm truncates the list. This is especially
important in repositories with long parallel commit histories. For
example, a repo could have a 'dev' branch for development and a
'release' branch for released versions. If the 'dev' branch isn't
selected for negotiation, then it's not a big deal because there are
many in-progress development branches with a shared history. However, if
'release' is not selected for negotiation, then the server may think
that this is the first time the client has asked for that reference,
causing a full download of its parallel commit history (and any extra
data that may be unique to that branch). This is based on a real example
where certain fetches would grow to 60+ GB when a release branch
updated.
This option is a complement to --negotiation-restrict, which reduces the
negotiation ref set to a specific list. In the earlier example, using
--negotiation-restrict to focus the negotiation to 'dev' and 'release'
would avoid those problematic downloads, but would still not allow
advertising potentially-relevant user branches. In this way, the
'include' version solves the problem I mention while allowing
negotiation to pick other references opportunistically. The two options
can also be combined to allow the best of both worlds.
The argument may be an exact ref name or a glob pattern. Non-existent
refs are silently ignored. This behavior is also updated in the ref matching
logic for the related --negotiation-restrict option to match.
The implementation outputs the requested objects as haves before the
negotiator performs its own algorithm to choose the next haves. Use the new
have_sent() interface to signal these have commits were sent before engaging
with the negotiator's next() iterator.
Also add --negotiation-include to 'git pull' passthrough options.
Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a previous change, the --negotiation-restrict command-line option of 'git
fetch' was added as a synonym of --negotiation-tip. Both of these options
restrict the set of 'haves' the client can send as part of negotiation.
This was previously not available via a configuration option. Add a new
'remote.<name>.negotiationRestrict' multi-valued config option that updates
'git fetch <name>' to use these restrictions by default.
If the user provides even one --negotiation-restrict argument, then the
config is ignored.
An empty value resets the value list to allow ignoring earlier config
values, such as those that might be set in system or global config.
Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change added the --negotiation-restrict synonym for the
--negotiation-tip option for 'git fetch'. In anticipation of adding a new
option that behaves similarly but with distinct changes to its behavior,
rename the internal representation of this data from 'negotiation_tips' to
'negotiation_restrict_tips'.
The 'tips' part is kept because this is an oid_array in the transport layer.
This requires the builtin to handle parsing refs into collections of oids so
the transport layer can handle this cleaner form of the data.
Also update the string_list used to store the inputs from command-line
options.
Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --negotiation-tip option to 'git fetch' and 'git pull' allows users
to specify that they want to focus negotiation on a small set of
references. This is a _restriction_ on the negotiation set, helping to
focus the negotiation when the ref count is high. However, it doesn't
allow for the ability to opportunistically select references beyond that
list.
This subtle detail that this is a 'maximum set' and not a 'minimum set'
is not immediately clear from the option name. This makes it more
complicated to add a new option that provides the complementary behavior
of a minimum set.
For now, create a new synonym option, --negotiation-restrict, that
behaves identically to --negotiation-tip. Update the documentation to
make it clear that this new name is the preferred option, but we keep
the old name for compatibility. Mark --negotiation-tip as an alias of the
new, preferred option.
Update a few warning messages with the new option, but also make them
translatable with the option name inserted by formatting. At least one
of these messages will be reused later for a new option.
Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, `--write-midx=incremental` required `--geometric` and would
die() without it. Relax this restriction so that incremental MIDX
repacking can be used independently.
Without `--geometric`, the behavior is append-only: a single new MIDX
layer is created containing whatever packs were written by the repack
and appended to the existing chain (or a new chain is started). Existing
layers are preserved as-is with no compaction or merging.
Implement this via a new repack_make_midx_append_plan() that builds a
plan consisting of a WRITE step for the freshly written packs followed
by COPY steps for every existing MIDX layer. The existing compaction
plan (repack_make_midx_compaction_plan) is used only when `--geometric`
is active.
Update the documentation to describe the behavior with and without
`--geometric`, and replace the test that enforced the old restriction
with one exercising append-only incremental MIDX repacking.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expose the incremental MIDX repacking mode (implemented in an earlier
commit) via a new --write-midx=incremental option for `git repack`.
Add "incremental" as a recognized argument to the --write-midx
OPT_CALLBACK, mapping it to REPACK_WRITE_MIDX_INCREMENTAL. When this
mode is active and --geometric is in use, set the midx_layer_threshold
on the pack geometry so that only packs in sufficiently large tip layers
are considered for repacking.
Two new configuration options control the compaction behavior:
- repack.midxSplitFactor (default: 2): the factor used in the
geometric merging condition for MIDX layers.
- repack.midxNewLayerThreshold (default: 8): the minimum number of
packs in the tip MIDX layer before its packs are considered as
candidates for geometric repacking.
Add tests exercising the new mode across a variety of scenarios
including basic geometric violations, multi-round chain integrity,
branching and merging histories, cross-layer object uniqueness, and
threshold-based compaction.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement the `write_midx_incremental()` function, which builds and
maintains an incremental MIDX chain as part of the geometric repacking
process.
Unlike the default mode which writes a single flat MIDX, the incremental
mode constructs a compaction plan that determines which MIDX layers to
write, compact, or copy, and then executes each step using `git
multi-pack-index` subcommands with the --no-write-chain-file flag.
The repacking strategy works as follows:
* Acquire the lock guarding the multi-pack-index-chain.
* A new MIDX layer is always written containing the newly created
pack(s). If the tip MIDX layer was rewritten during geometric
repacking, any surviving packs from that layer are also included.
* Starting from the new layer, adjacent MIDX layers are merged together
as long as the accumulated object count exceeds half the object count
of the next deeper layer (controlled by 'repack.midxSplitFactor').
* Remaining layers in the chain are evaluated pairwise and either
compacted or copied as-is, following the same merging condition.
* Write the contents of the new multi-pack-index chain, atomically move
it into place, and then release the lock.
* Delete any now-unused MIDX layers.
After writing the new layer, the strategy is evaluated among the
existing MIDX layers in order from oldest to newest. Each step that
writes a new MIDX layer uses "--no-write-chain-file" to avoid updating
the multi-pack-index-chain file. After all steps are complete, the new
chain file is written and then atomically moved into place.
At present, this functionality is exposed behind a new enum value,
`REPACK_WRITE_MIDX_INCREMENTAL`, but has no external callers. A
subsequent commit will expose this mode via `git repack
--write-midx=incremental`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the --write-midx (-m) flag from an OPT_BOOL to an OPT_CALLBACK
that accepts an optional mode argument. Introduce an enum with
REPACK_WRITE_MIDX_NONE and REPACK_WRITE_MIDX_DEFAULT to distinguish
between the two states, and update all existing boolean checks
accordingly.
For now, passing no argument (or just `-m`) selects the default mode,
preserving existing behavior. A subsequent commit will add a new mode
for writing incremental MIDXs.
Extract repack_write_midx() as a dispatcher that selects the
appropriate MIDX-writing implementation based on the mode.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Store the ODB source in the `existing_packs` struct and use that in
place of the raw `repo->objects->sources` access within `cmd_repack()`.
The source used is still assigned from the first source in the list, so
there are no functional changes in this commit. The changes instead
serve two purposes (one immediate, one not):
- The incremental MIDX-based repacking machinery will need to know what
source is being used to read the existing MIDX/chain (should one
exist).
- In the future, if "git repack" is taught how to operate on other
object sources, this field will serve as the authoritative value for
that source.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both `compact` and `write --incremental` fix the base of the resulting
MIDX layer: `compact` always places the compacted result on top of
"from's" immediate parent in the chain, and `write --incremental` always
appends a new layer to the existing tip. In both cases the base is not
configurable.
Future callers need additional flexibility. For instance, the incremental
MIDX-based repacking code may wish to write a layer based on some
intermediate ancestor rather than the current tip, or produce a root
layer when replacing the bottommost entries in the chain.
Introduce a new `--base` option for both subcommands to specify the
checksum of the MIDX layer to use as the base. The given checksum must
refer to a valid layer in the MIDX chain that is an ancestor of the
topmost layer being written or compacted.
The special value "none" is accepted to produce a root layer with no
parent. This will be needed when the incremental repacking machinery
determines that the bottommost layers of the chain should be replaced.
If no `--base` is given, behavior is unchanged: `compact` uses "from's"
immediate parent in the chain, and `write` appends to the existing tip.
For the `write` subcommand, `--base` requires `--no-write-chain-file`. A plain
`write --incremental` appends a new layer to the live chain tip with no
mechanism to atomically replace it; overriding the base would produce a
layer that does not extend the tip, breaking chain invariants. With
`--no-write-chain-file` the chain is left unmodified and the caller is
responsible for assembling a valid chain.
For `compact`, no such restriction applies. The compaction operation
atomically replaces the compacted range in the chain file, so writing
the result on top of any valid ancestor preserves chain invariants.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing an incremental MIDX layer, the MIDX machinery writes the
new layer into the multi-pack-index.d directory and then updates the
multi-pack-index-chain file to include the freshly written layer.
Future callers however may not wish to immediately update the MIDX chain
itself, preferring instead to write out new layer(s) themselves before
atomically updating the chain. Concretely, the new incremental
MIDX-based repacking strategy will want to do exactly this (that is,
assemble the new MIDX chain itself before writing a new chain file and
atomically linking it into place).
Introduce a `--no-write-chain-file` flag that:
* writes the new MIDX layer into the multi-pack-index.d directory
* prints its checksum
* does not update the multi-pack-index-chain file.
The MIDX chain file (and thus, the lock protecting it) remain untouched,
allowing callers to assemble the chain themselves. This flag requires
`--incremental`, since the notion of a separate layer only makes sense
for incremental MIDXs.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic to determine that branches in an octopus merge are
independent has been optimized.
* kk/merge-octopus-optim:
merge: use repo_in_merge_bases for octopus up-to-date check
In a lazy clone, "git cherry" and "git grep" often fetch necessary
blob objects one by one from promisor remotes. It has been corrected
to collect necessary object names and fetch them in bulk to gain
reasonable performance.
* en/batch-prefetch:
grep: prefetch necessary blobs
builtin/log: prefetch necessary blobs for `git cherry`
patch-ids.h: add missing trailing parenthesis in documentation comment
promisor-remote: document caller filtering contract
Stop using `the_repository` in `init_db()` and instead accept
the repository as a parameter. The injection of `the_repository` is thus
bumped one level higher, where callers now pass it in explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in `create_reference_database()` and instead
accept the repository as a parameter. The injection of `the_repository`
is thus bumped one level higher, where callers now pass it in
explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in `initialize_repository_version()` and
instead accept the repository as a parameter. The injection of
`the_repository` is thus bumped one level higher, where callers now pass
it in explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in `setup_git_directory()` and instead
accept the repository as a parameter. The injection of `the_repository`
is thus bumped one level higher, where callers now pass it in
explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in `setup_git_directory_gently()` and
instead accept the repository as a parameter. The injection of
`the_repository` is thus bumped one level higher, where callers now pass
it in explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in `set_git_work_tree()` and instead accept
the repository as a parameter. The injection of `the_repository` is thus
bumped one level higher, where callers now pass it in explicitly.
Similar as with the preceding commit, we track whether the worktree has
been initialized already via a global variable so that we can die in
case the repository is re-initialized with a different worktree path.
Store this info in the `struct repository` instead so that we correctly
handle this per repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in `setup_work_tree()` and instead accept
the repository as a parameter. The injection of `the_repository` is thus
bumped one level higher, where callers now pass it in explicitly.
Note that the function tracks two bits of information via global
variables. This of course doesn't make much sense anymore now that we
can set up worktrees for arbitrary repositories:
- We track whether the worktree has already been initialized and, if
so, we skip the call to `chdir_notify()` and setenv(3p). It does not
make much sense to store this info in the repository, as we _would_
want to update the environment when switching between worktrees back
and forth.
So instead of storing this info in the repository, we drop this
state entirely and live with the fact that we may execute the logic
twice. It should ultimately be idempotent though and thus not be
much of a problem.
- We track whether the worktree configuration is bogus. If so, and if
later on some caller tries to setup the worktree, then we'll die
instead. This is indeed information that we can move into the
repository itself.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in `enter_repo()` and instead accept the
repository as a parameter. The injection of `the_repository` is thus
bumped one level higher, where callers now pass it in explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in `verify_non_filename()` and instead
accept the repository as a parameter. The injection of `the_repository`
is thus bumped one level higher, where callers now pass it in
explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in `verify_filename()` and instead accept
the repository as a parameter. The injection of `the_repository` is thus
bumped one level higher, where callers now pass it in explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in `path_inside_repo()` and instead accept
the repository as a parameter. The injection of `the_repository` is thus
bumped one level higher, where callers now pass it in explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in `prefix_path()` and instead accept the
repository as a parameter. The injection of `the_repository` is thus
bumped one level higher, where callers now pass it in explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar as with the preceding commit, `is_inside_work_tree()` determines
whether the current working directory is located inside the worktree of
`the_repository`. Perform the same refactoring by dropping the caching
mechanism and injecting the repository that shall be checked.
Note that, same as in the preceding commit, we're also resolving the
worktree path via `realpath()`. In theory this step is not necessary as
we always set the worktree path via `repo_set_worktree()`, and that
function already resolves the path for us. But resolving the path a
second time is unlikely to matter performance-wise, and it feels fragile
to rely on the repository's worktree path being absolute. We thus
perform the same extra step even though it's ultimately not required.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `is_inside_git_dir()` verifies whether or not the current
working directory is located inside the gitdir of `the_repository`. This
is done by taking the gitdir path and verifying that it's a prefix of
the current working directory.
This information is cached so that we don't have to re-do this change
multiple times. Furthermore, we proactively set the value in multiple
locations so that we don't even have to perform the check when we have
discovered the repository.
While we could simply move the caching variable into the repository, the
current layout doesn't really feel sensible in the first place:
- It can easily lead to false positives or negatives if at any point
in time we may switch the current working directory.
- We don't call the function in a hot loop, and neither is it overly
expensive to compute.
Drop the caching infrastructure and instead compute the property ad-hoc
via an injected repository.
Note that there is one small gotcha: we often end up with relative
gitdir paths, and if so `is_inside_dir()` might fail. This wasn't an
issue before because of how we proactively set the cached value during
repository discovery. Now that we stop doing that it becomes a problem
though, which we work around by resolving the gitdir via `realpath()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git_connect() function takes a "name" argument which is a bit
confusing. It is _not_ the program to run on the remote repo, which is
specified by the "prog" argument. It should instead be one of a few
well-known strings specifying the type of operation (e.g.,
"git-upload-pack"). But to add to the confusion, unless otherwise
configured, those well-known strings will also be the same as the
programs we run, making it easy to mistake which variable is which.
This confusion comes from eaa0fd6584 (git_connect(): fix corner cases in
downgrading v2 to v0, 2023-03-17), though in its defense, the term
"name" and the use of a string are found in other connect code, going
all the way back to b236752a87 (Support remote archive from all smart
transports, 2009-12-09).
But let's see if we can clean things up a bit. The term "name" is overly
vague. We use "service" in other places, including in the smart-http
protocol, so let's use it here, too.
Using a string invites the notion that it can be anything, not one of a
defined set. Let's instead introduce an enum, which has the added bonus
that the compiler can catch typos for us, rather than quietly choosing
the wrong service from an unexpected strcmp() result.
We do still have to turn our enum into those well-known strings to pass
along in the remote-helper protocol (e.g., for a stateless-connect
directive). But now we do so explicitly and in a way that I think is
much more obvious to follow.
This is a pure cleanup; there should be no behavior change.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git checkout -m another-branch" was invented to deal with local
changes to paths that are different between the current and the new
branch, but it gave only one chance to resolve conflicts. The command
was taught to create a stash to save the local changes.
* hn/git-checkout-m-with-stash:
checkout -m: autostash when switching branches
checkout: rollback lock on early returns in merge_working_tree
sequencer: teach autostash apply to take optional conflict marker labels
sequencer: allow create_autostash to run silently
stash: add --label-ours, --label-theirs, --label-base for apply
The 'git backfill' command now rejects revision-limiting options that
are incompatible with its operation, uses standard documentation for
revision ranges, and includes blobs from boundary commits by default
to improve performance of subsequent operations.
* en/backfill-fixes-and-edges:
backfill: default to grabbing edge blobs too
backfill: document acceptance of revision-range in more standard manner
backfill: reject rev-list arguments that do not make sense
The internal URL parsing logic has been made accessible via a new
subcommand "git url-parse".
* mm/git-url-parse:
t9904: add tests for the new url-parse builtin
doc: describe the url-parse builtin
builtin: create url-parse command
urlmatch: define url_parse function
url: return URL_SCHEME_UNKNOWN instead of dying
url: move scheme detection to URL header/source
url: move url_is_local_not_ssh to url.h
connect: rename enum protocol to url_scheme
Refactor service routines in the ref subsystem backends.
* kn/refs-generic-helpers:
refs: use peeled tag values in reference backends
refs: add peeled object ID to the `ref_update` struct
refs: move object parsing to the generic layer
update-ref: handle rejections while adding updates
update-ref: move `print_rejected_refs()` up
refs: return `ref_transaction_error` from `ref_transaction_update()`
refs: extract out reflog config to generic layer
refs: introduce `ref_store_init_options`
refs: remove unused typedef 'ref_transaction_commit_fn'
The `read()` callback used by `struct odb_write_stream` currently
returns a pointer to an internal buffer along with the number of bytes
read. This makes buffer ownership unclear and provides no way to report
errors.
Update the interface to instead require the caller to provide a buffer,
and have the callback return the number of bytes written to it or a
negative value on error. While at it, also move the `struct
odb_write_stream` definition to "odb/streaming.h". Call sites are
updated accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current ODB transaction interface is colocated with other ODB
interfaces in "odb.{c,h}". Subsequent commits will expand `struct
odb_transaction` to support write operations on the transaction
directly. To keep things organized and prevent "odb.{c,h}" from becoming
more unwieldy, split out `struct odb_transaction` into a separate
header.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In partial clones, `git grep` fetches necessary blobs on-demand one
at a time, which can be very slow. In partial clones, add an extra
preliminary walk over the tree similar to grep_tree() which collects
the blobs of interest, and then prefetches them.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In partial clones, `git cherry` fetches necessary blobs on-demand one
at a time, which can be very slow. We would like to prefetch all
necessary blobs upfront. To do so, we need to be able to first figure
out which blobs are needed.
`git cherry` does its work in a two-phase approach: first computing
header-only IDs (based on file paths and modes), then falling back to
full content-based IDs only when header-only IDs collide -- or, more
accurately, whenever the oidhash() of the header-only object_ids
collide.
patch-ids.c handles this by creating an ids->patches hashmap that has
all the data we need, but the problem is that any attempt to query the
hashmap will invoke the patch_id_neq() function on any colliding objects,
which causes the on-demand fetching.
Insert a new prefetch_cherry_blobs() function before checking for
collisions. Use a temporary replacement on the ids->patches.cmpfn
in order to enumerate the blobs that would be needed without yet
fetching them, and then fetch them all at once, then restore the old
ids->patches.cmpfn.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use a larger buffer size in the code paths to ingest pack stream.
* sb/unpack-index-pack-buffer-resize:
index-pack, unpack-objects: increase input buffer from 4 KiB to 128 KiB
"git history" learned "fixup" command.
* ps/history-fixup:
builtin/history: introduce "fixup" subcommand
builtin/history: generalize function to commit trees
replay: allow callers to control what happens with empty commits
Update code paths that assumed "unsigned long" was long enough for
"size_t".
* js/objects-larger-than-4gb-on-windows:
ci: run expensive tests on push builds to integration branches
t5608: mark >4GB tests as EXPENSIVE
test-tool synthesize: add precomputed SHA-256 pack for 4 GiB + 1
test-tool synthesize: precompute pack for 4 GiB + 1
test-tool synthesize: use the unsafe hash for speed
t5608: add regression test for >4GB object clone
test-tool: add a helper to synthesize large packfiles
delta, packfile: use size_t for delta header sizes
odb, packfile: use size_t for streaming object sizes
git-zlib: handle data streams larger than 4GB
index-pack, unpack-objects: use size_t for object size
The octopus merge path checks whether each remote head is already
an ancestor of HEAD by computing all merge-bases via
repo_get_merge_bases() and comparing the first result's OID to
the remote head. This is more expensive than necessary:
repo_get_merge_bases() calls paint_down_to_common() with
min_generation=0, performs the full STALE drain, and may run
remove_redundant(), when all we need is a yes/no reachability
answer.
Replace this with repo_in_merge_bases(), which answers the
is-ancestor question directly. When generation numbers are
available, repo_in_merge_bases() uses can_all_from_reach() -- a
DFS bounded by generation number that stops as soon as the target
is found or ruled out, without entering paint_down_to_common() at
all. Without generation numbers, it still benefits from a tighter
min_generation floor.
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new builtin "git format-rev" is introduced for pretty formatting
one revision expression per line or commit object names found in
running text.
* kh/name-rev-custom-format:
format-rev: introduce builtin for on-demand pretty formatting
name-rev: make dedicated --annotate-stdin --name-only test
name-rev: factor code for sharing with a new command
name-rev: run clang-format before factoring code
name-rev: wrap both blocks in braces
To help Windows 10 installations, avoid removing files whose
contents are still mmap()'ed.
* js/maintenance-fix-deadlock-on-win10:
maintenance(geometric): do release the `.idx` files before repacking
mingw: optionally use legacy (non-POSIX) delete semantics
Commits not in the commit-graph get GENERATION_NUMBER_INFINITY and
sort to the top of the priority queue. After those, commits with
finite generation numbers are popped in non-increasing order.
When MERGE_BASE_FIND_ALL is not set the first doubly-painted commit
with a finite generation is therefore a best merge-base: no commit
still in the queue can be a descendant of it. Skip the expensive
STALE drain in this case.
Add MERGE_BASE_FIND_ALL to the merge_base_flags enum. Callers that
need every merge-base (repo_get_merge_bases_many, repo_get_merge_bases,
repo_in_merge_bases_many, remove_redundant_no_gen) pass the flag to
preserve existing behavior. git merge-base (without --all) passes 0,
triggering the early exit.
On a 2.2M-commit merge-heavy monorepo with commit-graph:
HEAD vs ~500: 5,229ms -> 24ms
HEAD vs ~1000: 4,214ms -> 39ms
HEAD vs ~5000: 3,799ms -> 46ms
HEAD vs ~10000: 3,827ms -> 61ms
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>