Update enable_fscache() to take an optional initial size parameter which is
used to initialize the hashmap so that it can avoid having to rehash as
additional entries are added.
Add a separate disable_fscache() macro to make the code clearer and easier
to read.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When I do git fetch, git call file stats under .git/objects for each
refs. This takes time when there are many refs.
By enabling fscache, git takes file stats by directory traversing and that
improved the speed of fetch-pack for repository having large number of
refs.
In my windows workstation, this improves the time of `git fetch` for
chromium repository like below. I took stats 3 times.
* With this patch
TotalSeconds: 9.9825165
TotalSeconds: 9.1862075
TotalSeconds: 10.1956256
Avg: 9.78811653333333
* Without this patch
TotalSeconds: 15.8406702
TotalSeconds: 15.6248053
TotalSeconds: 15.2085938
Avg: 15.5580231
Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The lazy priority queue optimization pattern (deferring actual removal
in prio_queue_get() to allow get+put fusion) has been folded directly
into prio_queue itself, speeding up commit traversal workflows and
simplifying callers.
* kk/prio-queue-get-put-fusion:
prio-queue: fold lazy_queue into prio_queue for automatic get+put fusion
prio-queue: rename .nr to .nr_ and add accessor helpers
Rename the .nr member to .nr_ so that callers outside prio-queue.c
that directly reference .nr get a compilation error. This catches
both existing misuse and future in-flight topics.
Add prio_queue_size() for callers that need to know the element count
and prio_queue_for_each() for callers that need to walk all elements.
Convert all external .nr users:
- Loop conditions: use prio_queue_size(), prio_queue_get(), or
prio_queue_peek() as the loop condition
- Array iterations: use prio_queue_for_each()
Signed-off-by: Kristofer Karlsson <krka@spotify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes, it is beneficial to retrieve information about an object
without downloading it entirely. The server-side logic for this
functionality was implemented in commit "a2ba162cda (object-info:
support for retrieving object info, 2021-04-20)." And the wire
format is documented at
https://git-scm.com/docs/protocol-v2#_object_info.
This commit introduces client functions to interact with the server.
Currently, the client supports requesting a list of object IDs with
the 'size' feature from a v2 server. If the server does not advertise
this feature (i.e., transfer.advertiseobjectinfo is set to false),
the client will return an error and exit.
Notice that the entire request is written into req_buf before being
sent to the remote. This approach follows the pattern used in the
`send_fetch_request()` logic within fetch-pack.c.
Streaming the request is not addressed in this patch.
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are some variables initialized at the start of the
do_fetch_pack_v2() state machine. Currently, they are initialized
in FETCH_CHECK_LOCAL, which is the initial state set at the beginning
of the function.
However, a subsequent patch will allow for another initial state,
while still requiring these initialized variables.
Move the initialization to be before the state machine,
so that they are set regardless of the initial state.
Note that there is no change in behavior, because we're moving code
from the beginning of the first state to just before the execution of
the state machine.
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor `write_fetch_command_and_capabilities()`, enabling it to serve
both fetch and additional commands.
In this context, "command" refers to the "operations" supported by
Git's wire protocol https://git-scm.com/docs/protocol-v2, such as a Git
subcommand (e.g., git-fetch(1)) or a server-side operation like
"object-info" as implemented in commit a2ba162
(object-info: support for retrieving object info, 2021-04-20).
Refactor the function signature to accept a command instead of the
hardcoded "fetch".
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
write_fetch_command_and_capabilities will be refactored in a subsequent
commit where it will become a more general-purpose function, making it
more accessible to additional commands in the future.
To move `write_fetch_command_and_capabilities()` to `connect.c`, we need
to adjust how `advertise_sid` is managed. Previously in `fetch_pack.c`,
`advertise_sid` was a static variable, modified using
`repo_config_get_bool()`.
In `connect.c`, we now initialize `advertise_sid` at the begining by
directly using `repo_config_get_bool()`. This change is safe because:
In the original `fetch-pack.c` code, there are only two places that write
`advertise_sid`:
1. In function `do_fetch_pack()`:
if (!sever_supports("session_id"))
advertise_sid = 0;
2. In function `fetch_pack_config()`:
repo_config_get_bool("transfer.advertisesid", &advertise_sid);
About 1, since `do_fetch_pack()` is only relevant for protocol v1, this
assignment can be ignored, as `write_fetch_command_and_capabilities()`
is only used in v2.
About 2, `repo_config_get_bool()` is from `config.h` and it's an out-of-box
dependency of `connect.c`, so we can reuse it directly.
Move `write_fetch_command_and_capabilities()` to `connect.c`
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some code used in this series declares variable i and only uses it
in a for loop, not in any other logic outside the loop.
Change the declaration of i to be inside the for loop for readability.
While at it, we also change its type from "int" to "size_t" where the latter makes more sense.
Helped-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Eric Ju <eric.peijian@gmail.com>
Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The negotiation tip options in "git fetch" have been reworked to
allow requiring certain refs to be sent as "have" lines, and to
restrict negotiation to a specific set of refs.
* ds/fetch-negotiation-options:
send-pack: pass negotiation config in push
remote: add remote.*.negotiationInclude config
fetch: add --negotiation-include option for negotiation
negotiator: add have_sent() interface
remote: add remote.*.negotiationRestrict config
transport: rename negotiation_tips
fetch: add --negotiation-restrict option
t5516: fix test order flakiness
Add a new --negotiation-include option to 'git fetch', which ensures
that certain ref tips are always sent as 'have' lines during fetch
negotiation, regardless of what the negotiation algorithm selects.
This is useful when the repository has a large number of references, so
the normal negotiation algorithm truncates the list. This is especially
important in repositories with long parallel commit histories. For
example, a repo could have a 'dev' branch for development and a
'release' branch for released versions. If the 'dev' branch isn't
selected for negotiation, then it's not a big deal because there are
many in-progress development branches with a shared history. However, if
'release' is not selected for negotiation, then the server may think
that this is the first time the client has asked for that reference,
causing a full download of its parallel commit history (and any extra
data that may be unique to that branch). This is based on a real example
where certain fetches would grow to 60+ GB when a release branch
updated.
This option is a complement to --negotiation-restrict, which reduces the
negotiation ref set to a specific list. In the earlier example, using
--negotiation-restrict to focus the negotiation to 'dev' and 'release'
would avoid those problematic downloads, but would still not allow
advertising potentially-relevant user branches. In this way, the
'include' version solves the problem I mention while allowing
negotiation to pick other references opportunistically. The two options
can also be combined to allow the best of both worlds.
The argument may be an exact ref name or a glob pattern. Non-existent
refs are silently ignored. This behavior is also updated in the ref matching
logic for the related --negotiation-restrict option to match.
The implementation outputs the requested objects as haves before the
negotiator performs its own algorithm to choose the next haves. Use the new
have_sent() interface to signal these have commits were sent before engaging
with the negotiator's next() iterator.
Also add --negotiation-include to 'git pull' passthrough options.
Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change added the --negotiation-restrict synonym for the
--negotiation-tip option for 'git fetch'. In anticipation of adding a new
option that behaves similarly but with distinct changes to its behavior,
rename the internal representation of this data from 'negotiation_tips' to
'negotiation_restrict_tips'.
The 'tips' part is kept because this is an oid_array in the transport layer.
This requires the builtin to handle parsing refs into collections of oids so
the transport layer can handle this cleaner form of the data.
Also update the string_list used to store the inputs from command-line
options.
Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various code clean-up around odb subsystem.
* ps/odb-cleanup:
odb: drop unneeded headers and forward decls
odb: rename `odb_has_object()` flags
odb: use enum for `odb_write_object` flags
odb: rename `odb_write_object()` flags
treewide: use enum for `odb_for_each_object()` flags
CodingGuidelines: document our style for flags
Internals of "git fsck" have been refactored to not depend on the
global `the_repository` variable.
* ps/fsck-wo-the-repository:
builtin/fsck: stop using `the_repository` in error reporting
builtin/fsck: stop using `the_repository` when marking objects
builtin/fsck: stop using `the_repository` when checking packed objects
builtin/fsck: stop using `the_repository` with loose objects
builtin/fsck: stop using `the_repository` when checking reflogs
builtin/fsck: stop using `the_repository` when checking refs
builtin/fsck: stop using `the_repository` when snapshotting refs
builtin/fsck: fix trivial dependence on `the_repository`
fsck: drop USE_THE_REPOSITORY
fsck: store repository in fsck options
fsck: initialize fsck options via a function
fetch-pack: move fsck options into function scope
Rename `odb_has_object()` flags to be properly prefixed with the
function name.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add and apply a semantic patch that simplifies the code by letting
strvec_pushv() append the items of a second strvec instead of pushing
them one by one.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The fsck subsystem relies on `the_repository` quite a bit. While we
could of course explicitly pass a repository down the callchain, we
already have a `struct fsck_options` that we pass to almost all
functions.
Extend the options to also store the repository to make it readily
available.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We initialize the `struct fsck_options` via a set of macros, often in
global scope. In the next commit though we're about to introduce a new
repository field to the options that must be initialized, and naturally
we don't have a repo other than `the_repository` available in this
scope.
Refactor the code to instead intrdouce a new `fsck_options_init()`
function that initializes the options for us and move initialization
into function scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching a packfile, we optionally verify received objects via the
fsck subsystem. The options for those consistency checks are declared in
global scope without a good reason, and they are never cleaned up. So in
case the options are reused, they may accumulate more state over time.
Furthermore, in subsequent changes we'll introduce a repository pointer
into the structure. Obviously though, we don't have a repository
available at static time, except for `the_repository`, which we don't
want to use here.
Refactor the code to move the options into the respective functions and
properly manage their lifecycle.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace calls to `refs_for_each_rawref()` with the newly introduced
`refs_for_each_ref_ext()` function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previous commits have set up an infrastructure for `--filter=auto` to
automatically prepare a partial clone filter based on what the server
advertised and the client accepted.
Using that infrastructure, let's now enable the `--filter=auto` option
in `git clone` and `git fetch` by setting `allow_auto_filter` to 1.
Note that these small changes mean that when `git clone --filter=auto`
or `git fetch --filter=auto` are used, "auto" is automatically saved
as the partial clone filter for the server on the client. Therefore
subsequent calls to `git fetch` on the client will automatically use
this "auto" mode even without `--filter=auto`.
Let's also set `allow_auto_filter` to 1 in `transport.c`, as the
transport layer must be able to accept the "auto" filter spec even if
the invoking command hasn't fully parsed it yet.
When an "auto" filter is requested, let's have the "fetch-pack.c" code
in `do_fetch_pack_v2()` compute a filter and send it to the server.
In `do_fetch_pack_v2()` the logic also needs to check for the
"promisor-remote" capability and call `promisor_remote_reply()` to
parse advertised remotes and populate the list of those accepted (and
their filters).
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git config get --path" segfaulted on an ":(optional)path" that
does not exist, which has been corrected.
* jc/optional-path:
config: really treat missing optional path as not configured
config: really pretend missing :(optional) value is not there
config: mark otherwise unused function as file-scope static
These callers expect that git_config_pathname() that returns 0 is a
signal that the variable they passed has a string they need to act
on. But with the introduction of ":(optional)path" earlier, that is
no longer the case. If the path specified by the configuration
variable is missing, their variable will get a NULL in it, and they
need to act on it (often, just refraining from copying it elsewhere).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `each_ref_fn` callback function type is used across our code base
for several different functions that iterate through reference. There's
a bunch of callbacks implementing this type, which makes any changes to
the callback signature extremely noisy. An example of the required churn
is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a
single argument required us to change 48 files.
It was already proposed back then [1] that we might want to introduce a
wrapper structure to alleviate the pain going forward. While this of
course requires the same kind of global refactoring as just introducing
a new parameter, it at least allows us to more change the callback type
afterwards by just extending the wrapper structure.
One counterargument to this refactoring is that it makes the structure
more opaque. While it is obvious which callsites need to be fixed up
when we change the function type, it's not obvious anymore once we use
a structure. That being said, we only have a handful of sites that
actually need to populate this wrapper structure: our ref backends,
"refs/iterator.c" as well as very few sites that invoke the iterator
callback functions directly.
Introduce this wrapper structure so that we can adapt the iterator
interfaces more readily.
[1]: <ZmarVcF5JjsZx0dl@tanuki>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `reprepare_packed_git()` we perform a couple of operations:
- We reload alternate object directories.
- We clear the loose object cache.
- We reprepare packfiles.
While the logic is hosted in "packfile.c", it clearly reaches into other
subsystems that aren't related to packfiles.
Split up the responsibility and introduce `odb_reprepare()` which now
becomes responsible for repreparing the whole object database. The
existing `reprepare_packed_git()` function is refactored accordingly and
only cares about reloading the packfile store now.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Under a race against another process that is repacking the
repository, especially a partially cloned one, "git fetch" may
mistakenly think some objects we do have are missing, which has
been corrected.
* jk/fetch-check-graph-objects-fix:
fetch-pack: re-scan when double-checking graph objects
The fetch code tries to avoid asking the remote side for an object we
already have. It does this by traversing recent commits reachable from
our refs looking for matches. Commit 5d4cc78f72 (fetch-pack: die if in
commit graph but not obj db, 2024-11-05) introduced an extra check
there: if we think we have an object because it's in the commit graph,
we double-check that we actually have it in our object database with a
call to odb_has_object().
But that call does not pass any flags, and so the function won't call
reprepared_packed_git() if it does not find the object. That opens us up
to the usual race against some other process repacking the odb:
1. We scan the list of packs in objects/pack but haven't yet opened them.
2. Somebody else packs the object into a new pack (which we don't know
about), and deletes the old pack it was in.
3. Our odb_has_object() calls tries to open that old pack, but finds it
is gone. We declare that we don't have the object.
And this causes us to erroneously complain and abort the fetch, thinking
our commit-graph and object database are out of sync. Instead, we should
pass HAS_OBJECT_RECHECK_PACKED, which will add a new step:
4. We re-scan the pack directory again, find the new pack, and locate
the object.
Often the fetch code tries to avoid these kinds of re-scans if it's
likely that we won't have the object. If the other side has told us
about object X and we want to know if we have it, we'll skip the re-scan
(to avoid spending a lot of effort when there are many such objects). We
can accept the racy false negative in that case because the worst case
is that we ask the other side to send us the object.
But this is not one of those cases. These are objects which are
accessible from _our_ refs, and which we already found in the commit
graph file. We should have them, and if we don't, we'll die()
immediately. So the performance impact is negligible, and getting the
right answer is important.
There's no test here because it's inherently racy. In fact, I had
trouble even developing a minimal test. The problem seen in the wild can
be produced like this:
# Any git.git mirror which supports partial clones; I think this
# should work with any repo that contains submodules, but note that
# $obj below is specific to this repo
url=https://github.com/git/git.git
# This is a commit that is not at the tip of any branches (so after
# we have it, we'll still have some commits to fetch).
obj=cf6f63ea6bf35173e02e18bdc6a4ba41288acff9
git init
git fetch --filter=tree:0 $url $obj:refs/heads/foo
git checkout foo
git commit-graph write --reachable
git fetch $url
What happens here is that the initial fetch grabs that older commit (and
its ancestors) but no trees or blobs, and the subsequent checkout grabs
the necessary trees and blobs just for that commit. The final fetch
spawns a long sequence of child fetches due to fetch_submodules(), which
wants to check whether there have been any gitlink modifications which
should trigger a fetch of the related submodule (we'll leave aside the
irony that we did not even check out any submodules yet).
That series of fetches causes us to accumulate packs, which eventually
triggers background maintenance to run. That repacks all-into-one, and
the pack containing $obj goes away in favor of a new pack. And then the
fetch eventually fails with:
fatal: You are attempting to fetch cf6f63ea6b, which is in the commit graph file but
not in the object database.
In the scenario above, the race becomes likely because of the long
series of quick fetches. But I _think_ the bug is independent of partial
clones entirely, and you could run into the same thing with a single
fetch, some other process running "git repack" simultaneously, and a bit
of bad luck. I haven't been able to reproduce, though. I'm not sure if
that's because there's some mis-analysis above, or if the race window is
just small enough that it's hard to trigger.
At any rate, re-scanning here seems like an obviously correct thing to
do with no downside, and it does fix the partial-clone case shown above.
Reported-by: Дилян Палаузов <dilyan.palauzov@aegee.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
string_list_split*() family of functions have been extended to
simplify common use cases.
* jc/string-list-split:
string-list: split-then-remove-empty can be done while splitting
string-list: optionally omit empty string pieces in string_list_split*()
diff: simplify parsing of diff.colormovedws
string-list: optionally trim string pieces split by string_list_split*()
string-list: unify string_list_split* functions
string-list: align string_list_split() with its _in_place() counterpart
string-list: report programming error with BUG
The config API had a set of convenience wrapper functions that
implicitly use the_repository instance; they have been removed and
inlined at the calling sites.
* ps/config-wo-the-repository: (21 commits)
config: fix sign comparison warnings
config: move Git config parsing into "environment.c"
config: remove unused `the_repository` wrappers
config: drop `git_config_set_multivar()` wrapper
config: drop `git_config_get_multivar_gently()` wrapper
config: drop `git_config_set_multivar_in_file_gently()` wrapper
config: drop `git_config_set_in_file_gently()` wrapper
config: drop `git_config_set()` wrapper
config: drop `git_config_set_gently()` wrapper
config: drop `git_config_set_in_file()` wrapper
config: drop `git_config_get_bool()` wrapper
config: drop `git_config_get_ulong()` wrapper
config: drop `git_config_get_int()` wrapper
config: drop `git_config_get_string()` wrapper
config: drop `git_config_get_string()` wrapper
config: drop `git_config_get_string_multi()` wrapper
config: drop `git_config_get_value()` wrapper
config: drop `git_config_get_value()` wrapper
config: drop `git_config_get()` wrapper
config: drop `git_config_clear()` wrapper
...
The string_list_split_in_place() function was updated by 52acddf3
(string-list: multi-delimiter `string_list_split_in_place()`,
2023-04-24) to take more than one delimiter characters, hoping that
we can later use it to replace our uses of strtok(). We however did
not make a matching change to the string_list_split() function,
which is very similar.
Before giving both functions more features in future commits, allow
string_list_split() to also take more than one delimiter characters
to make them closer to each other.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pop_most_recent_commit() function can have quite expensive
worst case performance characteristics, which has been optimized by
using prio-queue data structure.
* rs/pop-recent-commit-with-prio-queue:
commit: use prio_queue_replace() in pop_most_recent_commit()
prio-queue: add prio_queue_replace()
commit: convert pop_most_recent_commit() to prio_queue
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_bool()`. All
callsites are adjusted so that they use
`repo_config_get_bool(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_int()`. All
callsites are adjusted so that they use
`repo_config_get_int(the_repository, ...)` instead. While some callsites
might already have a repository available, this mechanical conversion is
the exact same as the current situation and thus cannot cause any
regression. Those sites should eventually be cleaned up in a later patch
series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_string()`.
All callsites are adjusted so that they use
`repo_config_get_string(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a106 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config()`. All callsites
are adjusted so that they use `repo_config(the_repository, ...)`
instead. While some callsites might already have a repository available,
this mechanical conversion is the exact same as the current situation
and thus cannot cause any regression. Those sites should eventually be
cleaned up in a later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pop_most_recent_commit() calls commit_list_insert_by_date() for parent
commits, which is itself called in a loop. This can lead to quadratic
complexity if there are many merges. Replace the commit_list with a
prio_queue to ensure logarithmic worst case complexity and convert all
three users.
Add a performance test that exercises one of them using a pathological
history that consists of 50% merges and 50% root commits to demonstrate
the speedup:
Test v2.50.1 HEAD
----------------------------------------------------------------------
1501.2: rev-parse ':/65535' 2.48(2.47+0.00) 0.20(0.19+0.00) -91.9%
Alas, sane histories don't benefit from the conversion much, and
traversing Git's own history takes a 1% performance hit on my machine:
$ hyperfine -w3 -L git ./git_2.50.1,./git '{git} rev-parse :/^Initial.revision'
Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision
Time (mean ± σ): 1.071 s ± 0.004 s [User: 1.052 s, System: 0.017 s]
Range (min … max): 1.067 s … 1.078 s 10 runs
Benchmark 2: ./git rev-parse :/^Initial.revision
Time (mean ± σ): 1.079 s ± 0.003 s [User: 1.060 s, System: 0.017 s]
Range (min … max): 1.074 s … 1.083 s 10 runs
Summary
./git_2.50.1 rev-parse :/^Initial.revision ran
1.01 ± 0.00 times faster than ./git rev-parse :/^Initial.revision
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare to flip the default hash function to SHA-256.
* bc/use-sha256-by-default-in-3.0:
Enable SHA-256 by default in breaking changes mode
help: add a build option for default hash
t5300: choose the built-in hash outside of a repo
t4042: choose the built-in hash outside of a repo
t1007: choose the built-in hash outside of a repo
t: default to compile-time default hash if not set
setup: use the default algorithm to initialize repo format
Use legacy hash for legacy formats
builtin: use default hash when outside a repository
hash: add a constant for the legacy hash algorithm
hash: add a constant for the default hash algorithm
* bc/use-sha256-by-default-in-3.0:
Enable SHA-256 by default in breaking changes mode
help: add a build option for default hash
t5300: choose the built-in hash outside of a repo
t4042: choose the built-in hash outside of a repo
t1007: choose the built-in hash outside of a repo
t: default to compile-time default hash if not set
setup: use the default algorithm to initialize repo format
Use legacy hash for legacy formats
builtin: use default hash when outside a repository
hash: add a constant for the legacy hash algorithm
hash: add a constant for the default hash algorithm
We have a large variety of data formats and protocols where no hash
algorithm was defined and the default was assumed to always be SHA-1.
Instead of explicitly stating SHA-1, let's use the constant to represent
the legacy hash algorithm (which is still SHA-1) so that it's clear
for documentary purposes that it's a legacy fallback option and not an
intentional choice to use SHA-1.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `has_object()` to `odb_has_object()` to match other functions
related to the object database and our modern coding guidelines.
Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.
Introduce compatibility wrappers so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of iterator-style functions that execute a callback
for each instance of a given set, all of which currently depend on
`the_repository`. Refactor them to instead take an object database as
parameter so that we can get rid of this dependency.
Rename the functions accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As the comment of `repo_has_object_file()` and its `_with_flags()`
variant tells us, these functions are considered to be deprecated in
favor of `has_object()`. There are a couple of slight benefits in favor
of the replacement:
- The new function has a short-and-sweet name.
- More explicit defaults: `has_object()` doesn't fetch missing objects
via promisor remotes, and neither does it reload packfiles if an
object wasn't found by default. This ensures that it becomes
immediately obvious when a simple object existence check may result
in expensive actions.
Most importantly though, it is confusing that we have two sets of
functions that ultimately do the same thing, but with different
defaults.
Start sunsetting `repo_has_object_file()` and its `_with_flags()`
sibling by replacing all callsites with `has_object()`:
- `repo_has_object_file(...)` is equivalent to
`has_object(..., HAS_OBJECT_RECHECK_PACKED | HAS_OBJECT_FETCH_PROMISOR)`.
- `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT)`
is equivalent to `has_object(..., 0)`.
- `repo_has_object_file_with_flags(..., OBJECT_INFO_SKIP_FETCH_OBJECT)`
is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED)`.
- `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)`
is equivalent to `has_object(..., HAS_OBJECT_FETCH_PROMISOR)`.
The replacements should be functionally equivalent.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.
Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `index_pack_lockfile()` function uses the global `the_repository`
variable to access the repository. To avoid global variable usage, pass
the repository from the layers above.
Altough the layers above could have access to the repository internally,
simply pass in `the_repository`. This avoids any compatibility issues
and bubbles up global variable usage to upper layers which can be
eventually resolved.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In Git, fsck operations can ignore known broken objects via the
`fsck.skipList` configuration. This option expects a path to a file with
the list of object names. When the configuration is specified without a
path, an error message is printed, but the command continues as if the
configuration was not set. Configuring `fsck.skipList` without a value
is a misconfiguration so config parsing should be more strict and reject
it.
Update `git_fsck_config()` to no longer ignore misconfiguration of
`fsck.skipList`. The same behavior is also present for
`fetch.fsck.skipList` and `receive.fsck.skipList` so the configuration
parsers for these are updated to ensure the related operations remain
consistent.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Start working to make the codebase buildable with -Wsign-compare.
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings