git-for-windows/git - git - Gitea: Self-hosted GitHub

mirror of https://github.com/git-for-windows/git.git synced 2026-04-09 23:33:34 -05:00

Author	SHA1	Message	Date
Burak Kaan Karaçay	9df3be8e2e	run-command: wean auto_maintenance() functions off the_repository prepare_auto_maintenance() relies on the_repository to read configurations. Since run_auto_maintenance() calls prepare_auto_maintenance(), it also implicitly depends on the_repository. Add 'struct repository *' as a parameter to both functions and update all callers to pass the_repository. With no global repository dependencies left in this file, remove the USE_THE_REPOSITORY_VARIABLE macro. Suggested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Burak Kaan Karaçay <bkkaracay@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-12 08:30:57 -07:00
Junio C Hamano	f330d46dee	Merge branch 'ar/config-hooks' Allow hook commands to be defined (possibly centrally) in the configuration files, and run multiple of them for the same hook event. * ar/config-hooks: hook: add -z option to "git hook list" hook: allow out-of-repo 'git hook' invocations hook: allow event = "" to overwrite previous values hook: allow disabling config hooks hook: include hooks from the config hook: add "git hook list" command hook: run a list of hooks to prepare for multihook support hook: add internal state alloc/free callbacks	2026-03-10 14:23:18 -07:00
Junio C Hamano	d445aecfb0	Merge branch 'ps/refs-for-each' Code refactoring around refs-for-each-* API functions. * ps/refs-for-each: refs: replace `refs_for_each_fullref_in()` refs: replace `refs_for_each_namespaced_ref()` refs: replace `refs_for_each_glob_ref()` refs: replace `refs_for_each_glob_ref_in()` refs: replace `refs_for_each_rawref_in()` refs: replace `refs_for_each_rawref()` refs: replace `refs_for_each_ref_in()` refs: improve verification for-each-ref options refs: generalize `refs_for_each_fullref_in_prefixes()` refs: generalize `refs_for_each_namespaced_ref()` refs: speed up `refs_for_each_glob_ref_in()` refs: introduce `refs_for_each_ref_ext` refs: rename `each_ref_fn` refs: rename `do_for_each_ref_flags` refs: move `do_for_each_ref_flags` further up refs: move `refs_head_ref_namespaced()` refs: remove unused `refs_for_each_include_root_ref()`	2026-03-09 14:36:55 -07:00
Junio C Hamano	5c56c725f1	Merge branch 'ar/run-command-hook-take-2' Use the hook API to replace ad-hoc invocation of hook scripts via the run_command() API. * ar/run-command-hook-take-2: builtin/receive-pack: avoid spinning no-op sideband async threads receive-pack: convert receive hooks to hook API receive-pack: convert update hooks to new API run-command: poll child input in addition to output hook: add jobs option reference-transaction: use hook API instead of run-command transport: convert pre-push to hook API hook: allow separate std[out\|err] streams hook: convert 'post-rewrite' hook in sequencer.c to hook API hook: provide stdin via callback run-command: add stdin callback for parallelization run-command: add helper for pp child states t1800: add hook output stream tests	2026-03-09 14:36:55 -07:00
Junio C Hamano	ec1c4d974a	Merge branch 'ar/run-command-hook-take-2' into ar/config-hooks * ar/run-command-hook-take-2: builtin/receive-pack: avoid spinning no-op sideband async threads	2026-03-02 16:01:33 -08:00
Adrian Ratiu	005f3fbe07	builtin/receive-pack: avoid spinning no-op sideband async threads Exit early if the hooks do not exist, to avoid spinning up/down sideband async threads which no-op. It is important to call the hook_exists() API provided by hook.[ch] because it covers both config-defined hooks and the "traditional" hooks from the hookdir. find_hook() only covers the hookdir hooks. The regression happened because the no-op async threads add some additional overhead which can be measured with the receive-refs test of the benchmarks suite [1]. Reproduced using: cd benchmarks/receive-refs && \ ./run --revisions /path/to/git \ fc148b146ad41be71a7852c4867f0773cbfe1ff9~,fc148b146ad41be71a7852c4867f0773cbfe1ff9 \ --parameter-list refformat reftable --parameter-list refcount 10000 1: https://gitlab.com/gitlab-org/data-access/git/benchmarks Fixes: `fc148b146a` ("receive-pack: convert update hooks to new API") Reported-by: Patrick Steinhardt <ps@pks.im> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> [jc: avoid duplicated hardcoded hook names] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-03-02 16:00:43 -08:00
Patrick Steinhardt	1dd4f1e43f	refs: replace `refs_for_each_fullref_in()` Replace calls to `refs_for_each_fullref_in()` with the newly introduced `refs_for_each_ref_ext()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-23 13:21:19 -08:00
Adrian Ratiu	ee2fbfd6b2	hook: add internal state alloc/free callbacks Some hooks use opaque structs to keep internal state between callbacks. Because hooks ran sequentially (jobs == 1) with one command per hook, these internal states could be allocated on the stack for each hook run. Next commits add the ability to run multiple commands for each hook, so the states cannot be shared or stored on the stack anymore, especially since down the line we will also enable parallel execution (jobs > 1). Add alloc/free helpers for each hook, doing a "deep" alloc/init & free of their internal opaque struct. The alloc callback takes a context pointer, to initialize the struct at the time of resource acquisition. These callbacks must always be provided together: no alloc without free and no free without alloc, otherwise a BUG() is triggered. Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-19 13:23:40 -08:00
Junio C Hamano	7855effc95	Merge branch 'cf/c23-const-preserving-strchr-updates-0' ISO C23 redefines strchr and friends that traditionally took a const pointer and returned a non-const pointer derived from it to preserve constness (i.e., if you ask for a substring in a const string, you get a const pointer to the substring). Update code paths that used non-const pointer to receive their results that did not have to be non-const to adjust. * cf/c23-const-preserving-strchr-updates-0: gpg-interface: remove an unnecessary NULL initialization global: constify some pointers that are not written to	2026-02-13 13:39:25 -08:00
Junio C Hamano	6176ee2349	Merge branch 'kn/ref-batch-output-error-reporting-fix' A handful of code paths that started using batched ref update API (after Git 2.51 or so) lost detailed error output, which has been corrected. * kn/ref-batch-output-error-reporting-fix: fetch: delay user information post committing of transaction receive-pack: utilize rejected ref error details fetch: utilize rejected ref error details update-ref: utilize rejected error details if available refs: add rejection detail to the callback function refs: skip to next ref when current ref is rejected	2026-02-09 12:09:10 -08:00
Collin Funk	4ac4705afa	global: constify some pointers that are not written to The recent glibc 2.43 release had the following change listed in its NEWS file: For ISO C23, the functions bsearch, memchr, strchr, strpbrk, strrchr, strstr, wcschr, wcspbrk, wcsrchr, wcsstr and wmemchr that return pointers into their input arrays now have definitions as macros that return a pointer to a const-qualified type when the input argument is a pointer to a const-qualified type. When compiling with GCC 15, which defaults to -std=gnu23, this causes many warnings like this: merge-ort.c: In function ‘apply_directory_rename_modifications’: merge-ort.c:2734:36: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers] 2734 \| char *last_slash = strrchr(cur_path, '/'); \| ^~~~~~~ This patch fixes the more obvious ones by making them const when we do not write to the returned pointer. Signed-off-by: Collin Funk <collin.funk1@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-02-05 17:52:49 -08:00
Emily Shaffer	b5e9ad508c	receive-pack: convert receive hooks to hook API This converts the last remaining hooks to the new hook API, for the same benefits as the previous conversions (no need to toggle signals, manage custom struct child_process, call find_hook(), prepares for specifying hooks via configs, etc.). See the previous three commits for a more in-depth explanation of how this all works. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-28 15:47:04 -08:00
Emily Shaffer	fc148b146a	receive-pack: convert update hooks to new API The hook API avoids creating a custom struct child_process and other internal hook plumbing (e.g. calling find_hook()) and prepares for the specification of hooks via configs or running parallel hooks. Execution is still sequential through the run_hooks_opt .jobs == 1, which is the unchanged default for all hooks. When use_sideband==1, the async thread redirects the hook outputs to sideband 2, otherwise it is not used and the hooks write directly to the fds inherited from the main parent process. When .jobs == 1, run-command's poll loop is avoided entirely via the ungroup=1 option like before (this was Jeff's suggestion), achieving the same real-time output performance. When running in parallel, run-command with ungroup=0 will capture and de-interleave the output of each hook, then write to the parent stderr which is redirected via dup2 to the sideband thread, so that each parallel hook output is presented clearly to the client. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-28 15:47:04 -08:00
Karthik Nayak	2ea49f21e3	receive-pack: utilize rejected ref error details In `9d2962a7c4` (receive-pack: use batched reference updates, 2025-05-19), git-receive-pack(1) switched to using batched reference updates. This also introduced a regression wherein instead of providing detailed error messages for failed reference updates, the users were provided generic error messages based on the error type. Now that the updates also contain detailed error messages, propagate those to the client via 'rp_error'. The detailed error messages can be very verbose, e.g. in the files backend, when trying to write a non-commit object to a branch, you would see: ! [remote rejected] 3eaec9ccf3a53f168362a6b3fdeb73426fb9813d -> branch (cannot update ref 'refs/heads/branch': trying to write non-commit object 3eaec9ccf3a53f168362a6b3fdeb73426fb9813d to branch 'refs/heads/branch') Here the refname is repeated multiple times due to how error messages are propagated and filled over the code stack. This potentially can be cleaned up in a future commit. Reported-by: Elijah Newren <newren@gmail.com> Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-25 22:27:34 -08:00
Karthik Nayak	be54b10fd7	refs: add rejection detail to the callback function The previous commit started storing the rejection details alongside the error code for rejected updates. Pass this along to the callback function `ref_transaction_for_each_rejected_update()`. Currently the field is unused, but will be integrated in the upcoming commits. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2026-01-25 22:27:33 -08:00
Junio C Hamano	a3d1f391d3	Revert "Merge branch 'ar/run-command-hook'" This reverts commit `f406b89552`, reversing changes made to `1627809eef`. It seems to have caused a few regressions, two of the three known ones we have proposed solutions for. Let's give ourselves a bit more room to maneuver during the pre-release freeze period and restart once the 2.53 ships.	2026-01-15 13:02:38 -08:00
Junio C Hamano	f406b89552	Merge branch 'ar/run-command-hook' Use hook API to replace ad-hoc invocation of hook scripts with the run_command() API. * ar/run-command-hook: receive-pack: convert receive hooks to hook API receive-pack: convert update hooks to new API hooks: allow callers to capture output run-command: allow capturing of collated output hook: allow overriding the ungroup option reference-transaction: use hook API instead of run-command transport: convert pre-push to hook API hook: convert 'post-rewrite' hook in sequencer.c to hook API hook: provide stdin via callback run-command: add stdin callback for parallelization run-command: add first helper for pp child states	2026-01-06 16:33:53 +09:00
Emily Shaffer	c65f26fca4	receive-pack: convert receive hooks to hook API This converts the last remaining hooks to the new hook API, for the same benefits as the previous conversions (no need to toggle signals, manage custom struct child_process, call find_hook(), prepares for specifying hooks via configs, etc.). I noticed a performance degradation when processing large amounts of hook input with just 1 line per callback, due to run-command's poll loop, therefore I batched 500 lines per callback, to ensure similar pipe throughput as before and to avoid hook child waiting on stdin. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-28 14:02:07 +09:00
Emily Shaffer	0bbaf3653f	receive-pack: convert update hooks to new API Use the new hook sideband API introduced in the previous commit. The hook API avoids creating a custom struct child_process and other internal hook plumbing (e.g. calling find_hook()) and prepares for the specification of hooks via configs or running parallel hooks. Execution is still sequential through the current hook.[ch] via the run_process_parallel_opts.processes=1 arg. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-12-28 14:02:07 +09:00
Junio C Hamano	9d442ce2e2	Merge branch 'ps/object-source-management' Code refactoring around object database sources. * ps/object-source-management: odb: handle recreation of quarantine directories odb: handle changing a repository's commondir chdir-notify: add function to unregister listeners odb: handle initialization of sources in `odb_new()` http-push: stop setting up `the_repository` for each reference t/helper: stop setting up `the_repository` repeatedly builtin/index-pack: fix deferred fsck outside repos oidset: introduce `oidset_equal()` odb: move logic to disable ref updates into repo odb: refactor `odb_clear()` to `odb_free()` odb: adopt logic to close object databases setup: convert `set_git_dir()` to have file scope path: move `enter_repo()` into "setup.c"	2025-12-05 14:49:58 +09:00
Junio C Hamano	0534b78576	Merge branch 'jc/optional-path' "git config get --path" segfaulted on an ":(optional)path" that does not exist, which has been corrected. * jc/optional-path: config: really treat missing optional path as not configured config: really pretend missing :(optional) value is not there config: mark otherwise unused function as file-scope static	2025-12-05 14:49:56 +09:00
Junio C Hamano	0bd16856ff	config: really treat missing optional path as not configured These callers expect that git_config_pathname() that returns 0 is a signal that the variable they passed has a string they need to act on. But with the introduction of ":(optional)path" earlier, that is no longer the case. If the path specified by the configuration variable is missing, their variable will get a NULL in it, and they need to act on it (often, just refraining from copying it elsewhere). Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-24 17:00:47 -08:00
Patrick Steinhardt	831e02340b	path: move `enter_repo()` into "setup.c" The function `enter_repo()` is used to enter a repository at a given path. As such it sits way closer to setting up a repository than it does with handling paths, but regardless of that it's located in "path.c" instead of in "setup.c". Move the function into "setup.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-19 17:41:03 -08:00
Patrick Steinhardt	bdbebe5714	refs: introduce wrapper struct for `each_ref_fn` The `each_ref_fn` callback function type is used across our code base for several different functions that iterate through references. There's a bunch of callbacks implementing this type, which makes any changes to the callback signature extremely noisy. An example of the required churn is `e8207717f1` (refs: add referent to each_ref_fn, 2024-08-09): adding a single argument required us to change 48 files. It was already proposed back then [1] that we might want to introduce a wrapper structure to alleviate the pain going forward. While this of course requires the same kind of global refactoring as just introducing a new parameter, it at least allows us to more easily change the callback type afterwards by just extending the wrapper structure. One counterargument to this refactoring is that it makes the structure more opaque. While it is obvious which callsites need to be fixed up when we change the function type, it's not obvious anymore once we use a structure. That being said, we only have a handful of sites that actually need to populate this wrapper structure: our ref backends, "refs/iterator.c" as well as very few sites that invoke the iterator callback functions directly. Introduce this wrapper structure so that we can adapt the iterator interfaces more readily. [1]: <ZmarVcF5JjsZx0dl@tanuki> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-11-04 07:32:24 -08:00
Patrick Steinhardt	78237ea53d	packfile: split up responsibilities of `reprepare_packed_git()` In `reprepare_packed_git()` we perform a couple of operations: - We reload alternate object directories. - We clear the loose object cache. - We reprepare packfiles. While the logic is hosted in "packfile.c", it clearly reaches into other subsystems that aren't related to packfiles. Split up the responsibility and introduce `odb_reprepare()` which now becomes responsible for repreparing the whole object database. The existing `reprepare_packed_git()` function is refactored accordingly and only cares about reloading the packfile store now. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-09-24 11:53:50 -07:00
Junio C Hamano	4ce0caa7cc	Merge branch 'ps/object-file-wo-the-repository' Reduce implicit assumption and dependence on the_repository in the object-file subsystem. * ps/object-file-wo-the-repository: object-file: get rid of `the_repository` in index-related functions object-file: get rid of `the_repository` in `force_object_loose()` object-file: get rid of `the_repository` in `read_loose_object()` object-file: get rid of `the_repository` in loose object iterators object-file: remove declaration for `for_each_file_in_obj_subdir()` object-file: inline `for_each_loose_file_in_objdir_buf()` object-file: get rid of `the_repository` when writing objects odb: introduce `odb_write_object()` loose: write loose objects map via their source object-file: get rid of `the_repository` in `finalize_object_file()` object-file: get rid of `the_repository` in `loose_object_info()` object-file: get rid of `the_repository` when freshening objects object-file: inline `check_and_freshen()` functions object-file: get rid of `the_repository` in `has_loose_object()` object-file: stop using `the_hash_algo` object-file: fix -Wsign-compare warnings	2025-08-05 11:53:55 -07:00
Patrick Steinhardt	9ce196e86b	config: drop `git_config()` wrapper In `036876a106` (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those functions have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config()`. All callsites are adjusted so that they use `repo_config(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-23 08:15:18 -07:00
Junio C Hamano	86c9c14eb9	Merge branch 'bc/use-sha256-by-default-in-3.0' into ps/config-wo-the-repository * bc/use-sha256-by-default-in-3.0: Enable SHA-256 by default in breaking changes mode help: add a build option for default hash t5300: choose the built-in hash outside of a repo t4042: choose the built-in hash outside of a repo t1007: choose the built-in hash outside of a repo t: default to compile-time default hash if not set setup: use the default algorithm to initialize repo format Use legacy hash for legacy formats builtin: use default hash when outside a repository hash: add a constant for the legacy hash algorithm hash: add a constant for the default hash algorithm	2025-07-17 09:30:56 -07:00
Patrick Steinhardt	ab1c6e1d12	odb: introduce `odb_write_object()` We do not have a backend-agnostic way to write objects into an object database. While there is `write_object_file()`, this function is rather specific to the loose object format. Introduce `odb_write_object()` to plug this gap. For now, this function is a simple wrapper around `write_object_file()` and doesn't even use the passed-in object database yet. This will change in subsequent commits, where `write_object_file()` is converted so that it works on top of an `odb_source`. `odb_write_object()` will then become responsible for deciding which source an object shall be written to. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-16 22:16:15 -07:00
Junio C Hamano	51b50c55a9	Merge branch 'ps/object-store' Code clean-up around object access API. * ps/object-store: odb: rename `read_object_with_reference()` odb: rename `pretend_object_file()` odb: rename `has_object()` odb: rename `repo_read_object_file()` odb: rename `oid_object_info()` odb: trivial refactorings to get rid of `the_repository` odb: get rid of `the_repository` when handling submodule sources odb: get rid of `the_repository` when handling the primary source odb: get rid of `the_repository` in `for_each()` functions odb: get rid of `the_repository` when handling alternates odb: get rid of `the_repository` in `odb_mkstemp()` odb: get rid of `the_repository` in `assert_oid_type()` odb: get rid of `the_repository` in `find_odb()` odb: introduce parent pointers object-store: rename files to "odb.{c,h}" object-store: rename `object_directory` to `odb_source` object-store: rename `raw_object_store` to `object_database`	2025-07-15 15:18:18 -07:00
Junio C Hamano	7cafb9accc	Merge branch 'ps/object-store' into ps/object-file-wo-the-repository * ps/object-store: odb: rename `read_object_with_reference()` odb: rename `pretend_object_file()` odb: rename `has_object()` odb: rename `repo_read_object_file()` odb: rename `oid_object_info()` odb: trivial refactorings to get rid of `the_repository` odb: get rid of `the_repository` when handling submodule sources odb: get rid of `the_repository` when handling the primary source odb: get rid of `the_repository` in `for_each()` functions odb: get rid of `the_repository` when handling alternates odb: get rid of `the_repository` in `odb_mkstemp()` odb: get rid of `the_repository` in `assert_oid_type()` odb: get rid of `the_repository` in `find_odb()` odb: introduce parent pointers object-store: rename files to "odb.{c,h}" object-store: rename `object_directory` to `odb_source` object-store: rename `raw_object_store` to `object_database`	2025-07-09 16:29:52 -07:00
Junio C Hamano	cdb7872247	Merge branch 'kn/fetch-push-bulk-ref-update' "git push" and "git fetch" are taught to update refs in batches to gain performance. * kn/fetch-push-bulk-ref-update: receive-pack: handle reference deletions separately refs/files: skip updates with errors in batched updates receive-pack: use batched reference updates send-pack: fix memory leak around duplicate refs fetch: use batched reference updates refs: add function to translate errors to strings	2025-07-08 15:49:19 -07:00
brian m. carlson	667d251a04	Use legacy hash for legacy formats We have a large variety of data formats and protocols where no hash algorithm was defined and the default was assumed to always be SHA-1. Instead of explicitly stating SHA-1, let's use the constant to represent the legacy hash algorithm (which is still SHA-1) so that it's clear for documentary purposes that it's a legacy fallback option and not an intentional choice to use SHA-1. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:58:24 -07:00
Patrick Steinhardt	fcf8e3e111	odb: rename `has_object()` Rename `has_object()` to `odb_has_object()` to match other functions related to the object database and our modern coding guidelines. Introduce a compatibility wrapper so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:38 -07:00
Patrick Steinhardt	798c661ce3	odb: get rid of `the_repository` in `for_each()` functions There are a couple of iterator-style functions that execute a callback for each instance of a given set, all of which currently depend on `the_repository`. Refactor them to instead take an object database as parameter so that we can get rid of this dependency. Rename the functions accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:36 -07:00
Patrick Steinhardt	8f49151763	object-store: rename files to "odb.{c,h}" In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-07-01 14:46:34 -07:00
Karthik Nayak	5c697f0b7d	receive-pack: handle reference deletions separately In `9d2962a7c4` (receive-pack: use batched reference updates, 2025-05-19) we updated the 'git-receive-pack(1)' command to use batched reference updates. One edge case which was missed during this implementation was when a user pushes multiple branches such as: delete refs/heads/branch/conflict create refs/heads/branch Before using batched updates, the references would be applied sequentially and hence no conflicts would arise. With batched updates, while the first update applies, the second fails due to D/F conflict. A similar issue was present in 'git-fetch(1)' and was fixed by separating out reference pruning into a separate transaction in the commit 'fetch: use batched reference updates'. Apply a similar mechanism for 'git-receive-pack(1)' and separate out reference deletions into its own batch. This means 'git-receive-pack(1)' will now use up to two transactions, whereas before using batched updates it would use _at least_ two transactions. So using batched updates is still the better option. Add a test to validate this behavior. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-06-25 08:20:27 -07:00
Justin Tobler	68cb0b5253	builtin/receive-pack: add option to skip connectivity check During git-receive-pack(1), connectivity of the object graph is validated to ensure that the received packfile does not leave the repository in a broken state. This is done via git-rev-list(1) and walking the objects, which can be expensive for large repositories. Generally, this check is critical to avoid an incomplete received packfile from corrupting a repository. Server operators may have additional knowledge though around exactly how Git is being used on the server-side which can be used to facilitate more efficient connectivity computation of incoming objects. For example, if it can be ensured that all objects in a repository are connected and do not depend on any missing objects, the connectivity of newly written objects can be checked by walking the object graph containing only the new objects from the updated tips and identifying the missing objects which represent the boundary between the new objects and the repository. These boundary objects can be checked in the canonical repository to ensure the new objects connect as expected and thus avoid walking the rest of the object graph. Git itself cannot make the guarantees required for such an optimization as it is possible for a repository to contain an unreachable object that references a missing object without the repository being considered corrupt. Introduce the --skip-connectivity-check option for git-receive-pack(1) which bypasses this connectivity check to give more control to the server-side. Note that without proper server-side validation of newly received objects handled outside of Git, usage of this option risks corrupting a repository. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-20 11:43:36 -07:00
Karthik Nayak	9d2962a7c4	receive-pack: use batched reference updates The reference updates performed as a part of 'git-receive-pack(1)' take place one at a time. For each reference update, a new transaction is created and committed. This is necessary to ensure we can allow individual updates to fail without failing the entire command. The command also supports an 'atomic' mode, which uses a single transaction to update all of the references. But this mode has an all-or-nothing approach, where if a single update fails, all updates would fail. In `23fc8e4f61` (refs: implement batch reference update support, 2025-04-08), we introduced a new mechanism to batch reference updates. Under the hood, this uses a single transaction to perform a batch of reference updates, while allowing only individual updates to fail. Utilize this newly introduced batch update mechanism in 'git-receive-pack(1)'. This provides a significant bump in performance, especially when dealing with repositories with a large number of references.
With the reftable backend there is an 18x performance improvement, when performing receive-pack with 10000 refs: Benchmark 1: receive: many refs (refformat = reftable, refcount = 10000, revision = master) Time (mean ± σ): 4.276 s ± 0.078 s [User: 0.796 s, System: 3.318 s] Range (min … max): 4.185 s … 4.430 s 10 runs Benchmark 2: receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD) Time (mean ± σ): 235.4 ms ± 6.9 ms [User: 75.4 ms, System: 157.3 ms] Range (min … max): 228.5 ms … 254.2 ms 11 runs Summary receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran 18.16 ± 0.63 times faster than receive: many refs (refformat = reftable, refcount = 10000, revision = master) In similar conditions, the files backend sees a 1.21x performance improvement: Benchmark 1: receive: many refs (refformat = files, refcount = 10000, revision = master) Time (mean ± σ): 1.121 s ± 0.021 s [User: 0.128 s, System: 0.975 s] Range (min … max): 1.097 s … 1.156 s 10 runs Benchmark 2: receive: many refs (refformat = files, refcount = 10000, revision = HEAD) Time (mean ± σ): 927.9 ms ± 22.6 ms [User: 99.0 ms, System: 815.2 ms] Range (min … max): 903.1 ms … 978.0 ms 10 runs Summary receive: many refs (refformat = files, refcount = 10000, revision = HEAD) ran 1.21 ± 0.04 times faster than receive: many refs (refformat = files, refcount = 10000, revision = master) As using batched updates requires the error handling to be moved to the end of the flow, create and use a 'struct strset' to track the failed refs and attribute the correct errors to them. This change also uncovers an issue when a client provides multiple updates to the same reference. For example: $ git send-pack remote.git A:foo B:foo Enumerating objects: 3, done. Counting objects: 100% (3/3), done. Delta compression using up to 20 threads Compressing objects: 100% (2/2), done. Writing objects: 100% (3/3), 226 bytes \| 226.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0) remote: error: cannot lock ref 'refs/heads/foo': reference already exists To remote.git ! [remote rejected] A -> foo (failed to update ref) ! [remote failure] B -> foo (remote failed to report status) As you can see, the remote runs into an error because it cannot lock the target reference for the second update. Furthermore, the remote complains that the first update has been rejected whereas the second update didn't receive any status update because we failed to lock it. Reading this status message alone a user would probably expect that `foo` has not been updated at all. But that's not the case: while we claim that the ref wasn't updated, it surprisingly points to `A` now. One could argue that this is merely an error in how we report the result of this push. But ultimately, the user's request itself is already broken and doesn't make any sense in the first place and cannot ever lead to a sensible outcome that honors the full request. The conversion to batched transactions fixes the issue because we now try to queue both updates in the same transaction. As such, the transaction itself will notice this conflict and refuse the update altogether before we commit any of the values. Note that this requires changes to a couple of tests in t5408 that happened to exercise this behaviour. Given that the generated output is misleading and given that the user request cannot ever be fully honored this really feels more like a bug than properly designed behaviour. As such, changing the behaviour feels like the right thing to do. Since now reference updates are batched, the 'reference-transaction' hook will be invoked with all updates together. Currently git will 'die' when the hook returns with a non-zero exit status in the 'prepared' stage. For 'git-receive-pack(1)', this allowed users to reject an individual reference update, git would have applied previous updates but immediately abort further execution. 
This is definitely an incorrect usage of this hook, since the right place to do this would be the 'update' hook. This patch retains the latter behavior, but the 'reference-transaction' hook now changes to an all-or-nothing behavior when a non-zero exit status is returned in the 'prepared' stage, since batch updates use a transaction under the hood. This explains the change in 't1416'. Helped-by: Jeff King <peff@peff.net> Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-19 11:06:32 -07:00
Patrick Steinhardt	062b914c84	treewide: convert users of `repo_has_object_file()` to `has_object()` As the comment of `repo_has_object_file()` and its `_with_flags()` variant tells us, these functions are considered to be deprecated in favor of `has_object()`. There are a couple of slight benefits in favor of the replacement: - The new function has a short-and-sweet name. - More explicit defaults: `has_object()` doesn't fetch missing objects via promisor remotes, and neither does it reload packfiles if an object wasn't found by default. This ensures that it becomes immediately obvious when a simple object existence check may result in expensive actions. Most importantly though, it is confusing that we have two sets of functions that ultimately do the same thing, but with different defaults. Start sunsetting `repo_has_object_file()` and its `_with_flags()` sibling by replacing all callsites with `has_object()`: - `repo_has_object_file(...)` is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED \| HAS_OBJECT_FETCH_PROMISOR)`. - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK \| OBJECT_INFO_SKIP_FETCH_OBJECT)` is equivalent to `has_object(..., 0)`. - `repo_has_object_file_with_flags(..., OBJECT_INFO_SKIP_FETCH_OBJECT)` is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED)`. - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)` is equivalent to `has_object(..., HAS_OBJECT_FETCH_PROMISOR)`. The replacements should be functionally equivalent. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-29 10:08:13 -07:00
Patrick Steinhardt	68cd492a3e	object-store: merge "object-store-ll.h" and "object-store.h" The "object-store-ll.h" header has been introduced to keep transitive header dependencies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-15 08:24:37 -07:00
Patrick Steinhardt	d9f517d051	object-file: split out functions relating to object store subsystem While we have the "object-store.h" header, most of the functionality for object stores is actually hosted in "object-file.c". This makes it hard to find relevant functions and causes us to mix up concerns. Split out functions relating to the object store subsystem into a new "object-store.c" file. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-04-15 08:24:36 -07:00
Patrick Steinhardt	7d70b29c4f	hash: stop depending on `the_repository` in `null_oid()` The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID. This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer. There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context: - "builtin/grep.c" - "grep.c" - "refs/debug.c" There are also two non-trivial exceptions: - "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case. - "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm. 
This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". But doing so would feel misdesigned. Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-03-10 13:16:20 -07:00
Junio C Hamano	feffb34257	Merge branch 'ps/path-sans-the-repository' The path.[ch] API takes an explicit repository parameter passed throughout the callchain, instead of relying on the_repository singleton instance. * ps/path-sans-the-repository: path: adjust last remaining users of `the_repository` environment: move access to "core.sharedRepository" into repo settings environment: move access to "core.hooksPath" into repo settings repo-settings: introduce function to clear struct path: drop `git_path()` in favor of `repo_git_path()` rerere: let `rerere_path()` write paths into a caller-provided buffer path: drop `git_common_path()` in favor of `repo_common_path()` worktree: return allocated string from `get_worktree_git_dir()` path: drop `git_path_buf()` in favor of `repo_git_path_replace()` path: drop `git_pathdup()` in favor of `repo_git_path()` path: drop unused `strbuf_git_path()` function path: refactor `repo_submodule_path()` family of functions submodule: refactor `submodule_to_gitdir()` to accept a repo path: refactor `repo_worktree_path()` family of functions path: refactor `repo_git_path()` family of functions path: refactor `repo_common_path()` family of functions	2025-03-05 10:37:43 -08:00
Junio C Hamano	246569bf83	Merge branch 'ps/hash-cleanup' Further code clean-up on the use of hash functions. Now the context object knows what hash function it is working with. * ps/hash-cleanup: global: adapt callers to use generic hash context helpers hash: provide generic wrappers to update hash contexts hash: stop typedeffing the hash context hash: convert hashing context to a structure	2025-02-10 10:18:31 -08:00
Patrick Steinhardt	8e4710f011	worktree: return allocated string from `get_worktree_git_dir()` The `get_worktree_git_dir()` function returns a string constant that does not need to be free'd by the caller. This string is computed for three different cases: - If we don't have a worktree we return a path into the Git directory. The returned string is owned by `the_repository`, so there is no need for the caller to free it. - If we have a worktree, but no worktree ID then the caller requests the main worktree. In this case we return a path into the common directory, which again is owned by `the_repository` and thus does not need to be free'd. - In the third case, where we have an actual worktree, we compute the path relative to "$GIT_COMMON_DIR/worktrees/". This string does not need to be released either, even though `git_common_path()` ends up allocating memory. But this doesn't result in a memory leak either because we write into a buffer returned by `get_pathname()`, which returns one out of four static buffers. We're about to drop `git_common_path()` in favor of `repo_common_path()`, which doesn't use the same mechanism but instead returns an allocated string owned by the caller. While we could adapt `get_worktree_git_dir()` to also use `get_pathname()` and print the derived common path into that buffer, the whole schema feels a lot like premature optimization in this context. There are some callsites where we call `get_worktree_git_dir()` in a loop that iterates through all worktrees. But none of these loops seem to be even remotely in the hot path, so saving a single allocation there does not feel worth it. Refactor the function to instead consistently return an allocated path so that we can start using `repo_common_path()` in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-07 09:59:23 -08:00
Junio C Hamano	b83a2f9006	Merge branch 'kn/pack-write-with-reduced-globals' Code clean-up. * kn/pack-write-with-reduced-globals: pack-write: pass hash_algo to internal functions pack-write: pass hash_algo to `write_rev_file()` pack-write: pass hash_algo to `write_idx_file()` pack-write: pass repository to `index_pack_lockfile()` pack-write: pass hash_algo to `fixup_pack_header_footer()`	2025-02-03 10:23:34 -08:00
Patrick Steinhardt	0578f1e66a	global: adapt callers to use generic hash context helpers Adapt callers to use generic hash context helpers instead of using the hash algorithm to update them. This makes the callsites easier to reason about and removes the possibility that the wrong hash algorithm is used to update the hash context's state. And as a nice side effect this also gets rid of a bunch of users of `the_hash_algo`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-31 10:06:11 -08:00
Patrick Steinhardt	7346e340f1	hash: stop typedeffing the hash context We generally avoid using `typedef` in the Git codebase. One exception though is the `git_hash_ctx`, likely because it used to be a union rather than a struct until the preceding commit refactored it. But now that it is a normal `struct` there isn't really a need for a typedef anymore. Drop the typedef and adapt all callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-31 10:06:10 -08:00
Karthik Nayak	e2f6f76585	pack-write: pass repository to `index_pack_lockfile()` The `index_pack_lockfile()` function uses the global `the_repository` variable to access the repository. To avoid global variable usage, pass the repository from the layers above. Although the layers above could have access to the repository internally, simply pass in `the_repository`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-01-21 12:36:34 -08:00

500 Commits