git-for-windows/git - git - Gitea: Self-hosted GitHub

mirror of https://github.com/git-for-windows/git.git synced 2026-05-05 17:40:24 -05:00

Author	SHA1	Message	Date
Johannes Schindelin	fc7f8f8fb0	mingw: use mimalloc Thorough benchmarking with repacking a subset of linux.git (the commit history reachable from 93a6fefe2f ([PATCH] fix the SYSCTL=n compilation, 2007-02-28), to be precise) suggests that this allocator is on par with nedmalloc, and in multi-threaded situations maybe even better: `git repack -adfq` with mimalloc, 8 threads: 31.166991900 27.576763800 28.712311000 27.373859000 27.163141900 `git repack -adfq` with nedmalloc, 8 threads: 31.915032900 27.149883100 28.244933700 27.240188800 28.580849500 In a different test using GitHub Actions build agents (probably single-threaded, a core strength of nedmalloc): `git repack -q -d -l -A --unpack-unreachable=2.weeks.ago` with mimalloc: 943.426 978.500 939.709 959.811 954.605 `git repack -q -d -l -A --unpack-unreachable=2.weeks.ago` with nedmalloc: 995.383 952.179 943.253 963.043 980.468 While these measurements were not executed with complete scientific rigor, as no hardware was set aside specifically for these benchmarks, they show that mimalloc and nedmalloc perform almost the same, nedmalloc with a bit higher variance and also a slightly higher average (further testing suggests that nedmalloc performs worse in multi-threaded situations than in single-threaded ones). In short: mimalloc seems to be slightly better suited for our purposes than nedmalloc. Seeing that mimalloc is developed actively, while nedmalloc has not seen any updates in eight years, let's use mimalloc on Windows instead. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:46 +01:00
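The timings quoted in the `mingw: use mimalloc` message can be summarized with a few lines of Python. This is only a sketch for checking the stated conclusion (a slightly lower average and lower variance for mimalloc); the variable and function names are illustrative, and the numbers are copied from the commit message as reported there.

```python
from statistics import mean, stdev

# Timings copied from the commit message (units as reported there).
timings = {
    "repack -adfq, 8 threads": {
        "mimalloc": [31.166991900, 27.576763800, 28.712311000,
                     27.373859000, 27.163141900],
        "nedmalloc": [31.915032900, 27.149883100, 28.244933700,
                      27.240188800, 28.580849500],
    },
    "repack -q -d -l -A --unpack-unreachable=2.weeks.ago": {
        "mimalloc": [943.426, 978.500, 939.709, 959.811, 954.605],
        "nedmalloc": [995.383, 952.179, 943.253, 963.043, 980.468],
    },
}


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of timings."""
    return mean(samples), stdev(samples)


summary = {
    bench: {alloc: summarize(samples) for alloc, samples in runs.items()}
    for bench, runs in timings.items()
}

for bench, per_alloc in summary.items():
    for alloc, (avg, sd) in per_alloc.items():
        print(f"{bench:52s} {alloc:10s} mean={avg:9.3f} stdev={sd:7.3f}")
```

With only five runs per configuration, the differences are small compared to the spread, which matches the message's own caveat about rigor.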
Jeff Hostetler	3bce96c182	mimalloc: use "weak" random seed when statically linked Always use the internal "use_weak" random seed when initializing the "mimalloc" heap when statically linked on Windows. The imported "mimalloc" routines support several random sources to seed the heap data structures, including BCrypt.dll and RtlGenRandom. Crashes have been reported when using BCrypt.dll if it is initialized during an `atexit()` handler function. Granted, such DLL initialization should not happen in an atexit handler, but yet the crashes remain. It should be noted that on Windows when statically linked, the mimalloc startup code (called by the GCC CRT to initialize static data prior to calling `main()`) always uses the internal "weak" random seed. "mimalloc" does not try to load an alternate random source until after the OS initialization has completed. Heap data is stored in `__declspec(thread)` TLS data and in theory each Git thread will have its own heap data. However, testing shows that the "mimalloc" library doesn't actually call `os_random_buf()` (to load a new random source) when creating these new per-thread heap structures. However, if an atexit handler is forced to run on a non-main thread, the "mimalloc" library WILL try to create a new heap and seed it with `os_random_buf()`. (The reason for this is still a mystery to this author.) The `os_random_buf()` call can cause the (previously uninitialized) BCrypt.dll library to be dynamically loaded and a call made into it. Crashes have been reported in v2.40.1.vfs.0.0 while in this call. As a workaround, the fix here forces the use of the internal "use_weak" random code for the subsequent `os_random_buf()` calls. Since we have been using that random generator for the majority of the program, it seems safe to use it for the final few mallocs in the atexit handler (of which there really shouldn't be that many). Signed-off-by: Jeff Hostetler <jeffhostetler@github.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:46 +01:00
Johannes Schindelin	27712a73ef	mimalloc: offer a build-time option to enable it By defining `USE_MIMALLOC`, Git can now be compiled with that nicely-fast and small allocator. Note that we have to disable a couple `DEVELOPER` options to build mimalloc's source code, as it makes heavy use of declarations after statements, among other things that disagree with Git's conventions. We even have to silence some GCC warnings in non-DEVELOPER mode. For example, the `-Wno-array-bounds` flag is needed because in `-O2` builds, trying to call `NtCurrentTeb()` (which `_mi_thread_id()` does on Windows) causes the bogus warning about a system header, likely related to https://sourceforge.net/p/mingw-w64/mailman/message/37674519/ and to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578: C:/git-sdk-64-minimal/mingw64/include/psdk_inc/intrin-impl.h:838:1: error: array subscript 0 is outside array bounds of 'long long unsigned int[0]' [-Werror=array-bounds] 838 \| __buildreadseg(__readgsqword, unsigned __int64, "gs", "q") \| ^~~~~~~~~~~~~~ Also: The `mimalloc` library uses C11-style atomics, therefore we must require that standard when compiling with GCC if we want to use `mimalloc` (instead of requiring "only" C99). This is what we do in the CMake definition already, therefore this commit does not need to touch `contrib/buildsystems/`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:46 +01:00
Johannes Schindelin	3952c00447	mimalloc: adjust for building inside Git We want to compile mimalloc's source code as part of Git, rather than requiring the code to be built as an external library: mimalloc uses a CMake-based build, which is not necessarily easy to integrate into the flavors of Git for Windows (which will be the main benefitting port). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:46 +01:00
Johannes Schindelin	eb1cd4d7d3	Import the source code of mimalloc v2.1.2 This commit imports mimalloc's source code as per v2.1.2, fetched from the tag at https://github.com/microsoft/mimalloc. The .c files are from the src/ subdirectory, and the .h files from the include/ and include/mimalloc/ subdirectories. We will subsequently modify the source code to accommodate building within Git's context. Since we plan on using the `mi_*()` family of functions, we skip the C++-specific source code, some POSIX compliant functions to interact with mimalloc, and the code that wants to support auto-magic overriding of the `malloc()` function (mimalloc-new-delete.h, alloc-posix.c, mimalloc-override.h, alloc-override.c, alloc-override-osx.c, alloc-override-win.c and static.c). To appease the `check-whitespace` job of Git's Continuous Integration, this commit was washed one time via `git rebase --whitespace=fix`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:45 +01:00
Johannes Schindelin	eea7e46579	git-compat-util: avoid redeclaring _DEFAULT_SOURCE We are about to vendor in `mimalloc`'s source code which we will want to include `git-compat-util.h` after defining that constant. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:45 +01:00
Johannes Schindelin	df7888f629	win32/pthread: avoid name clashes with winpthread The mingw-w64 GCC seems to link implicitly to libwinpthread, which does implement a pthread emulation (that is more complete than Git's). Let's keep preferring Git's. To avoid linker errors where it thinks that the `pthread_self` and the `pthread_create` symbols are defined twice, let's give our version a `win32_` prefix, just like we already do for `pthread_join()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:45 +01:00
Johannes Schindelin	8526281feb	mingw: include the Python parts in the build While Git for Windows does not _ship_ Python (in order to save on bandwidth), MSYS2 provides very fine Python interpreters that users can easily take advantage of, by using Git for Windows within its SDK. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:45 +01:00
Johannes Schindelin	c6ca5c1064	Merge branch 'fixes-from-the-git-mailing-list' These fixes have been sent to the Git mailing list but have not been picked up by the Git project yet. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:43 +01:00
Junio C Hamano	278028ef96	Merge branch 'mh/credential-cache-authtype-request-fix' The "cache" credential back-end did not handle authtype correctly, which has been corrected. * mh/credential-cache-authtype-request-fix: credential-cache: respect authtype capability Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:43 +01:00
Junio C Hamano	a225dbd554	Merge branch 'jc/show-index-h-update' Doc and short-help text for "show-index" has been clarified to stress that the command reads its data from the standard input. * jc/show-index-h-update: show-index: the short help should say the command reads from its input Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:42 +01:00
Junio C Hamano	803d56f5d4	Merge branch 'bf/fetch-set-head-fix' into jch Fetching into a bare repository incorrectly assumed it always used a mirror layout when deciding to update remote-tracking HEAD, which has been corrected. * bf/fetch-set-head-fix: fetch set_head: fix non-mirror remotes in bare repositories Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:42 +01:00
Junio C Hamano	e97324a463	Merge branch 'rs/ref-filter-used-atoms-value-fix' "git branch --sort=..." and "git for-each-ref --format=... --sort=..." did not work as expected with some atoms, which has been corrected. * rs/ref-fitler-used-atoms-value-fix: ref-filter: remove ref_format_clear() ref-filter: move is-base tip to used_atom ref-filter: move ahead-behind bases into used_atom Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:42 +01:00
Junio C Hamano	2dace82faa	Merge branch 'kn/reflog-migration-fix-followup' Code clean-up. * kn/reflog-migration-fix-followup: reftable: prevent 'update_index' changes after adding records refs: use 'uint64_t' for 'ref_update.index' refs: mark `ref_transaction_update_reflog()` as static These patches have been actually rebased onto a better base (the `kn/reflog-migration` tip instead of the merge commit that merged this tip). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:41 +01:00
Junio C Hamano	9607fee092	Merge branch 'kn/reflog-migration-fix' "git refs migrate" for migrating reflog data was broken. * kn/reflog-migration-fix: reftable: write correct max_update_index to header Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:41 +01:00
Junio C Hamano	02e4e7b521	Merge branch 'en/object-name-with-funny-refname-fix' Extended SHA-1 expression parser did not work well when a branch with an unusual name (e.g. "foo{bar") is involved. * en/object-name-with-funny-refname-fix: object-name: be more strict in parsing describe-like output object-name: fix resolution of object names containing curly braces Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:41 +01:00
Junio C Hamano	24f5d72492	Merge branch 'jk/pack-header-parse-alignment-fix' It was possible for "git unpack-objects" and "git index-pack" to make an unaligned access, which has been corrected. * jk/pack-header-parse-alignment-fix: index-pack, unpack-objects: use skip_prefix to avoid magic number index-pack, unpack-objects: use get_be32() for reading pack header parse_pack_header_option(): avoid unaligned memory writes packfile: factor out --pack_header argument parsing bswap.h: squelch potential sparse -Wcast-truncate warnings These patches have actually been rebased onto v2.46.2 for easier merging. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:41 +01:00
Jeff King	51354d1af5	update-ref: do set reflog's `old_oid` In git 2.48.1, the `git update-ref` subcommand no longer correctly updates the reflog in some cases. Specifically, it appears that the `old_oid` field will not be updated when modifying a branch referenced by another symbolic ref (e.g. HEAD). This doesn't break the `git reflog` subcommand, but does break references like `HEAD@{1}`, which appear to read the `old_oid` field: git init -b main git commit --allow-empty -m "A" git commit --allow-empty -m "B" git update-ref -m "reason" refs/heads/main HEAD~ HEAD The `old_oid` field is now empty (all zeroes). This is only the case in derived reflogs (in this case .git/logs/HEAD). The reflog for `refs/heads/main` appears to be updated correctly. This was broken in `297c09eabb` (refs: allow multiple reflog entries for the same refname, 2024-12-16). The reason was the assumption that, for reflog-only updates, the flow of `lock_ref_for_update()` only needed to capture the lock. But this is wrong, since it skips populating `old_oid`. As such this patch is the correct fix. Reported-by: Nika Layzell <nika@thelayzells.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:40 +01:00
Junio C Hamano	9b7b978476	Merge branch 'ps/object-collision-check' CI jobs gave sporadic failures, which turned out to be because the object finalization code was giving an error when it did not have to. * ps/object-collision-check: object-file: retry linking file into place when occluding file vanishes object-file: don't special-case missing source file in collision check object-file: rename variables in `check_collision()` object-file: fix race in object collision check Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:40 +01:00
Jeff King	a2e7060a1f	grep: prevent `^$` false match at end of file In some implementations, `regexec_buf()` assumes that it is fed lines; without `REG_NOTEOL`, it thinks the end of the buffer is the end of a line. That makes sense, but it trips up this case because we are not feeding lines, but rather a whole buffer. So the final newline is not the start of an empty line, but the true end of the buffer. This causes an interesting bug: $ echo content >file.txt $ git grep --no-index -n '^$' file.txt file.txt:2: This bug is fixed by making the end of the buffer consistently the end of the final line. The patch was applied from https://lore.kernel.org/git/20250113062601.GD767856@coredump.intra.peff.net/ Reported-by: Olly Betts <olly@survex.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:40 +01:00
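The same class of false match can be reproduced outside Git. The sketch below uses Python's `re` module purely as an analogy (it is not the `regexec_buf()` code path Git uses): matching `^$` against a whole buffer reports an empty "line" after the final newline, whereas treating that newline as the terminator of the last line does not. The helper name `empty_line_numbers` is hypothetical.

```python
import re

buf = "content\n"

# Whole-buffer matching: in MULTILINE mode `^` also matches right after the
# trailing newline, so `^$` "finds" an empty line at the very end.
whole_buffer_hits = [m.start() for m in re.finditer(r"^$", buf, re.MULTILINE)]


def empty_line_numbers(text):
    """Return 1-based numbers of empty lines, treating a final newline as
    the end of the last line rather than the start of a new one."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if line == ""]


print(whole_buffer_hits)        # offsets where the whole-buffer match fires
print(empty_line_numbers(buf))  # line-oriented view of the same buffer
```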
Junio C Hamano	cbb262adf2	Merge branch 'jk/lsan-race-ignore-false-positive' The code to check LSan results has been simplified and made more robust. * jk/lsan-race-ignore-false-positive: test-lib: add a few comments to LSan log checking test-lib: simplify lsan results check test-lib: invert return value of check_test_results_san_file_empty Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:40 +01:00
Adam Murray	299435c3e5	trace2: prevent segfault on config collection where no value specified When TRACE2 analytics is enabled, a git config option that has no value causes a segfault. Steps to Reproduce GIT_TRACE2=true GIT_TRACE2_CONFIG_PARAMS=status.* git -c status.relativePaths version Expected Result git version 2.46.0 Actual Result zsh: segmentation fault GIT_TRACE2=true This adds checks to prevent the segfault and instead return an empty value. Signed-off-by: Adam Murray <ad@canva.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:40 +01:00
M Hickford	f40199fe18	credential-cache: respect authtype capability Previously, credential-cache populated authtype regardless whether "get" request had authtype capability. As documented in git-credential.txt, authtype "should not be sent unless the appropriate capability ... is provided". Add test. Without this change, the test failed because "credential fill" printed an incomplete credential with only protocol and host attributes (the unexpected authtype attribute was discarded by credential.c). Signed-off-by: M Hickford <mirth.hickford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:39 +01:00
Junio C Hamano	3e20693c1d	show-index: the short help should say the command reads from its input The short help text given by "git show-index -h" says $ git show-index -h usage: git show-index [--object-format=<hash-algorithm>] --[no-]object-format <hash-algorithm> specify the hash algorithm to use The command takes a pack .idx file from its standard input. The user has to _know_ this, as there is no indication from this output. Give a hint that the data to work on is fed from its standard input. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:39 +01:00
Bence Ferdinandy	c4c6f21a2e	fetch set_head: fix non-mirror remotes in bare repositories In `b1b713f722` (fetch set_head: handle mirrored bare repositories, 2024-11-22) it was implicitly assumed that all remotes will be mirrors in a bare repository, thus fetching a non-mirrored remote could lead to HEAD pointing to a non-existent reference. Make sure we only overwrite HEAD if we are in a bare repository and fetching from a mirror. Otherwise, proceed as normally, and create refs/remotes/<nonmirrorremote>/HEAD instead. Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com> Reported-by: Christian Hesse <list@eworm.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:39 +01:00
René Scharfe	135a432f4c	ref-filter: remove ref_format_clear() Now that ref_format_clear() no longer releases any memory we don't need it anymore. Remove it and its counterpart, ref_format_init(). Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:39 +01:00
René Scharfe	2fbabb941b	ref-filter: move is-base tip to used_atom The string_list "is_base_tips" in struct ref_format stores the committish part of "is-base:<committish>". It has the same problems that its sibling string_list "bases" had. Fix them the same way as the previous commit did for the latter, by replacing the string_list with fields in "used_atom". Helped-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:39 +01:00
René Scharfe	8849ac4117	ref-filter: move ahead-behind bases into used_atom verify_ref_format() parses a ref-filter format string and stores recognized items in the static array "used_atom". For "ahead-behind:<committish>" it stores the committish part in a string_list member "bases" of struct ref_format. ref_sorting_options() also parses bare ref-filter format items and stores recognized ones in "used_atom" as well. The committish parts go to a dummy struct ref_format in parse_sorting_atom(), though, and are leaked and forgotten. If verify_ref_format() is called before ref_sorting_options(), like in git for-each-ref, then all works well if the sort key is included in the format string. If it isn't then sorting cannot work as the committishes are missing. If ref_sorting_options() is called first, like in git branch, then we have the additional issue that if the sort key is included in the format string then filter_ahead_behind() can't see its committish, will not generate any results for it and thus it will be expanded to an empty string. Fix those issues by replacing the string_list with a field in used_atom for storing the committish. This way it can be shared for handling both ref-filter format strings and sorting options in the same command. Reported-by: Ross Goldberg <ross.goldberg@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:39 +01:00
Karthik Nayak	2b0861883e	reftable: prevent 'update_index' changes after adding records The function `reftable_writer_set_limits()` allows updating the 'min_update_index' and 'max_update_index' of a reftable writer. These values are written to both the writer's header and footer. Since the header is written during the first block write, any subsequent changes to the update index would create a mismatch between the header and footer values. The footer would contain the newer values while the header retained the original ones. To fix this bug, prevent callers from updating these values after any record is written. To do this, modify the function to return an error whenever the limits are modified after any record adds. Check for record adds within `reftable_writer_set_limits()` by checking the `last_key` variable, which is set whenever a new record is added. Modify all callers of the function to anticipate a return type and handle it accordingly. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:38 +01:00
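The guard described in this message can be illustrated with a toy model. The sketch below is not the reftable code: `ToyWriter` and its fields are hypothetical stand-ins, and a Python exception plays the role of the error return. It only shows why rejecting limit changes after the first record keeps header and footer in agreement.

```python
class ToyWriter:
    """Toy model: the header is written with the first record, the footer on close."""

    def __init__(self, min_update_index, max_update_index):
        self.min_update_index = min_update_index
        self.max_update_index = max_update_index
        self.header = None    # limits captured when the first block is written
        self.footer = None    # limits captured when the writer is closed
        self.last_key = None  # set once any record has been added

    def set_limits(self, min_update_index, max_update_index):
        # Refuse changes once a record exists: the header already holds the
        # old limits, so changing them now would desynchronize the footer.
        if self.last_key is not None:
            raise ValueError("limits cannot change after records were added")
        self.min_update_index = min_update_index
        self.max_update_index = max_update_index

    def _write_header_once(self):
        if self.header is None:
            self.header = (self.min_update_index, self.max_update_index)

    def add_record(self, key):
        self._write_header_once()
        self.last_key = key

    def close(self):
        self._write_header_once()
        self.footer = (self.min_update_index, self.max_update_index)
        return self.header == self.footer


w = ToyWriter(1, 1)
w.set_limits(1, 3)              # allowed: nothing has been written yet
w.add_record("refs/heads/main")
try:
    w.set_limits(1, 5)          # too late: the header is already on disk
    rejected = False
except ValueError:
    rejected = True
consistent = w.close()
print(rejected, consistent)
```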
Karthik Nayak	8ba42f1987	reftable: write correct max_update_index to header In `297c09eabb` (refs: allow multiple reflog entries for the same refname, 2024-12-16), the reftable backend learned to handle multiple reflog entries within the same transaction. This was done by modifying the `update_index` for reflogs with multiple indices. During writing the logs, the `max_update_index` of the writer was modified to ensure the limits were raised to the modified `update_index`s. However, since ref entries are written before the modification to the `max_update_index`, if there are multiple blocks to be written, the reftable backend writes the header with the old `max_update_index`. When all logs are finally written, the footer will be written with the new `max_update_index`. This causes a mismatch between the header and the footer and causes the reftable file to be corrupted. The existing tests only spawn a single block and since headers are lazily written with the first block, the tests didn't capture this bug. To fix the issue, the appropriate `max_update_index` limit must be set even before the first block is written. Add a `max_index` field to the transaction which holds the `max_index` within all its updates, then propagate this value to the reftable backend, where it is used to set the `max_update_index` correctly. Add a test which creates a few thousand reference updates with multiple reflog entries, which should trigger the bug. Reported-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:38 +01:00
Elijah Newren	5aeb1e471a	object-name: be more strict in parsing describe-like output From Documentation/revisions.txt: '<describeOutput>', e.g. 'v1.7.4.2-679-g3bee7fb':: Output from `git describe`; i.e. a closest tag, optionally followed by a dash and a number of commits, followed by a dash, a 'g', and an abbreviated object name. which means that output of the format ${REFNAME}-${INTEGER}-g${HASH} should parse to fully expanded ${HASH}. This is fine. However, we currently don't validate any of ${REFNAME}-${INTEGER}, we only parse -g${HASH} and assume the rest is valid. That is problematic, since it breaks things like git cat-file -p branchname:path/to/file/named/i-gaffed which, when commit (or tree or blob) affed exists, will not return us information about the file we are looking for but will instead erroneously tell us about object affed. A few additional notes: - This is a slight backward incompatibility break, because we used to allow ${GARBAGE}-g${HASH} as a way to spell ${HASH}. However, a backward incompatible break is necessary, because there is no other way for someone to be more specific and disambiguate that they want the blob master:path/to/who-gabbed instead of the object abbed. - There is a possibility that check_refname_format() rules change in the future. However, we can only realistically loosen the rules for what that function accepts rather than tighten. If we were to tighten the rules, some real world repositories may already have refnames that suddenly become unacceptable and we break those repositories. As such, any describe-like syntax of the form ${VALID_FOR_A_REFNAME}-${INTEGER}-g${HASH} that is valid with the changes in this commit will remain valid in the future. - The fact that check_refname_format() rules could loosen in the future is probably also an important reason to make this change. If the rules loosen, there might be additional cases within ${GARBAGE}-g${HASH} that become ambiguous in the future. While abbreviated hashes can be disambiguated by abbreviating less, it may well be that these alternative object names have no way of being disambiguated (much like pathnames cannot be). Accepting all random ${GARBAGE} thus makes it difficult for us to allow future extensions to object naming. So, tighten up the parsing to make sure ${REFNAME} and ${INTEGER} are present in the string, and would be considered a valid ref and non-negative integer. Also, add a few tests for git describe using object names of the form ${REVISION_NAME}${MODIFIERS} since an early version of this patch failed on constructs like git describe v2.48.0-rc2-161-g6c2274cdbc^0 Reported-by: Gabriel Amaral <gabriel-amaral@github.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:38 +01:00
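The difference between the loose and the strict parse can be sketched as follows. This is an illustration under stated assumptions, not Git's parser: `looks_like_refname` is a rough, hypothetical stand-in for `check_refname_format()`, and the 4 to 40 hex-digit range for the abbreviated hash is assumed for the example.

```python
import re

# ${REFNAME}-${INTEGER}-g${HASH}, with an assumed 4..40 hex-digit hash.
DESCRIBE_RE = re.compile(
    r"^(?P<ref>.+)-(?P<count>[0-9]+)-g(?P<hash>[0-9a-fA-F]{4,40})$"
)


def looks_like_refname(name):
    """Very rough approximation of a valid refname (assumption, see lead-in)."""
    if not name or name.startswith("/") or name.endswith("/") or ".." in name:
        return False
    return not any(ch in name for ch in " ~^:?*[\\")


def loose_parse(name):
    """Behaviour described as problematic: only look for a trailing -g<hash>."""
    m = re.search(r"-g([0-9a-fA-F]{4,40})$", name)
    return m.group(1) if m else None


def strict_parse(name):
    """Require a plausible refname and a non-negative integer before -g<hash>."""
    m = DESCRIBE_RE.match(name)
    if m and looks_like_refname(m.group("ref")):
        return m.group("hash")
    return None


for candidate in ("v1.7.4.2-679-g3bee7fb", "path/to/file/named/i-gaffed"):
    print(candidate, loose_parse(candidate), strict_parse(candidate))
```

The path ending in `i-gaffed` is treated as object `affed` by the loose parse, while the strict parse rejects it and leaves it to be resolved as a path.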
Karthik Nayak	80d74ed7c2	refs: use 'uint64_t' for 'ref_update.index' The 'ref_update.index' variable is used to store an index for a given reference update. This index is used to order the updates in a predetermined order, while the default ordering is alphabetical as per the refname. For large repositories with millions of references, it should be safer to use 'uint64_t'. Let's do that. This also is applied for all other code sections where we store 'index' and pass it around. Reported-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:38 +01:00
Elijah Newren	5a88867729	object-name: fix resolution of object names containing curly braces Given a branch name of 'foo{bar', commands like git cat-file -p foo{bar:README.md should succeed (assuming that branch had a README.md file, of course). However, the change in `cce91a2cae` (Change 'master@noon' syntax to 'master@{noon}'., 2006-05-19) presumed that curly braces would always come after an '@' or '^' and be paired, causing e.g. 'foo{bar:README.md' to entirely miss the ':' and assume there's no object being referenced. In short, git would report: fatal: Not a valid object name foo{bar:README.md Change the parsing to only make the assumption of paired curly braces immediately after either a '@' or '^' character appears. Add tests for this, as well as for a few other test cases that initial versions of this patch broke: * 'foo@@{...}' * 'foo^{/${SEARCH_TEXT_WITH_COLON}}:${PATH}' Note that we'd prefer not duplicating the special logic for "@^" characters here, because if get_oid_basic() or interpret_nth_prior_checkout() or get_oid_basic() or similar gain extra methods of using curly braces, then the logic in get_oid_with_context_1() would need to be updated as well. But it's not clear how to refactor all of these to have a simple common callpoint with the specialized logic. Reported-by: Gabriel Amaral <gabriel-amaral@github.com> Helped-by: Michael Haggerty <mhagger@github.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:38 +01:00
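A simplified model of the two scanning strategies described in this message is sketched below. Neither function is Git's code: `find_colon_old` models the described assumption that braces are always paired, and `find_colon_new` only treats `{` as opening a group directly after `@` or `^`. Both names are hypothetical.

```python
def find_colon_old(name):
    """Model of the described pre-fix behaviour: track brace depth everywhere."""
    depth = 0
    for i, ch in enumerate(name):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == ":" and depth == 0:
            return i
    return -1  # no ':' found outside braces


def find_colon_new(name):
    """Only skip a {...} group when it immediately follows '@' or '^'."""
    i = 0
    while i < len(name):
        ch = name[i]
        if ch in "@^" and name[i + 1:i + 2] == "{":
            close = name.find("}", i + 2)
            if close == -1:
                return -1  # unterminated group: give up
            i = close + 1
            continue
        if ch == ":":
            return i
        i += 1
    return -1


for spec in ("foo{bar:README.md", "foo@@{1}:file", "foo^{/fix: typo}:Makefile"):
    print(spec, find_colon_old(spec), find_colon_new(spec))
```

For `foo{bar:README.md` the old model never sees the brace close and misses the `:`, which corresponds to the "Not a valid object name" error quoted in the message.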
Karthik Nayak	62a5707f7c	refs: mark `ref_transaction_update_reflog()` as static The `ref_transaction_update_reflog()` function is only used within 'refs.c', so mark it as static. Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:38 +01:00
Jeff King	dae2c5e5c7	index-pack, unpack-objects: use skip_prefix to avoid magic number When parsing --pack_header=, we manually skip 14 bytes to the data. Let's use skip_prefix() to do this automatically. Note that we overwrite our pointer to the front of the string, so we have to add more context to the error message. We could avoid this by declaring an extra pointer to hold the value, but I think the modified message is actually preferable; it should give translators a bit more context. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:37 +01:00
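The idea of replacing a hard-coded offset with a prefix check can be sketched in a few lines. Git's `skip_prefix()` is a C helper; the Python function below is only an analogue with a hypothetical signature, and the `2,5` value in the example argument is made up for illustration.

```python
def skip_prefix(s, prefix):
    """Return the remainder of `s` after `prefix`, or None if it does not match."""
    return s[len(prefix):] if s.startswith(prefix) else None


arg = "--pack_header=2,5"  # illustrative value only

magic = arg[14:]                            # old style: hard-coded 14-byte skip
value = skip_prefix(arg, "--pack_header=")  # new style: offset follows the prefix

print(magic, value)
```

The second form cannot drift out of sync with the option name, and it also reports a non-matching argument instead of silently slicing it.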
Patrick Steinhardt	a6aca325db	object-file: retry linking file into place when occluding file vanishes Prior to `0ad3d65652` (object-file: fix race in object collision check, 2024-12-30), callers could expect that a successful return from `finalize_object_file()` means that either the file was moved into place, or the identical bytes were already present. If neither of those happens, we'd return an error. Since that commit, if the destination file disappears between our link(3p) call and the collision check, we'd return success without actually checking the contents, and without retrying the link. This solves the common case that the files were indeed the same, but it means that we may corrupt the repository if they weren't (this implies a hash collision, but the whole point of this function is protecting against hash collisions). We can't be pessimistic and assume they're different; that hurts the common case that the mentioned commit was trying to fix. But after seeing that the destination file went away, we can retry linking again. Adapt the code to do so when we see that the destination file has racily vanished. This should generally succeed as we have just observed that the destination file does not exist anymore, except in the very unlikely event that it gets recreated by another concurrent process again. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:37 +01:00
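The retry logic described here can be sketched roughly as follows. This is a minimal model under stated assumptions, not the code in `object-file.c`: the function body, the return strings, and `max_attempts` are all hypothetical, and the demo only exercises the non-racy paths.

```python
import os
import tempfile


def finalize_object_file(src, dst, max_attempts=3):
    """Link `src` into place at `dst`, or verify `dst` already has the same bytes."""
    for _ in range(max_attempts):
        try:
            os.link(src, dst)       # a missing source propagates as an error
            os.unlink(src)
            return "linked"
        except FileExistsError:
            pass                    # something occupies dst: compare contents
        try:
            with open(dst, "rb") as f:
                existing = f.read()
        except FileNotFoundError:
            continue                # dst vanished after link() failed: retry
        with open(src, "rb") as f:  # no special-casing of a missing source
            incoming = f.read()
        if existing != incoming:
            raise ValueError(f"collision: {dst} has different contents")
        os.unlink(src)
        return "identical"
    raise RuntimeError(f"could not finalize {dst}")


with tempfile.TemporaryDirectory() as d:
    dst = os.path.join(d, "object")
    results = []
    for _ in range(2):
        tmp = os.path.join(d, "tmp_obj")
        with open(tmp, "wb") as f:
            f.write(b"same bytes")
        results.append(finalize_object_file(tmp, dst))
print(results)
```

Success is only reported after either a successful link or an actual byte comparison, which is the guarantee the message says callers relied on.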
Jeff King	16be54a929	index-pack, unpack-objects: use get_be32() for reading pack header Both of these commands read the incoming pack into a static unsigned char buffer in BSS, and then parse it by casting the start of the buffer to a struct pack_header. This can result in SIGBUS on some platforms if the compiler doesn't place the buffer in a position that is properly aligned for 4-byte integers. This reportedly happens with unpack-objects (but not index-pack) on sparc64 when compiled with clang (but not gcc). But we are definitely in the wrong in both spots; since the buffer's type is unsigned char, we can't depend on larger alignment. When it works it is only because we are lucky. We'll fix this by switching to get_be32() to read the headers (just like the last few commits similarly switched us to put_be32() for writing into the same buffer). It would be nice to factor this out into a common helper function, but the interface ends up quite awkward. Either the caller needs to hardcode how many bytes we'll need, or it needs to pass us its fill()/use() functions as pointers. So I've just fixed both spots in the same way; this is not code that is likely to be repeated a third time (most of the pack reading code uses an mmap'd buffer, which should be properly aligned). I did make one tweak to the shared code: our pack_version_ok() macro expects us to pass the big-endian value we'd get by casting. We can introduce a "native" variant which uses the host integer ordering. Reported-by: Koakuma <koachan@protonmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:37 +01:00
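Byte-wise header reading, as opposed to casting the buffer to a struct, can be sketched like this. Python has no alignment traps, so the example only illustrates how a big-endian 32-bit value is assembled from individual bytes at an arbitrary offset. The function names mirror Git's `get_be32()` but are re-implemented here, and the 12-byte layout (signature, version, entry count) is assumed from Git's pack format documentation.

```python
def get_be32(buf, offset=0):
    """Assemble a big-endian 32-bit value one byte at a time."""
    b = buf[offset:offset + 4]
    if len(b) != 4:
        raise ValueError("need 4 bytes")
    return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]


def parse_pack_header(buf, offset=0):
    """Return (signature, version, entries) from a 12-byte pack header."""
    signature = bytes(buf[offset:offset + 4])
    version = get_be32(buf, offset + 4)
    entries = get_be32(buf, offset + 8)
    return signature, version, entries


# One padding byte in front puts the header at an odd ("unaligned") offset.
raw = b"\x00" + b"PACK" + b"\x00\x00\x00\x02" + b"\x00\x00\x01\x00"
header = parse_pack_header(raw, offset=1)
print(header)
```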
Patrick Steinhardt	cec3ebbdd8	object-file: don't special-case missing source file in collision check In `0ad3d65652` (object-file: fix race in object collision check, 2024-12-30) we have started to ignore ENOENT when opening either the source or destination file of the collision check. This was done to handle races more gracefully in case either of the potentially-colliding disappears. The fix is overly broad though: while the destination file may indeed vanish racily, this shouldn't ever happen for the source file, which is a temporary object file (either loose or in packfile format) that we have just created. So if any concurrent process would have removed that temporary file it would indicate an actual issue. Stop treating ENOENT specially for the source file so that we always bubble up this error. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:37 +01:00
Jeff King	5285cdc7f9	parse_pack_header_option(): avoid unaligned memory writes In order to recreate a pack header in our in-memory buffer, we cast the buffer to a "struct pack_header" and assign the individual fields. This is reported to cause SIGBUS on sparc64 due to alignment issues. We can work around this by using put_be32() which will write individual bytes into the buffer. Reported-by: Koakuma <koachan@protonmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:37 +01:00
Patrick Steinhardt	5e1d9aa05e	object-file: rename variables in `check_collision()` Rename variables used in `check_collision()` to clearly identify which file is the source and which is the destination. This will make the next step easier to reason about when we start to treat those files different from one another. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:37 +01:00
Jeff King	2dbdf2c476	packfile: factor out --pack_header argument parsing Both index-pack and unpack-objects accept a --pack_header argument. This is an undocumented internal argument used by receive-pack and fetch to pass along information about the header of the pack, which they've already read from the incoming stream. In preparation for a bugfix, let's factor the duplicated code into a common helper. The callers are still responsible for identifying the option. While this could likewise be factored out, it is more flexible this way (e.g., if they ever started using parse-options and wanted to handle both the stuck and unstuck forms). Likewise, the callers are responsible for reporting errors, though they both just call die(). I've tweaked unpack-objects to match index-pack in marking the error for translation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:37 +01:00
Patrick Steinhardt	6abb17c867	object-file: fix race in object collision check One of the tests in t5616 asserts that git-fetch(1) with `--refetch` triggers repository maintenance with the correct set of arguments. This test is flaky and causes us to fail sometimes: ++ git -c protocol.version=0 -c gc.autoPackLimit=0 -c maintenance.incremental-repack.auto=1234 -C pc1 fetch --refetch origin error: unable to open .git/objects/pack/pack-029d08823bd8a8eab510ad6ac75c823cfd3ed31e.pack: No such file or directory fatal: unable to rename temporary file to '.git/objects/pack/pack-029d08823bd8a8eab510ad6ac75c823cfd3ed31e.pack' fatal: could not finish pack-objects to repack local links fatal: index-pack failed error: last command exited with $?=128 The error message is quite confusing as it talks about trying to rename a temporary packfile. A first hunch would thus be that this packfile gets written by git-fetch(1), but removed by git-maintenance(1) while it hasn't yet been finalized, which shouldn't ever happen. And indeed, when looking closer one notices that the file that is supposedly of temporary nature does not have the typical `tmp_pack_` prefix. As it turns out, the "unable to rename temporary file" fatal error is a red herring and the real error is "unable to open". That error is raised by `check_collision()`, which is called by `finalize_object_file()` when moving the new packfile into place. Because t5616 re-fetches objects, we end up with the exact same pack as we already have in the repository. So when the concurrent git-maintenance(1) process rewrites the preexisting pack and unlinks it exactly at the point in time where git-fetch(1) wants to check the old and new packfiles for equality we will see ENOENT and thus `check_collision()` returns an error, which gets bubbled up by `finalize_object_file()` and is then handled by `rename_tmp_packfile()`. That function does not know about the exact root cause of the error and instead just claims that the rename has failed. 
This race is thus caused by `b1b8dfde69` (finalize_object_file(): implement collision check, 2024-09-26), where we have newly introduced the collision check. By definition, two files cannot collide with each other when one of them has been removed. We can thus trivially fix the issue by ignoring ENOENT when opening either of the files we're about to check for collision. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:37 +01:00
Junio C Hamano	ab2a205d1c	bswap.h: squelch potential sparse -Wcast-truncate warnings In put_be32(), we right-shift a uint32_t value various amounts and then assign the low 8-bits to individual "unsigned char" bytes, throwing away the high bits. For shifts smaller than 24 bits, those thrown away bits will be arbitrary bits from the original uint32_t. This works exactly as we want, but if you feed a constant, then sparse complains. For example if we write this (which we plan to do in a future patch): put_be32(hdr, PACK_SIGNATURE); then "make sparse" produces: compat/bswap.h:175:22: error: cast truncates bits from constant value (5041 becomes 41) compat/bswap.h:176:22: error: cast truncates bits from constant value (504143 becomes 43) compat/bswap.h:177:22: error: cast truncates bits from constant value (5041434b becomes 4b) And the same issue exists in the other put_be*() functions, when used with a constant. We can silence this warning by explicitly masking off the truncated bits. The compiler is smart enough to know the result is the same, and the asm generated by gcc (with both -O0 and -O2) is identical. Curiously this line already exists: put_be32(&hdr_version, INDEX_EXTENSION_VERSION2); in the fsmonitor.c file, but it does not get flagged because the CPP macro expands to a small integer (2). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:37 +01:00
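The masking described in the commit above can be sketched as follows. `sketch_put_be32()` is a hypothetical stand-in and not the actual contents of compat/bswap.h. Each shifted value is explicitly masked to its low 8 bits before being stored, so the narrowing is visible to static checkers while the bytes written stay the same.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative put_be32()-style helper: write `value` into `ptr`
 * in big-endian order, one byte at a time. The "& 0xff" makes the
 * truncation to 8 bits explicit instead of relying on the implicit
 * conversion to unsigned char. */
static void sketch_put_be32(void *ptr, uint32_t value)
{
	unsigned char *p = ptr;

	assert(p);
	p[0] = (value >> 24) & 0xff;
	p[1] = (value >> 16) & 0xff;
	p[2] = (value >>  8) & 0xff;
	p[3] = (value >>  0) & 0xff;
}
```

Writing the constant 0x5041434b with this helper stores the four bytes 'P', 'A', 'C', 'K' in that order.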
Johannes Schindelin	5cc2d6c16f	Merge branch 'disallow-control-characters-in-sideband-channel' This addresses: - CVE-2024-52005: Insufficient neutralization of ANSI escape sequences in sideband payload can be used to mislead Git users into believing that certain remote-generated messages actually originate from Git. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>	2025-02-06 19:32:36 +01:00
Jeff King	ec65acf4c5	test-lib: add a few comments to LSan log checking Commit `b119a687d4` (test-lib: ignore leaks in the sanitizer's thread code, 2025-01-01) added code to suppress a false positive in the leak checker. But if you're just reading the code, the obscure grep call is a bit of a head-scratcher. Let's add a brief comment explaining what's going on (and anybody digging further can find this commit or that one for all the details). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:36 +01:00
Johannes Schindelin	63cab6aa67	unix-socket: avoid leak when initialization fails When a Unix socket is initialized, the current directory's path is stored so that the cleanup code can `chdir()` back to where it was before exit. If the path that needs to be stored exceeds the default size of the `sun_path` attribute of `struct sockaddr_un` (which is defined as a 108-sized byte array on Linux), a larger buffer needs to be allocated so that it can hold the path, and it is the responsibility of the `unix_sockaddr_cleanup()` function to release that allocated memory. In Git's CI, this stack allocation is not necessary because the code is checked out to `/home/runner/work/git/git`. Concatenate the path `t/trash directory.t0301-credential-cache/.cache/git/credential/socket` and a terminating NUL, and you end up with 96 bytes, 12 shy of the default `sun_path` size. However, I use worktrees with slightly longer paths: `/home/me/projects/git/yes/i/nest/worktrees/to/organize/them/` is more in line with what I have. When I recently tried to locally reproduce a failure of the `linux-leaks` CI job, this t0301 test failed (where it had not failed in CI). The reason: When `credential-cache` tries to reach its daemon initially by calling `unix_sockaddr_init()`, it is expected that the daemon cannot be reached (the idea is to spin up the daemon in that case and try again). However, when this first call to `unix_sockaddr_init()` fails, the code returns early from the `unix_stream_connect()` function _without_ giving the cleanup code a chance to run, skipping the deallocation of above-mentioned path. The fix is easy: do not return early but instead go directly to the cleanup code. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:36 +01:00
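The shape of the fix described above, jumping to the cleanup code instead of returning early, can be sketched like this. All names here (`sketch_ctx`, `sketch_init()`, `sketch_cleanup()`, `sketch_connect()`, `sketch_live_allocs`) are hypothetical and only model the pattern; they are not Git's `unix_sockaddr_init()` or `unix_stream_connect()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical context holding a heap-allocated copy of a path. */
struct sketch_ctx {
	char *saved_path;
};

/* Counts outstanding allocations so that a leak is observable. */
static int sketch_live_allocs;

/* Allocates the saved path, then optionally reports failure. The
 * allocation survives a failure, so the caller must still clean up. */
static int sketch_init(struct sketch_ctx *ctx, const char *path, int fail)
{
	ctx->saved_path = malloc(strlen(path) + 1);
	if (!ctx->saved_path)
		return -1;
	strcpy(ctx->saved_path, path);
	sketch_live_allocs++;
	return fail ? -1 : 0;
}

static void sketch_cleanup(struct sketch_ctx *ctx)
{
	if (ctx->saved_path) {
		free(ctx->saved_path);
		ctx->saved_path = NULL;
		sketch_live_allocs--;
	}
}

static int sketch_connect(const char *path, int fail_init)
{
	struct sketch_ctx ctx = { NULL };
	int ret = -1;

	assert(path);
	if (sketch_init(&ctx, path, fail_init) < 0)
		goto cleanup; /* a plain "return -1" here would leak */
	ret = 0;
cleanup:
	sketch_cleanup(&ctx);
	return ret;
}
```

With a single exit path, the failure case and the success case release the same resources, which is what makes the leak checker happy in both.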
Johannes Schindelin	65896a3814	sideband: do allow ANSI color sequences by default The preceding two commits introduced special handling of the sideband channel to neutralize ANSI escape sequences before sending the payload to the terminal, and `sideband.allowControlCharacters` to override that behavior. However, some `pre-receive` hooks that are actively used in practice want to color their messages and therefore rely on the fact that Git passes them through to the terminal. In contrast to other ANSI escape sequences, it is highly unlikely that coloring sequences can be essential tools in attack vectors that mislead Git users e.g. by hiding crucial information. Therefore we can have both: Continue to allow ANSI coloring sequences to be passed to the terminal, and neutralize all other ANSI escape sequences. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:36 +01:00
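As a rough illustration of the policy described above, the following sketch passes ANSI color sequences through unchanged and neutralizes other control characters. The function names are hypothetical, the definition of a color sequence (ESC, `[`, digits and semicolons, `m`) and the caret-style replacement are assumptions made for this example, and none of it is taken from Git's sideband code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the length of a color sequence of the assumed form
 * ESC '[' [0-9;]* 'm' starting at s, or 0 if there is none. */
static size_t sketch_sgr_len(const char *s, size_t n)
{
	size_t i;

	if (n < 3 || s[0] != '\033' || s[1] != '[')
		return 0;
	for (i = 2; i < n; i++) {
		if (s[i] == 'm')
			return i + 1;
		if (!(s[i] >= '0' && s[i] <= '9') && s[i] != ';')
			return 0;
	}
	return 0;
}

/* Copies src into out (capacity cap, including the NUL), keeping
 * color sequences and replacing other control characters except
 * tab, newline, and carriage return with '^' plus a printable
 * letter. Returns the output length, or (size_t)-1 if out is too
 * small. */
static size_t sketch_sanitize(const char *src, size_t n,
			      char *out, size_t cap)
{
	size_t i = 0, o = 0;

	assert(src && out);
	if (!cap)
		return (size_t)-1;
	while (i < n) {
		size_t len = sketch_sgr_len(src + i, n - i);
		unsigned char c = (unsigned char)src[i];

		if (len) {
			if (o + len >= cap)
				return (size_t)-1;
			memcpy(out + o, src + i, len);
			o += len;
			i += len;
		} else if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
			if (o + 2 >= cap)
				return (size_t)-1;
			out[o++] = '^';
			out[o++] = (char)(c + 0x40);
			i++;
		} else {
			if (o + 1 >= cap)
				return (size_t)-1;
			out[o++] = src[i++];
		}
	}
	out[o] = '\0';
	return o;
}
```

Under these assumptions, a message colored with `ESC[31m` reaches the terminal unchanged, while a screen-clearing `ESC[2J` is rendered as the visible text `^[[2J`.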
Jeff King	75dde9545a	test-lib: simplify lsan results check We want to know if there are any leaks logged by LSan in the results directory, so we run "find" on the containing directory and pipe it to xargs. We can accomplish the same thing by just globbing in the shell and passing the result to grep, which has a few advantages: - it's one fewer process to run - we can glob on the TEST_RESULTS_SAN_FILE pattern, which is what we checked at the beginning of the function, and is the same glob used to show the logs in check_test_results_san_file_ - this correctly handles the case where TEST_OUTPUT_DIRECTORY has a space in it. For example doing: mkdir "/tmp/foo bar" TEST_OUTPUT_DIRECTORY="/tmp/foo bar" make SANITIZE=leak test would yield a lot of: grep: /tmp/foo: No such file or directory grep: bar/test-results/t0006-date.leak/trace.test-tool.582311: No such file or directory when there are leaks. We could do the same thing with "xargs --null", but that isn't portable. We are now subject to command-line length limits, but that is also true of the globbing cat used to show the logs themselves. This hasn't been a problem in practice. We do need to use "grep -s" for the case that the glob does not expand (i.e., there are not any log files at all). This option is in POSIX, and has been used in t7407 for several years without anybody complaining. This also naturally handles the case where the surrounding directory has already been removed (in which case there are likewise no files!), dropping the need to comment about it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:36 +01:00
Johannes Schindelin	976299bdeb	sideband: introduce an "escape hatch" to allow control characters The preceding commit fixed the vulnerability whereby sideband messages (that are under the control of the remote server) could contain ANSI escape sequences that would be sent to the terminal verbatim. However, this fix may not be desirable under all circumstances, e.g. when remote servers deliberately add coloring to their messages to increase their urgency. To help with those use cases, give users a way to opt out of the protections: `sideband.allowControlCharacters`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-02-06 19:32:36 +01:00
Jeff King	495ab709fd	test-lib: invert return value of check_test_results_san_file_empty We have a function to check whether LSan logged any leaks. It returns success for no leaks, and non-zero otherwise. This is the simplest thing for its callers, who want to say "if no leaks then return early". But because it's implemented as a shell pipeline, you end up with the awkward: ! find ... | xargs grep leaks | grep -v false-positives where the "!" is actually negating the final grep. Switch the return value (and name) to return success when there are leaks. This should make the code a little easier to read, and the negation in the callers still reads pretty naturally. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-02-06 19:32:36 +01:00


163443 Commits