git-for-windows/git - git - Gitea: Self-hosted GitHub

mirror of https://github.com/git-for-windows/git.git synced 2026-02-04 03:33:01 -06:00

Author	SHA1	Message	Date
Johannes Schindelin	5d06cc119c	Merge branch 'phase-out-reset-stdin' This topic branch re-adds the deprecated --stdin/-z options to `git reset`. Those patches were overridden by a different set of options in the upstream Git project before we could propose `--stdin`. We offered this in MinGit to applications that wanted a safer way to pass lots of pathspecs to Git, and these applications will need to be adjusted. Instead of `--stdin`, `--pathspec-from-file=-` should be used, and instead of `-z`, `--pathspec-file-nul`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-06-16 20:53:50 +02:00
Johannes Schindelin	60d090ce9b	reset: reinstate support for the deprecated --stdin option The `--stdin` option was a well-established paradigm in other commands, therefore we implemented it in `git reset` for use by Visual Studio. Unfortunately, upstream Git decided that it is time to introduce `--pathspec-from-file` instead. To keep backwards-compatibility for some grace period, we therefore reinstate the `--stdin` option on top of the `--pathspec-from-file` option, but mark it firmly as deprecated. Helped-by: Victoria Dye <vdye@github.com> Helped-by: Matthew John Cheetham <mjcheetham@outlook.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-06-16 20:52:02 +02:00
Johannes Schindelin	2002ef63ce	Introduce helper to create symlinks that knows about index_state On Windows, symbolic links actually have a type depending on the target: it can be a file or a directory. In certain circumstances, this poses problems, e.g. when a symbolic link is supposed to point into a submodule that is not checked out, so there is no way for Git to auto-detect the type. To help with that, we will add support over the course of the next commits to specify that symlink type via the Git attributes. This requires an index_state, though, something that Git for Windows' `symlink()` replacement cannot know about because the function signature is defined by the POSIX standard and not ours to change. So let's introduce a helper function to create symbolic links that does know about the index_state. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-06-16 20:51:59 +02:00
Ben Boeckel	7a84517d1e	clean: suggest using `core.longPaths` if paths are too long to remove On Windows, git repositories may have extra files which need cleaned (e.g., a build directory) that may be arbitrarily deep. Suggest using `core.longPaths` if such situations are encountered. Fixes: #2715 Signed-off-by: Ben Boeckel <mathstuf@gmail.com>	2025-06-16 20:51:56 +02:00
Johannes Schindelin	28a43e4501	clean: make use of FSCache The `git clean` command needs to enumerate plenty of files and directories, and can therefore benefit from the FSCache. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-06-16 20:51:55 +02:00
Ben Peart	9ddcc39855	fscache: fscache takes an initial size Update enable_fscache() to take an optional initial size parameter which is used to initialize the hashmap so that it can avoid having to rehash as additional entries are added. Add a separate disable_fscache() macro to make the code clearer and easier to read. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-06-16 20:51:54 +02:00
Ben Peart	77a9a3cf58	status: disable and free fscache at the end of the status command At the end of the status command, disable and free the fscache so that we don't leak the memory and so that we can dump the fscache statistics. Signed-off-by: Ben Peart <benpeart@microsoft.com>	2025-06-16 20:51:54 +02:00
Takuto Ikuta	9d9a0f6ea1	checkout.c: enable fscache for checkout again This is retry of #1419. I added flush_fscache macro to flush cached stats after disk writing with tests for regression reported in #1438 and #1442. git checkout checks each file path in sorted order, so cache flushing does not make performance worse unless we have large number of modified files in a directory containing many files. Using chromium repository, I tested `git checkout .` performance when I delete 10 files in different directories. With this patch: TotalSeconds: 4.307272 TotalSeconds: 4.4863595 TotalSeconds: 4.2975562 Avg: 4.36372923333333 Without this patch: TotalSeconds: 20.9705431 TotalSeconds: 22.4867685 TotalSeconds: 18.8968292 Avg: 20.7847136 I confirmed this patch passed all tests in t/ with core_fscache=1. Signed-off-by: Takuto Ikuta <tikuta@chromium.org>	2025-06-16 20:51:54 +02:00
Jeff Hostetler	de161d8863	add: use preload-index and fscache for performance Teach "add" to use preload-index and fscache features to improve performance on very large repositories. During an "add", a call is made to run_diff_files() which calls check_remove() for each index-entry. This calls lstat(). On Windows, the fscache code intercepts the lstat() calls and builds a private cache using the FindFirst/FindNext routines, which are much faster. Somewhat independent of this, is the preload-index code which distributes some of the start-up costs across multiple threads. We need to keep the call to read_cache() before parsing the pathspecs (and hence cannot use the pathspecs to limit any preload) because parse_pathspec() is using the index to determine whether a pathspec is, in fact, in a submodule. If we would not read the index first, parse_pathspec() would not error out on a path that is inside a submodule, and t7400-submodule-basic.sh would fail with not ok 47 - do not add files from a submodule We still want the nice preload performance boost, though, so we simply call read_cache_preload(&pathspecs) after parsing the pathspecs. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-06-16 20:51:53 +02:00
Karsten Blees	94306ce531	mingw: add infrastructure for read-only file system level caches Add a macro to mark code sections that only read from the file system, along with a config option and documentation. This facilitates implementation of relatively simple file system level caches without the need to synchronize with the file system. Enable read-only sections for 'git status' and preload_index. Signed-off-by: Karsten Blees <blees@dcon.de>	2025-06-16 20:51:53 +02:00
Johannes Schindelin	dfcfeb4356	credential-cache: handle ECONNREFUSED gracefully (#5329 ) I should probably add some tests for this.	2025-06-16 20:51:49 +02:00
Johannes Schindelin	baa3a9a750	Add experimental 'git survey' builtin (#5174 ) This introduces `git survey` to Git for Windows ahead of upstream for the express purpose of getting the path-based analysis in the hands of more folks. The inspiration of this builtin is [`git-sizer`](https://github.com/github/git-sizer), but since that command relies on `git cat-file --batch` to get the contents of objects, it has limits to how much information it can provide. This is mostly a rewrite of the `git survey` builtin that was introduced into the `microsoft/git` fork in microsoft/git#667. That version had a lot more bells and whistles, including an analysis much closer to what `git-sizer` provides. The biggest difference in this version is that this one is focused on using the path-walk API in order to visit batches of objects based on a common path. This allows identifying, for instance, the path that is contributing the most to the on-disk size across all versions at that path. For example, here are the top ten paths contributing to my local Git repository (which includes `microsoft/git` and `gitster/git`): ``` TOP FILES BY DISK SIZE ============================================================================ Path \| Count \| Disk Size \| Inflated Size -----------------------------------------+-------+-----------+-------------- whats-cooking.txt \| 1373 \| 11637459 \| 37226854 t/helper/test-gvfs-protocol \| 2 \| 6847105 \| 17233072 git-rebase--helper \| 1 \| 6027849 \| 15269664 compat/mingw.c \| 6111 \| 5194453 \| 463466970 t/helper/test-parse-options \| 1 \| 3420385 \| 8807968 t/helper/test-pkt-line \| 1 \| 3408661 \| 8778960 t/helper/test-dump-untracked-cache \| 1 \| 3408645 \| 8780816 t/helper/test-dump-fsmonitor \| 1 \| 3406639 \| 8776656 po/vi.po \| 104 \| 1376337 \| 51441603 po/de.po \| 210 \| 1360112 \| 71198603 ``` This kind of analysis has been helpful in identifying the reasons for growth in a few internal monorepos. Those findings motivated the changes in #5157 and #5171. With this early version in Git for Windows, we can expand the reach of the experimental tool in advance of it being contributed to the upstream project. Unfortunately, this will mean that in the next `microsoft/git` rebase, Jeff Hostetler's version will need to be pulled out since there are enough conflicts. These conflicts include how tables are stored and generated, as the version in this PR is slightly more general to allow for different kinds of data. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-06-16 20:51:49 +02:00
Derrick Stolee	7936332d28	Add path walk API and its use in 'git pack-objects' (#5171 ) This is a follow up to #5157 as well as motivated by the RFC in gitgitgadget/git#1786. We have ways of walking all objects, but it is focused on visiting a single commit and then expanding the new trees and blobs reachable from that commit that have not been visited yet. This means that objects arrive without any locality based on their path. Add a new "path walk API" that focuses on walking objects in batches according to their type and path. This will walk all annotated tags, all commits, all root trees, and then start a depth-first search among all paths in the repo to collect trees and blobs in batches. The most important application for this is being fast-tracked to Git for Windows: `git pack-objects --path-walk`. This application of the path walk API discovers the objects to pack via this batched walk, and automatically groups objects that appear at a common path so they can be checked for delta comparisons. This use completely avoids any name-hash collisions (even the collisions that sometimes occur with the new `--full-name-hash` option) and can be much faster to compute since the first pass of delta calculations does not waste time on objects that are unlikely to be diffable. Some statistics are available in the commit messages.	2025-06-16 20:51:48 +02:00
Matthias Aßhauer	f727a5cdd2	credential-cache: handle ECONNREFUSED gracefully In 245670c (credential-cache: check for windows specific errors, 2021-09-14) we concluded that on Windows we would always encounter ENETDOWN where we would expect ECONNREFUSED on POSIX systems, when connecting to unix sockets. As reported in [1], we do encounter ECONNREFUSED on Windows if the socket file doesn't exist, but the containing directory does and ENETDOWN if neither exists. We should handle this case like we do on non-windows systems. [1] https://github.com/git-for-windows/git/pull/4762#issuecomment-2545498245 This fixes https://github.com/git-for-windows/git/issues/5314 Helped-by: M Hickford <mirth.hickford@gmail.com> Signed-off-by: Matthias Aßhauer <mha1993@live.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-06-16 20:51:38 +02:00
Johannes Schindelin	d07bab6c5b	survey: clearly note the experimental nature in the output While this command is definitely something we _want_, chances are that upstreaming this will require substantial changes. We still want to be able to experiment with this before that, to focus on what we need out of this command: To assist with diagnosing issues with large repositories, as well as to help monitoring the growth and the associated painpoints of such repositories. To that end, we are about to integrate this command into `microsoft/git`, to get the tool into the hands of users who need it most, with the idea to iterate in close collaboration between these users and the developers familar with Git's internals. However, we will definitely want to avoid letting anybody have the impression that this command, its exact inner workings, as well as its output format, are anywhere close to stable. To make that fact utterly clear (and thereby protect the freedom to iterate and innovate freely before upstreaming the command), let's mark its output as experimental in all-caps, as the first thing we do. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-06-16 20:51:38 +02:00
Derrick Stolee	1ffeabcbb9	survey: add --top=<N> option and config The 'git survey' builtin provides several detail tables, such as "top files by on-disk size". The size of these tables defaults to 10, currently. Allow the user to specify this number via a new --top=<N> option or the new survey.top config key. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-06-16 20:51:38 +02:00
Derrick Stolee	1667974c81	survey: add report of "largest" paths Since we are already walking our reachable objects using the path-walk API, let's now collect lists of the paths that contribute most to different metrics. Specifically, we care about * Number of versions. * Total size on disk. * Total inflated size (no delta or zlib compression). This information can be critical to discovering which parts of the repository are causing the most growth, especially on-disk size. Different packing strategies might help compress data more efficiently, but the toal inflated size is a representation of the raw size of all snapshots of those paths. Even when stored efficiently on disk, that size represents how much information must be processed to complete a command such as 'git blame'. The exact disk size seems to be not quite robust enough for testing, as could be seen by the `linux-musl-meson` job consistently failing, possibly because of zlib-ng deflates differently: t8100.4(git survey (default)) was failing with a symptom like this: TOTAL OBJECT SIZES BY TYPE =============================================== Object Type \| Count \| Disk Size \| Inflated Size ------------+-------+-----------+-------------- - Commits \| 10 \| 1523 \| 2153 + Commits \| 10 \| 1528 \| 2153 Trees \| 10 \| 495 \| 1706 Blobs \| 10 \| 191 \| 101 - Tags \| 4 \| 510 \| 528 + Tags \| 4 \| 547 \| 528 This means: the disk size is unlikely something we can verify robustly. Since zlib-ng seems to increase the disk size of the tags from 528 to 547, we cannot even assume that the disk size is always smaller than the inflated size. We will most likely want to either skip verifying the disk size altogether, or go for some kind of fuzzy matching, say, by replacing `s/ 1[45][0-9][0-9] / ~1.5k /` and `s/ [45][0-9][0-9] / ~½k /` or something like that. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-06-16 20:51:38 +02:00
Derrick Stolee	efc2210f48	survey: add ability to track prioritized lists In future changes, we will make use of these methods. The intention is to keep track of the top contributors according to some metric. We don't want to store all of the entries and do a sort at the end, so track a constant-size table and remove rows that get pushed out depending on the chosen sorting algorithm. Co-authored-by: Jeff Hostetler <git@jeffhostetler.com> Signed-off-by; Jeff Hostetler <git@jeffhostetler.com> Signed-off-by: Derrick Stolee <stolee@gmail.com>	2025-06-16 20:51:38 +02:00
Derrick Stolee	acf05fc249	pack-objects: thread the path-based compression Adapting the implementation of ll_find_deltas(), create a threaded version of the --path-walk compression step in 'git pack-objects'. This involves adding a 'regions' member to the thread_params struct, allowing each thread to own a section of paths. We can simplify the way jobs are split because there is no value in extending the batch based on name-hash the way sections of the object entry array are attempted to be grouped. We re-use the 'list_size' and 'remaining' items for the purpose of borrowing work in progress from other "victim" threads when a thread has finished its batch of work more quickly. Using the Git repository as a test repo, the p5313 performance test shows that the resulting size of the repo is the same, but the threaded implementation gives gains of varying degrees depending on the number of objects being packed. (This was tested on a 16-core machine.) Test HEAD~1 HEAD ------------------------------------------------------------- 5313.6: thin pack with --path-walk 0.01 0.01 +0.0% 5313.7: thin pack size with --path-walk 475 475 +0.0% 5313.12: big pack with --path-walk 1.99 1.87 -6.0% 5313.13: big pack size with --path-walk 14.4M 14.3M -0.4% 5313.18: repack with --path-walk 98.14 41.46 -57.8% 5313.19: repack size with --path-walk 197.2M 197.3M +0.0% Signed-off-by: Derrick Stolee <stolee@gmail.com>	2025-06-16 20:51:37 +02:00
Derrick Stolee	a2cb463a8c	survey: show progress during object walk Signed-off-by: Derrick Stolee <stolee@gmail.com>	2025-06-16 20:51:37 +02:00
Derrick Stolee	2c4dc79abe	pack-objects: refactor path-walk delta phase Previously, the --path-walk option to 'git pack-objects' would compute deltas inline with the path-walk logic. This would make the progress indicator look like it is taking a long time to enumerate objects, and then very quickly computed deltas. Instead of computing deltas on each region of objects organized by tree, store a list of regions corresponding to these groups. These can later be pulled from the list for delta compression before doing the "global" delta search. This presents a new progress indicator that can be used in tests to verify that this stage is happening. The current implementation is not integrated with threads, but could be done in a future update. Since we do not attempt to sort objects by size until after exploring all trees, we can remove the previous change to t5530 due to a different error message appearing first. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2025-06-16 20:51:37 +02:00
Derrick Stolee	53ec7ac7c8	survey: summarize total sizes by object type Now that we have explored objects by count, we can expand that a bit more to summarize the data for the on-disk and inflated size of those objects. This information is helpful for diagnosing both why disk space (and perhaps clone or fetch times) is growing but also why certain operations are slow because the inflated size of the abstract objects that must be processed is so large. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2025-06-16 20:51:37 +02:00
Derrick Stolee	7e2adefac9	survey: add object count summary At the moment, nothing is obvious about the reason for the use of the path-walk API, but this will become more prevelant in future iterations. For now, use the path-walk API to sum up the counts of each kind of object. For example, this is the reachable object summary output for my local repo: REACHABLE OBJECT SUMMARY ======================== Object Type \| Count ------------+------- Tags \| 1343 Commits \| 179344 Trees \| 314350 Blobs \| 184030 Signed-off-by: Derrick Stolee <stolee@gmail.com>	2025-06-16 20:51:37 +02:00
Derrick Stolee	423991f7ef	pack-objects: enable --path-walk via config Users may want to enable the --path-walk option for 'git pack-objects' by default, especially underneath commands like 'git push' or 'git repack'. This should be limited to client repositories, since the --path-walk option disables bitmap walks, so would be bad to include in Git servers when serving fetches and clones. There is potential that it may be helpful to consider when repacking the repository, to take advantage of improved deltas across historical versions of the same files. Much like how "pack.useSparse" was introduced and included in "feature.experimental" before being enabled by default, use the repository settings infrastructure to make the new "pack.usePathWalk" config enabled by "feature.experimental" and "feature.manyFiles". Signed-off-by: Derrick Stolee <stolee@gmail.com>	2025-06-16 20:51:37 +02:00
Derrick Stolee	f16938703a	survey: start pretty printing data in table form When 'git survey' provides information to the user, this will be presented in one of two formats: plaintext and JSON. The JSON implementation will be delayed until the functionality is complete for the plaintext format. The most important parts of the plaintext format are headers specifying the different sections of the report and tables providing concreted data. Create a custom table data structure that allows specifying a list of strings for the row values. When printing the table, check each column for the maximum width so we can create a table of the correct size from the start. The table structure is designed to be flexible to the different kinds of output that will be implemented in future changes. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2025-06-16 20:51:37 +02:00
Derrick Stolee	29fe9781d5	repack: add --path-walk option Since 'git pack-objects' supports a --path-walk option, allow passing it through in 'git repack'. This presents interesting testing opportunities for comparing the different repacking strategies against each other. Add the --path-walk option to the performance tests in p5313. For the microsoft/fluentui repo [1] checked out at a specific commit [2], the results are very interesting: Test this tree ------------------------------------------------------------------ 5313.2: thin pack 0.40(0.47+0.04) 5313.3: thin pack size 1.2M 5313.4: thin pack with --full-name-hash 0.09(0.10+0.04) 5313.5: thin pack size with --full-name-hash 22.8K 5313.6: thin pack with --path-walk 0.08(0.06+0.02) 5313.7: thin pack size with --path-walk 20.8K 5313.8: big pack 2.16(8.43+0.23) 5313.9: big pack size 17.7M 5313.10: big pack with --full-name-hash 1.42(3.06+0.21) 5313.11: big pack size with --full-name-hash 18.0M 5313.12: big pack with --path-walk 2.21(8.39+0.24) 5313.13: big pack size with --path-walk 17.8M 5313.14: repack 98.05(662.37+2.64) 5313.15: repack size 449.1K 5313.16: repack with --full-name-hash 33.95(129.44+2.63) 5313.17: repack size with --full-name-hash 182.9K 5313.18: repack with --path-walk 106.21(121.58+0.82) 5313.19: repack size with --path-walk 159.6K [1] https://github.com/microsoft/fluentui [2] e70848ebac1cd720875bccaa3026f4a9ed700e08 This repo suffers from having a lot of paths that collide in the name hash, so examining them in groups by path leads to better deltas. Also, in this case, the single-threaded implementation is competitive with the full repack. This is saving time diffing files that have significant differences from each other. A similar, but private, repo has even more extremes in the thin packs: Test this tree -------------------------------------------------------------- 5313.2: thin pack 2.39(2.91+0.10) 5313.3: thin pack size 4.5M 5313.4: thin pack with --full-name-hash 0.29(0.47+0.12) 5313.5: thin pack size with --full-name-hash 15.5K 5313.6: thin pack with --path-walk 0.35(0.31+0.04) 5313.7: thin pack size with --path-walk 14.2K Notice, however, that while the --full-name-hash version is working quite well in these cases for the thin pack, it does poorly for some other standard cases, such as this test on the Linux kernel repository: Test this tree -------------------------------------------------------------- 5313.2: thin pack 0.01(0.00+0.00) 5313.3: thin pack size 310 5313.4: thin pack with --full-name-hash 0.00(0.00+0.00) 5313.5: thin pack size with --full-name-hash 1.4K 5313.6: thin pack with --path-walk 0.00(0.00+0.00) 5313.7: thin pack size with --path-walk 310 Here, the --full-name-hash option does much worse than the default name hash, but the path-walk option does exactly as well. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2025-06-16 20:51:37 +02:00
Jeff Hostetler	ac3f9ab3fb	survey: add command line opts to select references By default we will scan all references in "refs/heads/", "refs/tags/" and "refs/remotes/". Add command line opts let the use ask for all refs or a subset of them and to include a detached HEAD. Signed-off-by: Jeff Hostetler <git@jeffhostetler.com> Signed-off-by: Derrick Stolee <stolee@gmail.com>	2025-06-16 20:51:37 +02:00
Derrick Stolee	8540080e83	pack-objects: introduce GIT_TEST_PACK_PATH_WALK There are many tests that validate whether 'git pack-objects' works as expected. Instead of duplicating these tests, add a new test environment variable, GIT_TEST_PACK_PATH_WALK, that implies --path-walk by default when specified. This was useful in testing the implementation of the --path-walk implementation, especially in conjunction with test such as: - t0411-clone-from-partial.sh : One test fetches from a repo that does not have the boundary objects. This causes the path-based walk to fail. Disable the variable for this test. - t5306-pack-nobase.sh : Similar to t0411, one test fetches from a repo without a boundary object. - t5310-pack-bitmaps.sh : One test compares the case when packing with bitmaps to the case when packing without them. Since we disable the test variable when writing bitmaps, this causes a difference in the object list (the --path-walk option adds an extra object). Specify --no-path-walk in both processes for the comparison. Another test checks for a specific delta base, but when computing dynamically without using bitmaps, the base object it too small to be considered in the delta calculations so no base is used. - t5316-pack-delta-depth.sh : This script cares about certain delta choices and their chain lengths. The --path-walk option changes how these chains are selected, and thus changes the results of this test. - t5322-pack-objects-sparse.sh : This demonstrates the effectiveness of the --sparse option and how it combines with --path-walk. - t5332-multi-pack-reuse.sh : This test verifies that the preferred pack is used for delta reuse when possible. The --path-walk option is not currently aware of the preferred pack at all, so finds a different delta base. - t7406-submodule-update.sh : When using the variable, the --depth option collides with the --path-walk feature, resulting in a warning message. Disable the variable so this warning does not appear. I want to call out one specific test change that is only temporary: - t5530-upload-pack-error.sh : One test cares specifically about an "unable to read" error message. Since the current implementation performs delta calculations within the path-walk API callback, a different "unable to get size" error message appears. When this is changed in a future refactoring, this test change can be reverted. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2025-06-16 20:51:37 +02:00
Jeff Hostetler	d78963f02c	survey: stub in new experimental 'git-survey' command Start work on a new 'git survey' command to scan the repository for monorepo performance and scaling problems. The goal is to measure the various known "dimensions of scale" and serve as a foundation for adding additional measurements as we learn more about Git monorepo scaling problems. The initial goal is to complement the scanning and analysis performed by the GO-based 'git-sizer' (https://github.com/github/git-sizer) tool. It is hoped that by creating a builtin command, we may be able to take advantage of internal Git data structures and code that is not accessible from GO to gain further insight into potential scaling problems. Co-authored-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Jeff Hostetler <git@jeffhostetler.com> Signed-off-by: Derrick Stolee <stolee@gmail.com>	2025-06-16 20:51:37 +02:00
Derrick Stolee	bc7cad71af	pack-objects: add --path-walk option In order to more easily compute delta bases among objects that appear at the exact same path, add a --path-walk option to 'git pack-objects'. This option will use the path-walk API instead of the object walk given by the revision machinery. Since objects will be provided in batches representing a common path, those objects can be tested for delta bases immediately instead of waiting for a sort of the full object list by name-hash. This has multiple benefits, including avoiding collisions by name-hash. The objects marked as UNINTERESTING are included in these batches, so we are guaranteeing some locality to find good delta bases. After the individual passes are done on a per-path basis, the default name-hash is used to find other opportunistic delta bases that did not match exactly by the full path name. RFC TODO: It is important to note that this option is inherently incompatible with using a bitmap index. This walk probably also does not work with other advanced features, such as delta islands. Getting ahead of myself, this option compares well with --full-name-hash when the packfile is large enough, but also performs at least as well as the default in all cases that I've seen. RFC TODO: this should probably be recording the batch locations to another list so they could be processed in a second phase using threads. RFC TODO: list some examples of how this outperforms previous pack-objects strategies. (This is coming in later commits that include performance test changes.) Signed-off-by: Derrick Stolee <stolee@gmail.com>	2025-06-16 20:51:37 +02:00
Derrick Stolee	df0e2e37aa	pack-objects: extract should_attempt_deltas() This will be helpful in a future change. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2025-06-16 20:51:36 +02:00
Johannes Schindelin	0f81fe57a7	clean: remove mount points when possible Windows' equivalent to "bind mounts", NTFS junction points, can be unlinked without affecting the mount target. This is clearly what users expect to happen when they call `git clean -dfx` in a worktree that contains NTFS junction points: the junction should be removed, and the target directory of said junction should be left alone (unless it is inside the worktree). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-06-16 20:51:28 +02:00
Johannes Schindelin	9723231029	clean: do not traverse mount points It seems to be not exactly rare on Windows to install NTFS junction points (the equivalent of "bind mounts" on Linux/Unix) in worktrees, e.g. to map some development tools into a subdirectory. In such a scenario, it is pretty horrible if `git clean -dfx` traverses into the mapped directory and starts to "clean up". Let's just not do that. Let's make sure before we traverse into a directory that it is not a mount point (or junction). This addresses https://github.com/git-for-windows/git/issues/607 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2025-06-16 20:51:28 +02:00
Junio C Hamano	d9a1e51c76	Merge branch 'bs/total-ram-bsd' Update total_ram() functrion on BSD variants. * bs/total-ram-bsd: builtin/gc: correct physical memory detection for OpenBSD / NetBSD	2025-06-03 08:55:24 -07:00
Brad Smith	35c1d592cd	builtin/gc: correct physical memory detection for OpenBSD / NetBSD OpenBSD / NetBSD use HW_PHYSMEM64 to detect the amount of physical memory in a system. HW_PHYSMEM will not provide the correct amount on a system with >=4GB of memory. Signed-off-by: Brad Smith <brad@comstyle.com> Reviewed-by: Collin Funk <collin.funk1@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-06-01 19:01:07 -07:00
Junio C Hamano	0b4c6baa70	fast-export: --signed-commits is experimental As the design of signature handling is still being discussed, it is likely that the data stream produced by the code in Git 2.50 would have to be changed in such a way that is not backward compatible. Mark the feature as experimental and discourge its use for now. Also flip the default on the generation side to "strip"; users of existing versions would not have passed --signed-commits=strip and will be broken by this change if the default is made to abort, and will be encouraged by the error message to produce data stream with future breakage guarantees by passing --signed-commits option. As we tone down the default behaviour, we no longer need the FAST_EXPORT_SIGNED_COMMITS_NOABORT environment variable, which was not discoverable enough. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-28 10:30:47 -07:00
Junio C Hamano	b4847a4477	Merge branch 'jt/receive-pack-skip-connectivity-check' "git receive-pack" optionally learns not to care about connectivity check, which can be useful when the repository arranges to ensure connectivity by some other means. * jt/receive-pack-skip-connectivity-check: builtin/receive-pack: add option to skip connectivity check t5410: test receive-pack connectivity check	2025-05-28 07:59:56 -07:00
Junio C Hamano	f9cdaa2860	Merge branch 'js/misc-fixes' Assorted fixes for issues found with CodeQL. * js/misc-fixes: sequencer: stop pretending that an assignment is a condition bundle-uri: avoid using undefined output of `sscanf()` commit-graph: avoid using stale stack addresses trace2: avoid "futile conditional" Avoid redundant conditions fetch: avoid unnecessary work when there is no current branch has_dir_name(): make code more obvious upload-pack: rename `enum` to reflect the operation commit-graph: avoid malloc'ing a local variable fetch: carefully clear local variable's address after use commit: simplify code	2025-05-27 13:59:11 -07:00
Junio C Hamano	6e5fb398d3	Merge branch 'ds/sparse-apply-add-p' "git apply" and "git add -i/-p" code paths no longer unnecessarily expand sparse-index while working. * ds/sparse-apply-add-p: p2000: add performance test for patch-mode commands reset: integrate sparse index with --patch git add: make -p/-i aware of sparse index apply: integrate with the sparse index	2025-05-27 13:59:09 -07:00
Junio C Hamano	f545f401be	Merge branch 'en/merge-tree-check' "git merge-tree" learned an option to see if it resolves cleanly without actually creating a result. * en/merge-tree-check: merge-tree: add a new --quiet flag merge-ort: add a new mergeability_only option	2025-05-27 13:59:08 -07:00
Junio C Hamano	17d9dbd3c2	Merge branch 'jk/no-funny-object-types' Support to create a loose object file with unknown object type has been dropped. * jk/no-funny-object-types: object-file: drop support for writing objects with unknown types hash-object: handle --literally with OPT_NEGBIT hash-object: merge HASH_* and INDEX_* flags hash-object: stop allowing unknown types t: add lib-loose.sh t/helper: add zlib test-tool oid_object_info(): drop type_name strbuf fsck: stop using object_info->type_name strbuf oid_object_info_convert(): stop using string for object type cat-file: use type enum instead of buffer for -t option object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag cat-file: make --allow-unknown-type a noop object-file.h: fix typo in variable declaration	2025-05-27 13:59:08 -07:00
Junio C Hamano	96d127896d	Merge branch 'en/replay-wo-the-repository' The dependency on the_repository variable has been reduced from the code paths in "git replay". * en/replay-wo-the-repository: replay: replace the_repository with repo parameter passed to cmd_replay ()	2025-05-23 15:34:08 -07:00
Justin Tobler	68cb0b5253	builtin/receive-pack: add option to skip connectivity check During git-receive-pack(1), connectivity of the object graph is validated to ensure that the received packfile does not leave the repository in a broken state. This is done via git-rev-list(1) and walking the objects, which can be expensive for large repositories. Generally, this check is critical to avoid an incomplete received packfile from corrupting a repository. Server operators may have additional knowledge though around exactly how Git is being used on the server-side which can be used to facilitate more efficient connectivity computation of incoming objects. For example, if it can be ensured that all objects in a repository are connected and do not depend on any missing objects, the connectivity of newly written objects can be checked by walking the object graph containing only the new objects from the updated tips and identifying the missing objects which represent the boundary between the new objects and the repository. These boundary objects can be checked in the canonical repository to ensure the new objects connect as expected and thus avoid walking the rest of the object graph. Git itself cannot make the guarantees required for such an optimization as it is possible for a repository to contain an unreachable object that references a missing object without the repository being considered corrupt. Introduce the --skip-connectivity-check option for git-receive-pack(1) which bypasses this connectivity check to give more control to the server-side. Note that without proper server-side validation of newly received objects handled outside of Git, usage of this option risks corrupting a repository. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-20 11:43:36 -07:00
Junio C Hamano	a9dcacbf2a	Merge branch 'jk/oidmap-cleanup' Code cleanup. * jk/oidmap-cleanup: raw_object_store: drop extra pointer to replace_map oidmap: add size function oidmap: rename oidmap_free() to oidmap_clear()	2025-05-19 16:02:47 -07:00
Junio C Hamano	6660b42929	Merge branch 'ly/am-split-stgit-leakfix' Leakfix. * ly/am-split-stgit-leakfix: builtin/am: fix memory leak in `split_mail_stgit_series`	2025-05-19 16:02:46 -07:00
Elijah Newren	29d7bf1951	merge-tree: add a new --quiet flag Git Forges may be interested in whether two branches can be merged while not being interested in what the resulting merge tree is nor which files conflicted. For such cases, add a new --quiet flag which will make use of the new mergeability_only flag added to merge-ort in the previous commit. This option allows the merge machinery to, in the outer layer of the merge: * exit early when a conflict is detected * avoid writing (most) merged blobs/trees to the object store Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 15:09:14 -07:00
Derrick Stolee	efab7dc1f4	reset: integrate sparse index with --patch Similar to the previous change for 'git add -p', the reset builtin checked for integration with the sparse index after possibly redirecting its logic toward the interactive logic. This means that the builtin would expand the sparse index to a full one upon read. Move this check earlier within cmd_reset() to improve performance here. Add tests to guarantee that we are not universally expanding the index. Add behavior tests to check that we are doing the same operations as a full index. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:02:47 -07:00
Derrick Stolee	02ed8555f6	git add: make -p/-i aware of sparse index It is slow to expand a sparse index in-memory due to parsing of trees. We aim to minimize that performance cost when possible. 'git add -p' uses 'git apply' child processes to modify the index, but still there are some expansions that occur. It turns out that control flows out of cmd_add() in the interactive cases before the lines that confirm that the builtin is integrated with the sparse index. Moving that integration point earlier in cmd_add() allows 'git add -i' and 'git add -p' to operate without expanding a sparse index to a full one. Add test cases that confirm that these interactive add options work with the sparse index. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:01:51 -07:00
Derrick Stolee	952de281fe	apply: integrate with the sparse index The sparse index allows storing directory entries in the index, marked with the skip-wortkree bit and pointing to a tree object. This may be an unexpected data shape for some implementation areas, so we are rolling it out incrementally on a builtin-per-builtin basis. This change enables the sparse index for 'git apply'. The main motivation for this change is that 'git apply' is used as a child process of 'git add -p' and expanding the sparse index for each of those child processes can lead to significant performance issues. The good news is that the actual index manipulation code used by 'git apply' is already integrated with the sparse index, so the only product change is to mark the builtin as allowing the sparse index so it isn't inflated on read. The more involved part of this change is around adding tests that verify how 'git apply' behaves in a sparse-checkout environment and whether or not the index expands in certain operations. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 12:00:33 -07:00
Jeff King	f710fd7b49	hash-object: handle --literally with OPT_NEGBIT Since we recently removed the hash_literally() function, the hash-object --literally option has been simplified to just removing the INDEX_FORMAT_CHECK flag. Rather than pass it around as a separate bool, we can just have the option parser remove the bit from the set of flags directly. This simplifies the helper functions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2025-05-16 09:43:11 -07:00

1 2 3 4 5 ...

12786 Commits