git-for-windows/git - git - Gitea: Self-hosted GitHub

mirror of https://github.com/git-for-windows/git.git synced 2026-03-21 08:23:24 -05:00

Author	SHA1	Message	Date
Karsten Blees	79a3c8a2a2	mingw: support long paths Windows paths are typically limited to MAX_PATH = 260 characters, even though the underlying NTFS file system supports paths up to 32,767 chars. This limitation is also evident in Windows Explorer, cmd.exe and many other applications (including IDEs). Particularly annoying is that most Windows APIs return bogus error codes if a relative path only barely exceeds MAX_PATH in conjunction with the current directory, e.g. ERROR_PATH_NOT_FOUND / ENOENT instead of the infinitely more helpful ERROR_FILENAME_EXCED_RANGE / ENAMETOOLONG. Many Windows wide char APIs support longer than MAX_PATH paths through the file namespace prefix ('\\?\' or '\\?\UNC\') followed by an absolute path. Notable exceptions include functions dealing with executables and the current directory (CreateProcess, LoadLibrary, Get/SetCurrentDirectory) as well as the entire shell API (ShellExecute, SHGetSpecialFolderPath...). Introduce a handle_long_path function to check the length of a specified path properly (and fail with ENAMETOOLONG), and to optionally expand long paths using the '\\?\' file namespace prefix. Short paths will not be modified, so we don't need to worry about device names (NUL, CON, AUX). Contrary to MSDN docs, the GetFullPathNameW function doesn't seem to be limited to MAX_PATH (at least not on Win7), so we can use it to do the heavy lifting of the conversion (translate '/' to '\', eliminate '.' and '..', and make an absolute path). Add long path error checking to xutftowcs_path for APIs with hard MAX_PATH limit. Add a new MAX_LONG_PATH constant and xutftowcs_long_path function for APIs that support long paths. While improved error checking is always active, long paths support must be explicitly enabled via 'core.longpaths' option. This is to prevent end users to shoot themselves in the foot by checking out files that Windows Explorer, cmd/bash or their favorite IDE cannot handle. Test suite: Test the case is when the full pathname length of a dir is close to 260 (MAX_PATH). Bug report and an original reproducer by Andrey Rogozhnikov: https://github.com/msysgit/git/pull/122#issuecomment-43604199 [jes: adjusted test number to avoid conflicts, added support for chdir(), etc] Thanks-to: Martin W. Kirst <maki@bitkings.de> Thanks-to: Doug Kelly <dougk.ff7@gmail.com> Original-test-by: Andrey Rogozhnikov <rogozhnikov.andrey@gmail.com> Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Stepan Kasal <kasal@ucw.cz> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2024-12-17 12:04:23 +01:00
Karsten Blees	63400a85cf	mingw: add infrastructure for read-only file system level caches Add a macro to mark code sections that only read from the file system, along with a config option and documentation. This facilitates implementation of relatively simple file system level caches without the need to synchronize with the file system. Enable read-only sections for 'git status' and preload_index. Signed-off-by: Karsten Blees <blees@dcon.de>	2024-12-17 12:04:12 +01:00
Johannes Schindelin	ca318aa4a7	Add experimental 'git survey' builtin (#5174 ) This introduces `git survey` to Git for Windows ahead of upstream for the express purpose of getting the path-based analysis in the hands of more folks. The inspiration of this builtin is [`git-sizer`](https://github.com/github/git-sizer), but since that command relies on `git cat-file --batch` to get the contents of objects, it has limits to how much information it can provide. This is mostly a rewrite of the `git survey` builtin that was introduced into the `microsoft/git` fork in microsoft/git#667. That version had a lot more bells and whistles, including an analysis much closer to what `git-sizer` provides. The biggest difference in this version is that this one is focused on using the path-walk API in order to visit batches of objects based on a common path. This allows identifying, for instance, the path that is contributing the most to the on-disk size across all versions at that path. For example, here are the top ten paths contributing to my local Git repository (which includes `microsoft/git` and `gitster/git`): ``` TOP FILES BY DISK SIZE ============================================================================ Path \| Count \| Disk Size \| Inflated Size -----------------------------------------+-------+-----------+-------------- whats-cooking.txt \| 1373 \| 11637459 \| 37226854 t/helper/test-gvfs-protocol \| 2 \| 6847105 \| 17233072 git-rebase--helper \| 1 \| 6027849 \| 15269664 compat/mingw.c \| 6111 \| 5194453 \| 463466970 t/helper/test-parse-options \| 1 \| 3420385 \| 8807968 t/helper/test-pkt-line \| 1 \| 3408661 \| 8778960 t/helper/test-dump-untracked-cache \| 1 \| 3408645 \| 8780816 t/helper/test-dump-fsmonitor \| 1 \| 3406639 \| 8776656 po/vi.po \| 104 \| 1376337 \| 51441603 po/de.po \| 210 \| 1360112 \| 71198603 ``` This kind of analysis has been helpful in identifying the reasons for growth in a few internal monorepos. Those findings motivated the changes in #5157 and #5171. With this early version in Git for Windows, we can expand the reach of the experimental tool in advance of it being contributed to the upstream project. Unfortunately, this will mean that in the next `microsoft/git` rebase, @jeffhostetler's version will need to be pulled out since there are enough conflicts. These conflicts include how tables are stored and generated, as the version in this PR is slightly more general to allow for different kinds of data.	2024-12-17 12:04:02 +01:00
Johannes Schindelin	994196d4b4	Introduce 'git backfill' to get missing blobs in a partial clone (#5172 ) This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.	2024-12-17 12:04:02 +01:00
Derrick Stolee	fcf6f6f6d5	Add path walk API and its use in 'git pack-objects' (#5171 ) This is a follow up to #5157 as well as motivated by the RFC in gitgitgadget/git#1786. We have ways of walking all objects, but it is focused on visiting a single commit and then expanding the new trees and blobs reachable from that commit that have not been visited yet. This means that objects arrive without any locality based on their path. Add a new "path walk API" that focuses on walking objects in batches according to their type and path. This will walk all annotated tags, all commits, all root trees, and then start a depth-first search among all paths in the repo to collect trees and blobs in batches. The most important application for this is being fast-tracked to Git for Windows: `git pack-objects --path-walk`. This application of the path walk API discovers the objects to pack via this batched walk, and automatically groups objects that appear at a common path so they can be checked for delta comparisons. This use completely avoids any name-hash collisions (even the collisions that sometimes occur with the new `--full-name-hash` option) and can be much faster to compute since the first pass of delta calculations does not waste time on objects that are unlikely to be diffable. Some statistics are available in the commit messages.	2024-12-17 12:04:01 +01:00
Johannes Schindelin	1775126925	pack-objects: create new name-hash algorithm (#5157 ) This is an updated version of gitgitgadget/git#1785, intended for early consumption into Git for Windows. The idea here is to add a new `--full-name-hash` option to `git pack-objects` and `git repack`. This adjusts the name-hash value used for finding delta bases in such a way that uses the full path name with a lower likelihood of collisions than the default name-hash algorithm. In many repositories with name-hash collisions and many versions of those paths, this can significantly reduce the size of a full repack. It can also help in certain cases of `git push`, but only if the pack is already artificially inflated by name-hash collisions; cases that find "sibling" deltas as better choices become worse with `--full-name-hash`. Thus, this option is currently recommended for full repacks of large repos, and on client machines without reachability bitmaps. Some care is taken to ignore this option when using bitmaps, either writing bitmaps or using a bitmap walk during reads. The bitmap file format contains name-hash values, but no way to indicate which function is used, so compatibility is a concern for bitmaps. Future work could explore this idea. After this PR is merged, then the more-involved `--path-walk` option may be considered.	2024-12-17 12:04:01 +01:00
Johannes Schindelin	1cccfc6247	Merge branch 'optionally-dont-append-atomically-on-windows' Fix append failure issue under remote directories #2753 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2024-12-17 12:03:45 +01:00
Johannes Schindelin	1ec408e0c2	Merge pull request #3293 from pascalmuller/http-support-automatically-sending-client-certificate http: Add support for enabling automatic sending of SSL client certificate	2024-12-17 12:03:43 +01:00
Johannes Schindelin	2d57219f04	Merge pull request #2535 from dscho/schannel-revoke-best-effort Introduce and use the new "best effort" strategy for Secure Channel revoke checking	2024-12-17 12:03:34 +01:00
Johannes Schindelin	bb0476a282	backfill: mark it as experimental This is a highly useful command, and we want it to get some testing "in the wild". However, the patches have not yet been reviewed on the Git mailing list, and are therefore subject to change. By marking the command as experimental, users will be warned to pay attention to those changes. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2024-12-17 00:44:39 +01:00
Derrick Stolee	57b7bc9e54	survey: add --top=<N> option and config The 'git survey' builtin provides several detail tables, such as "top files by on-disk size". The size of these tables defaults to 10, currently. Allow the user to specify this number via a new --top=<N> option or the new survey.top config key. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2024-12-17 00:44:39 +01:00
Derrick Stolee	ccec549a8e	backfill: assume --sparse when sparse-checkout is enabled The previous change introduced the '--[no-]sparse' option for the 'git backfill' command, but did not assume it as enabled by default. However, this is likely the behavior that users will most often want to happen. Without this default, users with a small sparse-checkout may be confused when 'git backfill' downloads every version of every object in the full history. However, this is left as a separate change so this decision can be reviewed independently of the value of the '--[no-]sparse' option. Add a test of adding the '--sparse' option to a repo without sparse-checkout to make it clear that supplying it without a sparse-checkout is an error. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:44:39 +01:00
Derrick Stolee	eca47f32f3	backfill: add --sparse option One way to significantly reduce the cost of a Git clone and later fetches is to use a blobless partial clone and combine that with a sparse-checkout that reduces the paths that need to be populated in the working directory. Not only does this reduce the cost of clones and fetches, the sparse-checkout reduces the number of objects needed to download from a promisor remote. However, history investigations can be expensie as computing blob diffs will trigger promisor remote requests for one object at a time. This can be avoided by downloading the blobs needed for the given sparse-checkout using 'git backfill' and its new '--sparse' mode, at a time that the user is willing to pay that extra cost. Note that this is distinctly different from the '--filter=sparse:<oid>' option, as this assumes that the partial clone has all reachable trees and we are using client-side logic to avoid downloading blobs outside of the sparse-checkout cone. This avoids the server-side cost of walking trees while also achieving a similar goal. It also downloads in batches based on similar path names, presenting a resumable download if things are interrupted. This augments the path-walk API to have a possibly-NULL 'pl' member that may point to a 'struct pattern_list'. This could be more general than the sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently the only consumer. Be sure to test this in both cone mode and not cone mode. Cone mode has the benefit that the path-walk can skip certain paths once they would expand beyond the sparse-checkout. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:44:39 +01:00
Derrick Stolee	f9b6169805	survey: add object count summary At the moment, nothing is obvious about the reason for the use of the path-walk API, but this will become more prevelant in future iterations. For now, use the path-walk API to sum up the counts of each kind of object. For example, this is the reachable object summary output for my local repo: REACHABLE OBJECT SUMMARY ======================== Object Type \| Count ------------+------- Tags \| 1343 Commits \| 179344 Trees \| 314350 Blobs \| 184030 Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:44:39 +01:00
Derrick Stolee	376f5cc5e9	survey: start pretty printing data in table form When 'git survey' provides information to the user, this will be presented in one of two formats: plaintext and JSON. The JSON implementation will be delayed until the functionality is complete for the plaintext format. The most important parts of the plaintext format are headers specifying the different sections of the report and tables providing concreted data. Create a custom table data structure that allows specifying a list of strings for the row values. When printing the table, check each column for the maximum width so we can create a table of the correct size from the start. The table structure is designed to be flexible to the different kinds of output that will be implemented in future changes. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:44:39 +01:00
Jeff Hostetler	97951c0e77	survey: add command line opts to select references By default we will scan all references in "refs/heads/", "refs/tags/" and "refs/remotes/". Add command line opts let the use ask for all refs or a subset of them and to include a detached HEAD. Signed-off-by: Jeff Hostetler <git@jeffhostetler.com> Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:44:39 +01:00
Jeff Hostetler	38cc6ef4d7	survey: stub in new experimental 'git-survey' command Start work on a new 'git survey' command to scan the repository for monorepo performance and scaling problems. The goal is to measure the various known "dimensions of scale" and serve as a foundation for adding additional measurements as we learn more about Git monorepo scaling problems. The initial goal is to complement the scanning and analysis performed by the GO-based 'git-sizer' (https://github.com/github/git-sizer) tool. It is hoped that by creating a builtin command, we may be able to take advantage of internal Git data structures and code that is not accessible from GO to gain further insight into potential scaling problems. Co-authored-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Jeff Hostetler <git@jeffhostetler.com> Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:44:39 +01:00
Derrick Stolee	a4b3f64914	backfill: add --batch-size=<n> option Users may want to specify a minimum batch size for their needs. This is only a minimum: the path-walk API provides a list of OIDs that correspond to the same path, and thus it is optimal to allow delta compression across those objects in a single server request. We could consider limiting the request to have a maximum batch size in the future. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:44:14 +01:00
Derrick Stolee	cf287bf971	backfill: basic functionality and tests The default behavior of 'git backfill' is to fetch all missing blobs that are reachable from HEAD. Document and test this behavior. The implementation is a very simple use of the path-walk API, initializing the revision walk at HEAD to start the path-walk from all commits reachable from HEAD. Ignore the object arrays that correspond to tree entries, assuming that they are all present already. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:44:13 +01:00
Derrick Stolee	92eaef30a7	backfill: add builtin boilerplate In anticipation of implementing 'git backfill', populate the necessary files with the boilerplate of a new builtin. RFC TODO: When preparing this for a full implementation, make sure it is based on the newest standards introduced by [1]. [1] https://lore.kernel.org/git/xmqqjzfq2f0f.fsf@gitster.g/T/#m606036ea2e75a6d6819d6b5c90e729643b0ff7f7 [PATCH 1/3] builtin: add a repository parameter for builtin functions Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:43:41 +01:00
Derrick Stolee	ef891947f0	pack-objects: enable --path-walk via config Users may want to enable the --path-walk option for 'git pack-objects' by default, especially underneath commands like 'git push' or 'git repack'. This should be limited to client repositories, since the --path-walk option disables bitmap walks, so would be bad to include in Git servers when serving fetches and clones. There is potential that it may be helpful to consider when repacking the repository, to take advantage of improved deltas across historical versions of the same files. Much like how "pack.useSparse" was introduced and included in "feature.experimental" before being enabled by default, use the repository settings infrastructure to make the new "pack.usePathWalk" config enabled by "feature.experimental" and "feature.manyFiles". Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:43:40 +01:00
Derrick Stolee	1e7dbcb040	repack: add --path-walk option Since 'git pack-objects' supports a --path-walk option, allow passing it through in 'git repack'. This presents interesting testing opportunities for comparing the different repacking strategies against each other. Add the --path-walk option to the performance tests in p5313. For the microsoft/fluentui repo [1] checked out at a specific commit [2], the results are very interesting: Test this tree ------------------------------------------------------------------ 5313.2: thin pack 0.40(0.47+0.04) 5313.3: thin pack size 1.2M 5313.4: thin pack with --full-name-hash 0.09(0.10+0.04) 5313.5: thin pack size with --full-name-hash 22.8K 5313.6: thin pack with --path-walk 0.08(0.06+0.02) 5313.7: thin pack size with --path-walk 20.8K 5313.8: big pack 2.16(8.43+0.23) 5313.9: big pack size 17.7M 5313.10: big pack with --full-name-hash 1.42(3.06+0.21) 5313.11: big pack size with --full-name-hash 18.0M 5313.12: big pack with --path-walk 2.21(8.39+0.24) 5313.13: big pack size with --path-walk 17.8M 5313.14: repack 98.05(662.37+2.64) 5313.15: repack size 449.1K 5313.16: repack with --full-name-hash 33.95(129.44+2.63) 5313.17: repack size with --full-name-hash 182.9K 5313.18: repack with --path-walk 106.21(121.58+0.82) 5313.19: repack size with --path-walk 159.6K [1] https://github.com/microsoft/fluentui [2] e70848ebac1cd720875bccaa3026f4a9ed700e08 This repo suffers from having a lot of paths that collide in the name hash, so examining them in groups by path leads to better deltas. Also, in this case, the single-threaded implementation is competitive with the full repack. This is saving time diffing files that have significant differences from each other. A similar, but private, repo has even more extremes in the thin packs: Test this tree -------------------------------------------------------------- 5313.2: thin pack 2.39(2.91+0.10) 5313.3: thin pack size 4.5M 5313.4: thin pack with --full-name-hash 0.29(0.47+0.12) 5313.5: thin pack size with --full-name-hash 15.5K 5313.6: thin pack with --path-walk 0.35(0.31+0.04) 5313.7: thin pack size with --path-walk 14.2K Notice, however, that while the --full-name-hash version is working quite well in these cases for the thin pack, it does poorly for some other standard cases, such as this test on the Linux kernel repository: Test this tree -------------------------------------------------------------- 5313.2: thin pack 0.01(0.00+0.00) 5313.3: thin pack size 310 5313.4: thin pack with --full-name-hash 0.00(0.00+0.00) 5313.5: thin pack size with --full-name-hash 1.4K 5313.6: thin pack with --path-walk 0.00(0.00+0.00) 5313.7: thin pack size with --path-walk 310 Here, the --full-name-hash option does much worse than the default name hash, but the path-walk option does exactly as well. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:43:40 +01:00
Derrick Stolee	1f0ee5062a	pack-objects: add --path-walk option In order to more easily compute delta bases among objects that appear at the exact same path, add a --path-walk option to 'git pack-objects'. This option will use the path-walk API instead of the object walk given by the revision machinery. Since objects will be provided in batches representing a common path, those objects can be tested for delta bases immediately instead of waiting for a sort of the full object list by name-hash. This has multiple benefits, including avoiding collisions by name-hash. The objects marked as UNINTERESTING are included in these batches, so we are guaranteeing some locality to find good delta bases. After the individual passes are done on a per-path basis, the default name-hash is used to find other opportunistic delta bases that did not match exactly by the full path name. RFC TODO: It is important to note that this option is inherently incompatible with using a bitmap index. This walk probably also does not work with other advanced features, such as delta islands. Getting ahead of myself, this option compares well with --full-name-hash when the packfile is large enough, but also performs at least as well as the default in all cases that I've seen. RFC TODO: this should probably be recording the batch locations to another list so they could be processed in a second phase using threads. RFC TODO: list some examples of how this outperforms previous pack-objects strategies. (This is coming in later commits that include performance test changes.) Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:43:40 +01:00
Derrick Stolee	a928048f6f	path-walk: add prune_all_uninteresting option This option causes the path-walk API to act like the sparse tree-walk algorithm implemented by mark_trees_uninteresting_sparse() in list-objects.c. Starting from the commits marked as UNINTERESTING, their root trees and all objects reachable from those trees are UNINTERSTING, at least as we walk path-by-path. When we reach a path where all objects associated with that path are marked UNINTERESTING, then do no continue walking the children of that path. We need to be careful to pass the UNINTERESTING flag in a deep way on the UNINTERESTING objects before we start the path-walk, or else the depth-first search for the path-walk API may accidentally report some objects as interesting. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:43:40 +01:00
Derrick Stolee	f3372d6c99	path-walk: allow visiting tags In anticipation of using the path-walk API to analyze tags or include them in a pack-file, add the ability to walk the tags that were included in the revision walk. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:43:40 +01:00
Derrick Stolee	1c48b5f772	path-walk: allow consumer to specify object types We add the ability to filter the object types in the path-walk API so the callback function is called fewer times. This adds the ability to ask for the commits in a list, as well. Future changes will add the ability to visit annotated tags. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:43:40 +01:00
Derrick Stolee	51143f56d1	t6601: add helper for testing path-walk API Add some tests based on the current behavior, doing interesting checks for different sets of branches, ranges, and the --boundary option. This sets a baseline for the behavior and we can extend it as new options are introduced. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-17 00:43:38 +01:00
Derrick Stolee	dd1838d192	path-walk: introduce an object walk by path In anticipation of a few planned applications, introduce the most basic form of a path-walk API. It currently assumes that there are no UNINTERESTING objects, and does not include any complicated filters. It calls a function pointer on groups of tree and blob objects as grouped by path. This only includes objects the first time they are discovered, so an object that appears at multiple paths will not be included in two batches. There are many future adaptations that could be made, but they are left for future updates when consumers are ready to take advantage of those features. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-16 21:04:46 +01:00
Derrick Stolee	97ac379701	git-repack: update usage to match docs This also adds the '--full-name-hash' option introduced in the previous change and adds newlines to the synopsis. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-16 21:04:46 +01:00
Derrick Stolee	45712e6a79	pack-objects: add --full-name-hash option The pack_name_hash() method has not been materially changed since it was introduced in `ce0bd64299` (pack-objects: improve path grouping heuristics., 2006-06-05). The intention here is to group objects by path name, but also attempt to group similar file types together by making the most-significant digits of the hash be focused on the final characters. Here's the crux of the implementation: /* * This effectively just creates a sortable number from the * last sixteen non-whitespace characters. Last characters * count "most", so things that end in ".c" sort together. / while ((c = name++) != 0) { if (isspace(c)) continue; hash = (hash >> 2) + (c << 24); } As the comment mentions, this only cares about the last sixteen non-whitespace characters. This cause some filenames to collide more than others. Here are some examples that I've seen while investigating repositories that are growing more than they should be: * "/CHANGELOG.json" is 15 characters, and is created by the beachball [1] tool. Only the final character of the parent directory can differntiate different versions of this file, but also only the two most-significant digits. If that character is a letter, then this is always a collision. Similar issues occur with the similar "/CHANGELOG.md" path, though there is more opportunity for differences in the parent directory. * Localization files frequently have common filenames but differentiate via parent directories. In C#, the name "/strings.resx.lcl" is used for these localization files and they will all collide in name-hash. [1] https://github.com/microsoft/beachball I've come across many other examples where some internal tool uses a common name across multiple directories and is causing Git to repack poorly due to name-hash collisions. It is clear that the existing name-hash algorithm is optimized for repositories with short path names, but also is optimized for packing a single snapshot of a repository, not a repository with many versions of the same file. In my testing, this has proven out where the name-hash algorithm does a good job of finding peer files as delta bases when unable to use a historical version of that exact file. However, for repositories that have many versions of most files and directories, it is more important that the objects that appear at the same path are grouped together. Create a new pack_full_name_hash() method and a new --full-name-hash option for 'git pack-objects' to call that method instead. Add a simple pass-through for 'git repack --full-name-hash' for additional testing in the context of a full repack, where I expect this will be most effective. The hash algorithm is as simple as possible to be reasonably effective: for each character of the path string, add a multiple of that character and a large prime number (chosen arbitrarily, but intended to be large relative to the size of a uint32_t). Then, shift the current hash value to the right by 5, with overlap. The addition and shift parameters are standard mechanisms for creating hard-to-predict behaviors in the bits of the resulting hash. This is not meant to be cryptographic at all, but uniformly distributed across the possible hash values. This creates a hash that appears pseudorandom. There is no ability to consider similar file types as being close to each other. In a later change, a test-tool will be added so the effectiveness of this hash can be demonstrated directly. For now, let's consider how effective this mechanism is when repacking a repository with and without the --full-name-hash option. Specifically, let's use 'git repack -adf [--full-name-hash]' as our test. On the Git repository, we do not expect much difference. All path names are short. This is backed by our results: \| Stage \| Pack Size \| Repack Time \| \|-----------------------\|-----------\|-------------\| \| After clone \| 260 MB \| N/A \| \| Standard Repack \| 127MB \| 106s \| \| With --full-name-hash \| 126 MB \| 99s \| This example demonstrates how there is some natural overhead coming from the cloned copy because the server is hosting many forks and has not optimized for exactly this set of reachable objects. But the full repack has similar characteristics with and without --full-name-hash. However, we can test this in a repository that uses one of the problematic naming conventions above. The fluentui [2] repo uses beachball to generate CHANGELOG.json and CHANGELOG.md files, and these files have very poor delta characteristics when comparing against versions across parent directories. \| Stage \| Pack Size \| Repack Time \| \|-----------------------\|-----------\|-------------\| \| After clone \| 694 MB \| N/A \| \| Standard Repack \| 438 MB \| 728s \| \| With --full-name-hash \| 168 MB \| 142s \| [2] https://github.com/microsoft/fluentui In this example, we see significant gains in the compressed packfile size as well as the time taken to compute the packfile. Using a collection of repositories that use the beachball tool, I was able to make similar comparisions with dramatic results. While the fluentui repo is public, the others are private so cannot be shared for reproduction. The results are so significant that I find it important to share here: \| Repo \| Standard Repack \| With --full-name-hash \| \|----------\|-----------------\|-----------------------\| \| fluentui \| 438 MB \| 168 MB \| \| Repo B \| 6,255 MB \| 829 MB \| \| Repo C \| 37,737 MB \| 7,125 MB \| \| Repo D \| 130,049 MB \| 6,190 MB \| Future changes could include making --full-name-hash implied by a config value or even implied by default during a full repack. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2024-12-16 21:04:46 +01:00
孙卓识	3b721e80b3	Add config option `windows.appendAtomically` Atomic append on windows is only supported on local disk files, and it may cause errors in other situations, e.g. network file system. If that is the case, this config option should be used to turn atomic append off. Co-Authored-By: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: 孙卓识 <sunzhuoshi@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2024-12-16 21:04:06 +01:00
Pascal Muller	eb5a4c2280	http: optionally send SSL client certificate This adds support for a new http.sslAutoClientCert config value. In cURL 7.77 or later the schannel backend does not automatically send client certificates from the Windows Certificate Store anymore. This config value is only used if http.sslBackend is set to "schannel", and can be used to opt in to the old behavior and force cURL to send client certificates. This fixes https://github.com/git-for-windows/git/issues/3292 Signed-off-by: Pascal Muller <pascalmuller@gmail.com>	2024-12-16 21:00:07 +01:00
Johannes Schindelin	a62c776360	http: use new "best effort" strategy for Secure Channel revoke checking The native Windows HTTPS backend is based on Secure Channel which lets the caller decide how to handle revocation checking problems caused by missing information in the certificate or offline CRL distribution points. Unfortunately, cURL chose to handle these problems differently than OpenSSL by default: while OpenSSL happily ignores those problems (essentially saying "¯\_(ツ)_/¯"), the Secure Channel backend will error out instead. As a remedy, the "no revoke" mode was introduced, which turns off revocation checking altogether. This is a bit heavy-handed. We support this via the `http.schannelCheckRevoke` setting. In https://github.com/curl/curl/pull/4981, we contributed an opt-in "best effort" strategy that emulates what OpenSSL seems to do. In Git for Windows, we actually want this to be the default. This patch makes it so, introducing it as a new value for the `http.schannelCheckRevoke" setting, which now becmes a tristate: it accepts the values "false", "true" or "best-effort" (defaulting to the last one). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2024-12-16 20:58:00 +01:00
Thomas Braun	f1ed35447a	transport: optionally disable side-band-64k Since commit `0c499ea60f` (send-pack: demultiplex a sideband stream with status data, 2010-02-05) the send-pack builtin uses the side-band-64k capability if advertised by the server. Unfortunately this breaks pushing over the dump git protocol if used over a network connection. The detailed reasons for this breakage are (by courtesy of Jeff Preshing, quoted from https://groups.google.com/d/msg/msysgit/at8D7J-h7mw/eaLujILGUWoJ): MinGW wraps Windows sockets in CRT file descriptors in order to mimic the functionality of POSIX sockets. This causes msvcrt.dll to treat sockets as Installable File System (IFS) handles, calling ReadFile, WriteFile, DuplicateHandle and CloseHandle on them. This approach works well in simple cases on recent versions of Windows, but does not support all usage patterns. In particular, using this approach, any attempt to read & write concurrently on the same socket (from one or more processes) will deadlock in a scenario where the read waits for a response from the server which is only invoked after the write. This is what send_pack currently attempts to do in the use_sideband codepath. The new config option `sendpack.sideband` allows to override the side-band-64k capability of the server, and thus makes the dumb git protocol work. Other transportation methods like ssh and http/https still benefit from the sideband channel, therefore the default value of `sendpack.sideband` is still true. Signed-off-by: Thomas Braun <thomas.braun@byte-physics.de> Signed-off-by: Oliver Schneider <oliver@assarbad.net> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2024-12-16 20:47:09 +01:00
Junio C Hamano	063bcebf0c	Git 2.48-rc0 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-16 08:54:04 -08:00
Junio C Hamano	29e5596eb8	Merge branch 'ps/build' Build procedure update plus introduction of Meson based builds. * ps/build: (24 commits) Introduce support for the Meson build system Documentation: add comparison of build systems t: allow overriding build dir t: better support for out-of-tree builds Documentation: extract script to generate a list of mergetools Documentation: teach "cmd-list.perl" about out-of-tree builds Documentation: allow sourcing generated includes from separate dir Makefile: simplify building of templates Makefile: write absolute program path into bin-wrappers Makefile: allow "bin-wrappers/" directory to exist Makefile: refactor generators to be PWD-independent Makefile: extract script to generate gitweb.js Makefile: extract script to generate gitweb.cgi Makefile: extract script to massage Python scripts Makefile: extract script to massage Shell scripts Makefile: use "generate-perl.sh" to massage Perl library Makefile: extract script to massage Perl scripts Makefile: consistently use PERL_PATH Makefile: generate doc versions via GIT-VERSION-GEN Makefile: generate "git.rc" via GIT-VERSION-GEN ...	2024-12-15 17:54:33 -08:00
Junio C Hamano	2ccc89b0c1	The sixteenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-13 07:33:46 -08:00
Junio C Hamano	d0ddf344da	Merge branch 'kk/doc-ancestry-path' The --ancestry-path option is designed to be given a commit that is on the path, which was not documented, which has been corrected. * kk/doc-ancestry-path: doc: mention rev-list --ancestry-path restrictions	2024-12-13 07:33:46 -08:00
Junio C Hamano	3b11c9139d	Merge branch 'cw/worktree-extension' Introduce a new repository extension to prevent older Git versions from mis-interpreting worktrees created with relative paths. * cw/worktree-extension: worktree: refactor `repair_worktree_after_gitdir_move()` worktree: add relative cli/config options to `repair` command worktree: add relative cli/config options to `move` command worktree: add relative cli/config options to `add` command worktree: add `write_worktree_linking_files()` function worktree: refactor infer_backlink return worktree: add `relativeWorktrees` extension setup: correctly reinitialize repository version	2024-12-13 07:33:43 -08:00
Junio C Hamano	90bf05e45a	Merge branch 'kh/doc-update-ref-grammofix' Grammofix. * kh/doc-update-ref-grammofix: Documentation/git-update-ref.txt: add missing word	2024-12-13 07:33:39 -08:00
Junio C Hamano	1ddfe5acde	Merge branch 'kh/doc-bundle-typofix' Typofix. * kh/doc-bundle-typofix: Documentation/git-bundle.txt: fix word join typo	2024-12-13 07:33:38 -08:00
Junio C Hamano	5cbe030c86	Merge branch 'jc/doc-error-message-guidelines' Developer documentation update. * jc/doc-error-message-guidelines: CodingGuidelines: a handful of error message guidelines	2024-12-13 07:33:37 -08:00
Junio C Hamano	caacdb5dfd	The fifteenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-10 10:04:58 +09:00
Junio C Hamano	35f40385e4	Merge branch 'bc/allow-upload-pack-from-other-people' Loosen overly strict ownership check introduced in the recent past, to keep the promise "cloning a suspicious repository is a safe first step to inspect it". * bc/allow-upload-pack-from-other-people: Allow cloning from repositories owned by another user	2024-12-10 10:04:55 +09:00
Junio C Hamano	bd31944dda	Merge branch 'jc/doc-opt-tilde-expand' Describe a case where an option value needs to be spelled as a separate argument, i.e. "--opt val", not "--opt=val". * jc/doc-opt-tilde-expand: doc: option value may be separate for valid reasons	2024-12-10 10:04:52 +09:00
Patrick Steinhardt	904339edbd	Introduce support for the Meson build system Introduce support for the Meson build system, a "modern" meta build system that supports many different platforms, including Linux, macOS, Windows and BSDs. Meson supports different backends, including Ninja, Xcode and Microsoft Visual Studio. Several common IDEs provide an integration with it. The biggest contender compared to Meson is probably CMake as outlined in our "Documentation/technical/build-systems.txt" file. Based on my own personal experience from working with both build systems extensively I strongly favor Meson over CMake. In my opinion, it feels significantly easier to use with a syntax that feels more like a "real" programming language. The second big reason is that Meson supports Rust natively, which may prove to be important given that the project may pick up Rust as another language eventually. Using Meson is rather straight-forward. An example: ``` # Meson uses out-of-tree builds. You can set up multiple build # directories, how you name them is completely up to you. $ mkdir build $ cd build $ meson setup .. -Dprefix=/tmp/git-installation # Build the project. This also provides several other targets like e.g. `install` or `test`. $ ninja # Meson has been wired up to support execution of our test suites. # Both our unit tests and our integration tests are supported. # Running `meson test` without any arguments will execute all tests, # but the syntax supports globbing to select only some tests. $ meson test 't-' # Execute single test interactively to allow for debugging. $ meson test 't0000-' --interactive --test-args=-ix ``` The build instructions have been successfully tested on the following systems, tests are passing: - Apple macOS 10.15. - FreeBSD 14.1. - NixOS 24.11. - OpenBSD 7.6. - Ubuntu 24.04. - Windows 10 with Cygwin. - Windows 10 with MinGW64, except for t9700, which is also broken with our Makefile. - Windows 10 with Visual Studio 2022 toolchain, using the Native Tools Command Prompt with `meson setup --vsenv`. Tests pass, except for t9700. - Windows 10 with Visual Studio 2022 solution, using the Native Tools Command Prompt with `meson setup --backend vs2022`. Tests pass, except for t9700. - Windows 10 with VS Code, using the Meson plug-in. It is expected that there will still be rough edges in the current version. If this patch lands the expectation is that it will coexist with our other build systems for a while. Like this, distributions can slowly migrate over to Meson and report any findings they have to us such that we can continue to iterate. A potential cutoff date for other build systems may be Git 3.0. Some notes: - The installed distribution is structured somewhat differently than how it used to be the case. All of our binaries are installed into `$libexec/git-core`, while all binaries part of `$bindir` are now symbolic links pointing to the former. This rule is consistent in itself and thus easier to reason about. - We do not install dashed binaries into `$libexec/git-core` anymore, so there won't e.g. be a symlink for git-add(1). These are not required by modern Git and there isn't really much of a use case for those anymore. By not installing those symlinks we thus start the deprecation of this layout. - We're targeting Meson 1.3.0, which has been released relatively recently November 2023. The only feature we use from that version is `fs.relative_to()`, which we could replace if necessary. If so, we could start to target Meson 1.0.0 and newer, released in December 2022. - The whole build instructions count around 3300 lines, half of which is listing all of our code and test files. Our Makefiles are around 5000 lines, autoconf adds another 1300 lines. CMake in comparison has only 1200 linescode, but it avoids listing individual files and does not wire up auto-configuration as extensively as the Meson instructions do. - We bundle a set of subproject wrappers for curl, expat, openssl, pcre2 and zlib. This allows developers to build Git without these dependencies preinstalled, and Meson will fetch and build them automatically. This is especially helpful on Windows. Helped-by: Eli Schwartz <eschwartz@gentoo.org> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-07 07:52:14 +09:00
Patrick Steinhardt	00ab97b1bc	Documentation: add comparison of build systems We're contemplating whether to eventually replace our build systems with a build system that is easier to use. Add a comparison of build systems to our technical documentation as a baseline for discussion. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-07 07:52:13 +09:00
Patrick Steinhardt	023c3370ac	Documentation: extract script to generate a list of mergetools We include the list of available mergetools into our manpages. Extract the script that performs this logic such that we can reuse it in other build systems. While at it, refactor the Makefile targets such that we don't create "mergetools-list.made" anymore. It shouldn't be necessary, as we can instead have other targets depend on "mergetools-{diff,merge}.txt" directly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-07 07:52:13 +09:00
Patrick Steinhardt	628d49f6e5	Documentation: teach "cmd-list.perl" about out-of-tree builds The "cmd-list.perl" script generates a list of commands that can be included into our manpages. The script doesn't know about out-of-tree builds and instead writes resulting files into the source directory. Adapt it such that we can read data from the source directory and write data into the build directory. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-07 07:52:12 +09:00
Patrick Steinhardt	9219325be7	Documentation: allow sourcing generated includes from separate dir Our documentation uses "include::" directives to include parts that are either reused across multiple documents or parts that we generate at build time. Unfortunately, top-level includes are only ever resolved relative to the base directory, which is typically the directory of the including document. Most importantly, it is not possible to have either asciidoc or asciidoctor search multiple directories. It follows that both kinds of includes must live in the same directory. This is of course a bummer for out-of-tree builds, because here the dynamically-built includes live in the build directory whereas the static includes live in the source directory. Introduce a `build_dir` attribute and prepend it to all of our includes for dynamically-built files. This attribute gets set to the build directory and thus converts the include path to an absolute path, which asciidoc and asciidoctor know how to resolve. Note that this change also requires us to update "build-docdep.perl", which tries to figure out included files such our Makefile can set up proper build-time dependencies. This script simply scans through the source files for any lines that match "^include::" and treats the remainder of the line as included file path. But given that those may now contain the "{build_dir}" variable we have to teach the script to replace that attribute with the actual build directory. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-12-07 07:52:12 +09:00

1 2 3 4 5 ...

16123 Commits