git-for-windows/git - git - Gitea: Self-hosted GitHub

mirror of https://github.com/git-for-windows/git.git synced 2026-06-29 15:32:07 -05:00

Author	SHA1	Message	Date
Johannes Schindelin	7678ab52b1	Merge branch 'phase-out-reset-stdin' This topic branch re-adds the deprecated --stdin/-z options to `git reset`. Those patches were overridden by a different set of options in the upstream Git project before we could propose `--stdin`. We offered this in MinGit to applications that wanted a safer way to pass lots of pathspecs to Git, and these applications will need to be adjusted. Instead of `--stdin`, `--pathspec-from-file=-` should be used, and instead of `-z`, `--pathspec-file-nul`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:58:05 +00:00
Johannes Schindelin	117dd17b08	reset: reinstate support for the deprecated --stdin option The `--stdin` option was a well-established paradigm in other commands, therefore we implemented it in `git reset` for use by Visual Studio. Unfortunately, upstream Git decided that it is time to introduce `--pathspec-from-file` instead. To keep backwards-compatibility for some grace period, we therefore reinstate the `--stdin` option on top of the `--pathspec-from-file` option, but mark it firmly as deprecated. Helped-by: Victoria Dye <vdye@github.com> Helped-by: Matthew John Cheetham <mjcheetham@outlook.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:58:05 +00:00
Ben Boeckel	54cb38dd2f	clean: suggest using `core.longPaths` if paths are too long to remove On Windows, git repositories may have extra files which need cleaned (e.g., a build directory) that may be arbitrarily deep. Suggest using `core.longPaths` if such situations are encountered. Fixes: #2715 Signed-off-by: Ben Boeckel <mathstuf@gmail.com>	2026-06-26 08:58:04 +00:00
Johannes Schindelin	ffe582bf78	clean: make use of FSCache The `git clean` command needs to enumerate plenty of files and directories, and can therefore benefit from the FSCache. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:58:04 +00:00
Ben Peart	550881dc74	fscache: fscache takes an initial size Update enable_fscache() to take an optional initial size parameter which is used to initialize the hashmap so that it can avoid having to rehash as additional entries are added. Add a separate disable_fscache() macro to make the code clearer and easier to read. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:58:04 +00:00
Ben Peart	60e61c237f	status: disable and free fscache at the end of the status command At the end of the status command, disable and free the fscache so that we don't leak the memory and so that we can dump the fscache statistics. Signed-off-by: Ben Peart <benpeart@microsoft.com>	2026-06-26 08:58:04 +00:00
Takuto Ikuta	bca9b4e589	checkout.c: enable fscache for checkout again This is retry of #1419. I added flush_fscache macro to flush cached stats after disk writing with tests for regression reported in #1438 and #1442. git checkout checks each file path in sorted order, so cache flushing does not make performance worse unless we have large number of modified files in a directory containing many files. Using chromium repository, I tested `git checkout .` performance when I delete 10 files in different directories. With this patch: TotalSeconds: 4.307272 TotalSeconds: 4.4863595 TotalSeconds: 4.2975562 Avg: 4.36372923333333 Without this patch: TotalSeconds: 20.9705431 TotalSeconds: 22.4867685 TotalSeconds: 18.8968292 Avg: 20.7847136 I confirmed this patch passed all tests in t/ with core_fscache=1. Signed-off-by: Takuto Ikuta <tikuta@chromium.org>	2026-06-26 08:58:04 +00:00
Jeff Hostetler	73685993c3	add: use preload-index and fscache for performance Teach "add" to use preload-index and fscache features to improve performance on very large repositories. During an "add", a call is made to run_diff_files() which calls check_remove() for each index-entry. This calls lstat(). On Windows, the fscache code intercepts the lstat() calls and builds a private cache using the FindFirst/FindNext routines, which are much faster. Somewhat independent of this, is the preload-index code which distributes some of the start-up costs across multiple threads. We need to keep the call to read_cache() before parsing the pathspecs (and hence cannot use the pathspecs to limit any preload) because parse_pathspec() is using the index to determine whether a pathspec is, in fact, in a submodule. If we would not read the index first, parse_pathspec() would not error out on a path that is inside a submodule, and t7400-submodule-basic.sh would fail with not ok 47 - do not add files from a submodule We still want the nice preload performance boost, though, so we simply call read_cache_preload(&pathspecs) after parsing the pathspecs. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:58:04 +00:00
Karsten Blees	6cabec8b8c	mingw: add infrastructure for read-only file system level caches Add a macro to mark code sections that only read from the file system, along with a config option and documentation. This facilitates implementation of relatively simple file system level caches without the need to synchronize with the file system. Enable read-only sections for 'git status' and preload_index. Signed-off-by: Karsten Blees <blees@dcon.de>	2026-06-26 08:58:04 +00:00
Johannes Schindelin	f028e7203e	Continue improving support for 4GB+ packs/clones/objects (#6289) This PR contains a branch thicket on top of v2.55.0-rc1 (i.e. ready to go upstream) to continue the bulk of the `unsigned long` -> `size_t` transformation. Since all of these changes have no impact on the currently-working functionality for <4GB objects/packs/clones (modulo bugs, that is 😄), I would like to merge this before v2.55.0-rc2, still: The risk of introducing a regression is negligible, the chance for fixing the majority of problems with large clones is high.	2026-06-26 08:58:03 +00:00
Johannes Schindelin	442d369390	Merge branch 'dont-clean-junctions' This topic branch teaches `git clean` to respect NTFS junctions and Unix bind mounts: it will now stop at those boundaries. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:58:02 +00:00
Johannes Schindelin	3ad6222635	Merge pull request #1897 from piscisaureus/symlink-attr Specify symlink type in .gitattributes	2026-06-26 08:58:02 +00:00
Johannes Schindelin	f1ba0e8c15	credential-cache: handle ECONNREFUSED gracefully (#5329) I should probably add some tests for this.	2026-06-26 08:58:01 +00:00
Johannes Schindelin	e36cd7e38a	fast-import: drop the six size casts in the object-read paths Continue the size_t evacuation. fast-import's helper gfi_unpack_entry() and the five size-handling sites that feed off it (store_object()'s deltalen, load_tree(), parse_from_existing(), the inline gfi_unpack_entry() caller in parse_objectish(), cat_blob(), and dereference()) all carry size_t-shaped values from the odb / unpack_entry() APIs through cast_size_t_to_ulong() bridges into unsigned long locals. With the producers (odb_read_object(), odb_read_object_peeled(), unpack_entry()) and the consumers it feeds (the zlib avail_in field from a prior commit, encode_in_pack_object_header()'s uintmax_t parameter, parse_from_commit()'s widened size parameter) all size_t-ready, the bridges and casts go away in one pass. gfi_unpack_entry() now writes into the caller's size_t directly, and the six locals collapse to plain size_t declarations. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:55 +00:00
Johannes Schindelin	624ede0fb4	pack-objects: drop the last size shim in write_no_reuse_object() Continue the size_t evacuation that this series and the merged js/objects-larger-than-4gb-on-windows topic are advancing for >4 GiB objects on Windows: with the odb readers and the zlib helpers reached from do_compress() now widened end-to-end, the last cast_size_t_to_ulong() shim in this function can be removed, and do_compress() itself can carry the new size type through. Two cast_size_t_to_ulong() shims remain in this file; they feed the tree-walk API, which is still narrow and is a separate widening topic. write_no_reuse_object()'s return type and the hashfile API are still narrow but unchanged in observable behaviour: on 64-bit Linux ulong coincides with size_t, and on Windows these were the narrow fenceposts the prior topics deliberately left in place. Their widening is left to follow-ups touching the hashfile API and the write_object() caller chain. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:55 +00:00
Johannes Schindelin	4a47d0787c	pack-objects: drop cast_size_t_to_ulong shims in try_delta() Companion to the prior get_delta() cleanup, and the last try_delta() piece of the >4 GiB delta-path topic. Every consumer that the function's locals fed has now been widened: SIZE() / DELTA_SIZE() to size_t (prior topic), the mem_usage out-parameter and delta_cacheable() earlier in this series, and create_delta() / create_delta_index() in the immediately preceding commits. Widen the declaration of trg_size, src_size, sizediff, max_size and sz to size_t (delta_size joins them on the same line, removing the size_t delta_size line that the create_delta() widening commit added as a stop-gap), and drop the two sz_st bridge variables together with the surrounding cast_size_t_to_ulong() calls. The result is just "odb_read_object(&sz)" on both reads. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:55 +00:00
Johannes Schindelin	da17ddcf28	pack-objects: drop cast_size_t_to_ulong shims in get_delta() The two shims that `606c192380` (odb, packfile: use size_t for streaming object sizes, 2026-05-08) and the subsequent odb_read_object() widening introduced as scaffolding around get_delta()'s reads can now disappear: the previous commit widened diff_delta() to size_t, which was the last narrow consumer in this function. Widen size and base_size to size_t outright, drop the size_st / base_size_st bridging temporaries, and drop the two cast_size_t_to_ulong() calls. Net change is 4 lines smaller and one read-then-cast indirection gone from each odb read. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:55 +00:00
Johannes Schindelin	6b2643abce	Merge branch 'size-t/unpack-objects'	2026-06-26 08:57:54 +00:00
Johannes Schindelin	77f84ccac1	Merge branch 'size-t/repo'	2026-06-26 08:57:54 +00:00
Johannes Schindelin	9605858656	Merge branch 'size-t/fast-export'	2026-06-26 08:57:54 +00:00
Johannes Schindelin	67eb2fe4a0	Merge branch 'size-t/commit'	2026-06-26 08:57:54 +00:00
Johannes Schindelin	9a9bc5c853	Merge branch 'size-t/tree'	2026-06-26 08:57:54 +00:00
Johannes Schindelin	6cd3029a24	clean: remove mount points when possible Windows' equivalent to "bind mounts", NTFS junction points, can be unlinked without affecting the mount target. This is clearly what users expect to happen when they call `git clean -dfx` in a worktree that contains NTFS junction points: the junction should be removed, and the target directory of said junction should be left alone (unless it is inside the worktree). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	fa930f9e06	unpack-objects: widen the size-passing infrastructure to size_t Drop the last cast_size_t_to_ulong() in builtin/unpack-objects.c. With size_t-typed object sizes already coming in via odb_read_object() and the per-byte varint decode in unpack_one() (widened by `f2063855fb`), the rest of the file was the only thing left that still threaded sizes through unsigned long: struct obj_buffer.size and struct delta_info.size, get_data() and add_object_buffer(), add_delta_to_list(), resolve_delta(), resolve_against_held(), added_object(), write_object(), unpack_non_delta_entry(), unpack_delta_entry(), and stream_blob(). Widen all of them together. None of those types had a downstream narrow consumer once odb_write_object() and patch_delta() were widened earlier, so the change is mechanical: parameter and field types change, the base_size_st bridge in unpack_delta_entry() and its cast go away, and odb_read_object() now writes into base_size directly. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	e31bb25933	repo: drop the inflated-size cast in count_objects() Continue the size_t evacuation. count_objects() feeds the inflated size from odb_read_object_info_extended()'s size_t out-parameter into struct object_values (size_t) and check_largest() (size_t) through an unsigned long bridge with a cast_size_t_to_ulong() shim. The bridge was the only narrow link in the chain. Widen the local, point oi.sizep at it directly, and drop the cast. parse_object_buffer() still takes unsigned long, so a Windows narrowing remains at that one call; that is its own follow-up topic. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	bd956e9dd4	fast-export: drop the export_blob() size cast and widen anonymize_blob() Mirror of the preceding fast-import sweep. anonymize_blob() writes strbuf.len (size_t) into its out-parameter, and export_blob()'s non-anonymize branch reads odb_read_object()'s size_t out-parameter through a size_st + cast_size_t_to_ulong() bridge into an unsigned long local; both have been silent on Windows past 4 GiB. Widen the helper signature and the local, and drop the bridge. check_object_signature() and parse_object_buffer() still take unsigned long, so the silent narrowing on Windows just moves from the local assignment to those call sites; both are separate topics. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	ea9548a2de	commit: widen the commit-buffer API to size_t Continue the migration from `unsigned long` to `size_t`. The `size` attribute of `struct commit_buffer` is fed either from `odb_read_object()`'s return value (`size_t`, handled with `cast_size_t_to_ulong()`) or from `strbuf.len` in `fake_working_tree_commit()` (silently narrowed today). Widen the field and a couple of function signatures together, drop the shim in `repo_get_commit_buffer()`, and move the matching `unsigned long` locals at the in-tree callers in commit.c (three sites), builtin/replace.c, and builtin/stash.c (two sites). The remaining callers pass NULL or already pass a size_t-compatible variable. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	e535a93014	clean: do not traverse mount points It seems to be not exactly rare on Windows to install NTFS junction points (the equivalent of "bind mounts" on Linux/Unix) in worktrees, e.g. to map some development tools into a subdirectory. In such a scenario, it is pretty horrible if `git clean -dfx` traverses into the mapped directory and starts to "clean up". Let's just not do that. Let's make sure before we traverse into a directory that it is not a mount point (or junction). This addresses https://github.com/git-for-windows/git/issues/607 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	824c02f12f	pack-objects: drop the two tree-walk casts in the preferred-base path With init_tree_desc() widened in the prior commit, the size_t-returning odb_read_object_peeled() call in add_preferred_base() and odb_read_object() call in pbase_tree_get() can both flow straight through to init_tree_desc() and into the pbase_tree_cache. Widen pbase_tree_cache.tree_size and the two local size variables to size_t, drop the size_st bridges, and drop the two cast_size_t_to_ulong() shims. This was the last pair of cast_size_t_to_ulong() call sites in builtin/pack-objects.c, completing the >4 GiB-objects work in that file that this branch and its predecessors have been pursuing. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	eca45d90fb	diff: widen textconv_object() size out-param to size_t Continue the size_t evacuation. textconv_object() fills its out-parameter from fill_textconv()'s size_t return through an unsigned long*; widen the API to match, then take advantage of the new shape where callers can. cat-file's 'c' and batch-mode 'c' branches lose their size_ul bridge variables (one site becomes a direct call, the other collapses an if/else into a single negated condition that reads as "try textconv, fall back to a raw read"). blame.c likewise drops the file_size_st bridge in fill_origin_blob() and hoists final_buf_size_st to bracket both branches in setup_scoreboard(). The latter keeps a cast_size_t_to_ulong() shim because struct blame_scoreboard.final_buf_size is still unsigned long; that field is its own topic. log.c just widens its local from unsigned long to size_t. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	52da019eaa	Introduce helper to create symlinks that knows about index_state On Windows, symbolic links actually have a type depending on the target: it can be a file or a directory. In certain circumstances, this poses problems, e.g. when a symbolic link is supposed to point into a submodule that is not checked out, so there is no way for Git to auto-detect the type. To help with that, we will add support over the course of the next commits to specify that symlink type via the Git attributes. This requires an index_state, though, something that Git for Windows' `symlink()` replacement cannot know about because the function signature is defined by the POSIX standard and not ours to change. So let's introduce a helper function to create symbolic links that does know about the index_state. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	1d0e6f9746	packfile, git-zlib: widen use_pack() and zstream avail fields to size_t Bundling the two widenings: four call sites pass &stream.avail_in directly to use_pack(), and widening either type fencepost alone would force a bridge variable at each. Doing both together is the simpler end state and is the prerequisite for the do_compress() widening in the next commit, which is what lets write_no_reuse_object() lose its last cast_size_t_to_ulong() shim. The unsigned-long locals widened at the other use_pack() callers (avail / remaining / left) hold pack-window sizes bounded by core.packedGitWindowSize, so the change is type consistency rather than a new >4GB capability. git_zstream.avail_in / avail_out likewise reach zlib's uInt fields only after zlib_buf_cap()'s 1 GiB cap, so the wrapper already accepted size_t-shaped inputs in practice. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	2f8a320527	delta: widen create_delta() and diff_delta() to size_t Last stop in the delta-encoding API widening for >4 GiB blobs on Windows: with create_delta_index() done in the prior commit and create_delta()/diff_delta() finished here, every byte count that crosses delta.h is now size_t. The struct fields they store into have been size_t since the diff-delta struct widening. The API change must move with all callers in the same commit (the build only passes when every &delta_size matches the new size_t). Caller updates are kept minimal: * builtin/pack-objects.c get_delta() and try_delta(): widen only the local delta_size variable; the surrounding unsigned-long locals and their cast_size_t_to_ulong() shims are out of scope here and will be cleaned up in their own commits. * builtin/fast-import.c, diff.c, t/helper/test-pack-deltas.c: keep the local unsigned-long delta size (each feeds a still-unsigned-long downstream consumer: zlib's avail_in, deflate_it(), the test helper's own do_compress()), and bridge via a temporary size_t plus cast_size_t_to_ulong(). The new casts are paid back in later topics that widen those consumers. * t/helper/test-delta.c: widen the local outright (no downstream consumer beyond the test's own out_size, which is already size_t). Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	f9a6df6aba	pack-objects: widen mem_usage and try_delta out-param to size_t The pair must move together because find_deltas() passes &mem_usage to try_delta(): widening either alone breaks the type match. mem_usage accumulates per-object byte counts already computed in size_t (SIZE() and sizeof_delta_index() reach here through free_unpacked(), now size_t), and was the last 32-bit-on-Windows narrowing point in the delta-window memory accounting chain. With this commit, that chain is internally size_t end-to-end except for sizeof_delta_index()'s still-narrow return, whose value is bounded by create_delta_index()'s entries cap. window_memory_limit (config-driven via git_config_ulong()) stays unsigned long: it is only compared against mem_usage and promotes. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	819d8a1cfd	pack-objects: widen free_unpacked() return to size_t free_unpacked() sums two byte counts: sizeof_delta_index() and SIZE(n->entry). The latter has been size_t since the prior topic "More work supporting objects larger than 4GB on Windows" widened SIZE() / oe_size() to size_t, so accumulating it into an unsigned long return was a silent Windows-only truncation on a packing run with many large objects. The sole caller (find_deltas()) holds its own mem_usage in an unsigned long for now and subtracts the return into it, so the new narrowing happens at that subtraction. find_deltas() and the matching try_delta() out-parameter are widened next. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Johannes Schindelin	848acf183c	pack-objects: widen delta-cache accounting to size_t These three are a single accounting tuple (the globals tracking cumulative cached-delta bytes, plus the helper that compares them against an incoming delta size) and are latently 32-bit on Windows where unsigned long != size_t: a pack with many large cached deltas could wrap silently. The widening is internally consistent on its own: the additions and subtractions against delta_cache_size already come from size_t sources (DELTA_SIZE() returns size_t), and delta_cacheable()'s sole caller in try_delta() still passes unsigned long, which promotes. Prerequisite for dropping try_delta()'s cast_size_t_to_ulong() shims, which becomes possible once create_delta() and diff_delta() are widened in a later commit. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:53 +00:00
Matthias Aßhauer	17f8832136	credential-cache: handle ECONNREFUSED gracefully In `245670c` (credential-cache: check for windows specific errors, 2021-09-14) we concluded that on Windows we would always encounter ENETDOWN where we would expect ECONNREFUSED on POSIX systems, when connecting to unix sockets. As reported in [1], we do encounter ECONNREFUSED on Windows if the socket file doesn't exist, but the containing directory does and ENETDOWN if neither exists. We should handle this case like we do on non-windows systems. [1] https://github.com/git-for-windows/git/pull/4762#issuecomment-2545498245 This fixes https://github.com/git-for-windows/git/issues/5314 Helped-by: M Hickford <mirth.hickford@gmail.com> Signed-off-by: Matthias Aßhauer <mha1993@live.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:52 +00:00
Johannes Schindelin	e223d7c2d7	survey: clearly note the experimental nature in the output While this command is definitely something we _want_, chances are that upstreaming this will require substantial changes. We still want to be able to experiment with this before that, to focus on what we need out of this command: To assist with diagnosing issues with large repositories, as well as to help monitoring the growth and the associated pain points of such repositories. To that end, we are about to integrate this command into `microsoft/git`, to get the tool into the hands of users who need it most, with the idea to iterate in close collaboration between these users and the developers familiar with Git's internals. However, we will definitely want to avoid letting anybody have the impression that this command, its exact inner workings, as well as its output format, are anywhere close to stable. To make that fact utterly clear (and thereby protect the freedom to iterate and innovate freely before upstreaming the command), let's mark its output as experimental in all-caps, as the first thing we do. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:52 +00:00
Derrick Stolee	39e3d7c1d0	survey: add --top=<N> option and config The 'git survey' builtin provides several detail tables, such as "top files by on-disk size". The size of these tables defaults to 10, currently. Allow the user to specify this number via a new --top=<N> option or the new survey.top config key. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:52 +00:00
Derrick Stolee	9f1a18302e	survey: add report of "largest" paths Since we are already walking our reachable objects using the path-walk API, let's now collect lists of the paths that contribute most to different metrics. Specifically, we care about * Number of versions. * Total size on disk. * Total inflated size (no delta or zlib compression). This information can be critical to discovering which parts of the repository are causing the most growth, especially on-disk size. Different packing strategies might help compress data more efficiently, but the total inflated size is a representation of the raw size of all snapshots of those paths. Even when stored efficiently on disk, that size represents how much information must be processed to complete a command such as 'git blame'. The exact disk size seems to be not quite robust enough for testing, as could be seen by the `linux-musl-meson` job consistently failing, possibly because zlib-ng deflates differently: t8100.4(git survey (default)) was failing with a symptom like this: TOTAL OBJECT SIZES BY TYPE =============================================== Object Type \| Count \| Disk Size \| Inflated Size ------------+-------+-----------+-------------- - Commits \| 10 \| 1523 \| 2153 + Commits \| 10 \| 1528 \| 2153 Trees \| 10 \| 495 \| 1706 Blobs \| 10 \| 191 \| 101 - Tags \| 4 \| 510 \| 528 + Tags \| 4 \| 547 \| 528 This means: the disk size is unlikely something we can verify robustly. Since zlib-ng seems to increase the disk size of the tags from 528 to 547, we cannot even assume that the disk size is always smaller than the inflated size. We will most likely want to either skip verifying the disk size altogether, or go for some kind of fuzzy matching, say, by replacing `s/ 1[45][0-9][0-9] / ~1.5k /` and `s/ [45][0-9][0-9] / ~½k /` or something like that. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:52 +00:00
Derrick Stolee	6c582555bb	survey: add ability to track prioritized lists In future changes, we will make use of these methods. The intention is to keep track of the top contributors according to some metric. We don't want to store all of the entries and do a sort at the end, so track a constant-size table and remove rows that get pushed out depending on the chosen sorting algorithm. Co-authored-by: Jeff Hostetler <git@jeffhostetler.com> Signed-off-by: Jeff Hostetler <git@jeffhostetler.com> Signed-off-by: Derrick Stolee <stolee@gmail.com>	2026-06-26 08:57:52 +00:00
Derrick Stolee	1dfb8cd108	survey: show progress during object walk Signed-off-by: Derrick Stolee <stolee@gmail.com>	2026-06-26 08:57:52 +00:00
Derrick Stolee	e10bb93305	survey: summarize total sizes by object type Now that we have explored objects by count, we can expand that a bit more to summarize the data for the on-disk and inflated size of those objects. This information is helpful for diagnosing both why disk space (and perhaps clone or fetch times) is growing but also why certain operations are slow because the inflated size of the abstract objects that must be processed is so large. Note: zlib-ng is slightly more efficient even at those small sizes. Even between zlib versions, there are slight differences in compression. To accommodate for that in the tests, not the exact numbers but some rough approximations are validated (the test should validate `git survey`, after all, not zlib). Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2026-06-26 08:57:52 +00:00
Derrick Stolee	343e55fdf1	survey: add object count summary At the moment, nothing is obvious about the reason for the use of the path-walk API, but this will become more prevalent in future iterations. For now, use the path-walk API to sum up the counts of each kind of object. For example, this is the reachable object summary output for my local repo: REACHABLE OBJECT SUMMARY ======================== Object Type \| Count ------------+------- Tags \| 1343 Commits \| 179344 Trees \| 314350 Blobs \| 184030 Signed-off-by: Derrick Stolee <stolee@gmail.com>	2026-06-26 08:57:52 +00:00
Derrick Stolee	1180da077b	survey: start pretty printing data in table form When 'git survey' provides information to the user, this will be presented in one of two formats: plaintext and JSON. The JSON implementation will be delayed until the functionality is complete for the plaintext format. The most important parts of the plaintext format are headers specifying the different sections of the report and tables providing concrete data. Create a custom table data structure that allows specifying a list of strings for the row values. When printing the table, check each column for the maximum width so we can create a table of the correct size from the start. The table structure is designed to be flexible to the different kinds of output that will be implemented in future changes. Signed-off-by: Derrick Stolee <stolee@gmail.com>	2026-06-26 08:57:52 +00:00
Jeff Hostetler	364af39455	survey: add command line opts to select references By default we will scan all references in "refs/heads/", "refs/tags/" and "refs/remotes/". Add command line opts to let the user ask for all refs or a subset of them and to include a detached HEAD. Signed-off-by: Jeff Hostetler <git@jeffhostetler.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>	2026-06-26 08:57:52 +00:00
Jeff Hostetler	532dd1d2a7	survey: stub in new experimental 'git-survey' command Start work on a new 'git survey' command to scan the repository for monorepo performance and scaling problems. The goal is to measure the various known "dimensions of scale" and serve as a foundation for adding additional measurements as we learn more about Git monorepo scaling problems. The initial goal is to complement the scanning and analysis performed by the GO-based 'git-sizer' (https://github.com/github/git-sizer) tool. It is hoped that by creating a builtin command, we may be able to take advantage of internal Git data structures and code that is not accessible from GO to gain further insight into potential scaling problems. Co-authored-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Jeff Hostetler <git@jeffhostetler.com> Signed-off-by: Derrick Stolee <stolee@gmail.com>	2026-06-26 08:57:52 +00:00
Junio C Hamano	511d8b6107	Merge branch 'jc/history-message-prep-fix' into seen Code clean-up with leakfix for a write file stream. * jc/history-message-prep-fix: history: streamline message preparation and plug file stream leak	2026-06-25 19:51:54 -07:00
Junio C Hamano	dc2a330582	Merge branch 'hn/branch-push-slip-advice' into seen "git push origin/main" and "git branch origin main" could both be an obvious typo, in which case offer the obvious typofix. * hn/branch-push-slip-advice: SQUASH??? use test_grep push: suggest <remote> <branch> for a slash slip branch: suggest <remote>/<branch> on upstream slip	2026-06-25 19:49:56 -07:00
Junio C Hamano	c4bdde67b7	Merge branch 'jt/receive-pack-use-odb-transactions' into seen git-receive-pack has been refactored to use ODB transaction interfaces instead of directly managing tmp_objdir for staging incoming objects, bringing it closer to being ODB backend agnostic. * jt/receive-pack-use-odb-transactions: builtin/receive-pack: stage incoming objects via ODB transactions odb/transaction: add transaction env interface odb/transaction: propagate commit errors odb/transaction: propagate begin errors object-file: propagate files transaction errors object-file: rename files transaction prepare function	2026-06-25 19:49:56 -07:00
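
Many of the commits listed above widen `unsigned long` to `size_t` because `unsigned long` is only 32 bits wide on 64-bit Windows, so object sizes of 4 GiB or more do not fit in it. The C sketch below illustrates that failure mode under stated assumptions: `uint32_t` stands in for a 32-bit `unsigned long`, and `narrow_ulong`, `narrow_silently()`, `narrow_checked()` and `run_examples()` are hypothetical names chosen for illustration. It is not Git's `cast_size_t_to_ulong()` and does not reproduce any code from these commits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for a 32-bit `unsigned long` (as on 64-bit Windows). */
typedef uint32_t narrow_ulong;

/* Silent narrowing: the high 32 bits of the size are simply discarded. */
narrow_ulong narrow_silently(uint64_t size)
{
	return (narrow_ulong)size;
}

/*
 * Checked narrowing (hypothetical helper): returns 0 and stores the value
 * in *out if it fits, or returns -1 and leaves *out untouched otherwise.
 */
int narrow_checked(uint64_t size, narrow_ulong *out)
{
	if (size > UINT32_MAX)
		return -1;
	*out = (narrow_ulong)size;
	return 0;
}

/* Exercises both helpers with a 5 GiB size and a small size. */
void run_examples(void)
{
	const uint64_t five_gib = UINT64_C(5) << 30;
	narrow_ulong out = 0;

	/* 5 GiB modulo 2^32 is 1 GiB: the size is wrong, with no error raised. */
	assert(narrow_silently(five_gib) == (UINT32_C(1) << 30));

	/* The checked variant refuses the value instead of truncating it. */
	assert(narrow_checked(five_gib, &out) == -1);
	assert(out == 0);

	/* Small sizes pass through unchanged. */
	assert(narrow_checked(4096, &out) == 0 && out == 4096);
}
```

Calling `run_examples()` from a `main()` runs the checks. A checked bridge turns a wrong size into a detectable error, while widening the variable itself removes the need for the bridge, which is the direction the commit messages above describe.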


13912 Commits