From 6b5c2b8b7be7df2f861e72fdffc3a2ed1d0c5adf Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 5 Sep 2024 10:04:51 -0400 Subject: [PATCH] pack-objects: add --path-walk option In order to more easily compute delta bases among objects that appear at the exact same path, add a --path-walk option to 'git pack-objects'. This option will use the path-walk API instead of the object walk given by the revision machinery. Since objects will be provided in batches representing a common path, those objects can be tested for delta bases immediately instead of waiting for a sort of the full object list by name-hash. This has multiple benefits, including avoiding collisions by name-hash. The objects marked as UNINTERESTING are included in these batches, so we are guaranteeing some locality to find good delta bases. After the individual passes are done on a per-path basis, the default name-hash is used to find other opportunistic delta bases that did not match exactly by the full path name. RFC TODO: It is important to note that this option is inherently incompatible with using a bitmap index. This walk probably also does not work with other advanced features, such as delta islands. Getting ahead of myself, this option compares well with --full-name-hash when the packfile is large enough, but also performs at least as well as the default in all cases that I've seen. RFC TODO: this should probably be recording the batch locations to another list so they could be processed in a second phase using threads. RFC TODO: list some examples of how this outperforms previous pack-objects strategies. (This is coming in later commits that include performance test changes.) Signed-off-by: Derrick Stolee --- Documentation/git-pack-objects.txt | 12 +- Documentation/technical/api-path-walk.txt | 3 +- builtin/pack-objects.c | 147 ++++++++++++++++++++-- t/t5300-pack-object.sh | 17 +++ 4 files changed, 168 insertions(+), 11 deletions(-) diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt index 93861d9f85..1a3023e206 100644 --- a/Documentation/git-pack-objects.txt +++ b/Documentation/git-pack-objects.txt @@ -16,7 +16,7 @@ SYNOPSIS [--cruft] [--cruft-expiration=