p5313: add size comparison test

As custom options are added to 'git pack-objects' and 'git repack' to
adjust how compression is done, use this new performance test script to
demonstrate their effect on both run time and pack size.

The recently-added --full-name-hash option replaces the default
name-hash algorithm with one that attempts to distribute the hashes
uniformly based on the full path name instead of the last 16 characters.
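
To make the "last 16 characters" behavior concrete, here is a minimal
shell sketch of the default name-hash. It is modeled on my reading of
pack_name_hash() in Git's pack-objects.h, so treat it as an assumption
rather than the authoritative implementation. It handles ASCII paths
only, and the name_hash helper and the example paths are hypothetical.

```shell
#!/bin/sh
# Illustrative re-implementation of the default name-hash (an assumption
# based on pack_name_hash() in pack-objects.h); ASCII paths only.
name_hash () {
	# Whitespace is skipped by the hash, so drop it up front.
	rest=$(printf '%s' "$1" | tr -d '[:space:]')
	hash=0
	while test -n "$rest"
	do
		next=${rest#?}		# everything after the first character
		ch=${rest%"$next"}	# the first character itself
		rest=$next
		code=$(printf '%d' "'$ch")
		# The newest character lands in the top byte; older ones
		# slide right two bits per step until they fall off.
		hash=$(( ((hash >> 2) + (code << 24)) & 4294967295 ))
	done
	echo "$hash"
}

# Hypothetical paths that differ only more than 16 characters from the end.
name_hash "packages/foo/package-lock.json"
name_hash "packages/bar/package-lock.json"
```

The two calls should print the same value, or values that differ by at
most one, because the directory names that distinguish the paths have
been shifted out of the 32-bit hash by the time the file name is
consumed. Hashing the full path gives up that grouping in exchange for a
more uniform distribution.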

This can dramatically reduce both the time and the pack size of full
repacks for repositories with many versions of most paths. It can,
however, increase the pack size in cases such as pushing a single change.

This can be seen by running p5313 on the open source fluentui
repository [1]. Most commits produce output similar to the statistics
below for the thin and big pack cases, though certain commits (such as
[2]) have a problematic thin pack size for other reasons.

[1] https://github.com/microsoft/fluentui
[2] a637a06df05360ce5ff21420803f64608226a875
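
For anyone wanting to reproduce the numbers, the following is a rough
sketch of one way to run the script. It assumes a git.git working tree
that has already been built with "make" and a local clone of [1]. The
run_p5313 helper and the example paths are hypothetical.
GIT_PERF_LARGE_REPO is the variable that test_perf_large_repo reads.

```shell
#!/bin/sh
# Hypothetical helper: run p5313 from a built git.git checkout against a
# local clone of the repository under test.
run_p5313 () {
	git_src=$1	# git.git working tree where "make" has been run
	large_repo=$2	# absolute path to the clone to measure
	if ! test -f "$git_src/t/perf/p5313-pack-objects.sh"
	then
		echo "p5313-pack-objects.sh not found under $git_src/t/perf" >&2
		return 1
	fi
	if ! test -d "$large_repo"
	then
		echo "no repository at $large_repo" >&2
		return 1
	fi
	# The perf library copies this repository into a scratch directory,
	# so the repack tests do not modify the original clone.
	(cd "$git_src/t/perf" &&
	 GIT_PERF_LARGE_REPO="$large_repo" ./p5313-pack-objects.sh)
}

# Example (placeholder paths):
#   git -C "$HOME/src/fluentui" checkout a637a06df05360ce5ff21420803f64608226a875^
#   run_p5313 "$HOME/src/git" "$HOME/src/fluentui"
```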

Checked out at the parent of [2], I see the following statistics:

Test                                           this tree
------------------------------------------------------------------
5313.2: thin pack                              0.02(0.01+0.01)
5313.3: thin pack size                                    1.1K
5313.4: thin pack with --full-name-hash        0.02(0.01+0.00)
5313.5: thin pack size with --full-name-hash              3.0K
5313.6: big pack                               1.65(3.35+0.24)
5313.7: big pack size                                    58.0M
5313.8: big pack with --full-name-hash         1.53(2.52+0.18)
5313.9: big pack size with --full-name-hash              57.6M
5313.10: repack                                176.52(706.60+3.53)
5313.11: repack size                                    446.7K
5313.12: repack with --full-name-hash          37.47(134.18+3.06)
5313.13: repack size with --full-name-hash              183.1K

Note that this demonstrates a roughly 3x size _increase_ (1.1K to 3.0K)
in the case that simulates a small "git push". The size is essentially
unchanged (58.0M to 57.6M) in the case of pushing the difference between
HEAD and HEAD~1000.

However, the full repack case is both faster (176.52s to 37.47s) and
produces a smaller pack (446.7K to 183.1K).
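
The ratios behind these statements can be recomputed from the table
above. The sketch below only does arithmetic on values copied from that
table; the ratio helper is illustrative and nothing here is newly
measured.

```shell
#!/bin/sh
# Recompute the headline ratios from the p5313 table in this message.
ratio () {
	# usage: ratio <numerator> <denominator>
	# Prints the quotient rounded to one decimal place.
	LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

thin_growth=$(ratio 3.0 1.1)		# thin pack size, with / without
big_change=$(ratio 57.6 58.0)		# big pack size, with / without
repack_shrink=$(ratio 446.7 183.1)	# repack size, without / with
repack_speedup=$(ratio 176.52 37.47)	# repack elapsed time, without / with

echo "thin pack size:   ${thin_growth}x larger with --full-name-hash"
echo "big pack size:    ${big_change}x (effectively unchanged)"
echo "repack size:      ${repack_shrink}x smaller with --full-name-hash"
echo "repack wall time: ${repack_speedup}x faster with --full-name-hash"
```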

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Derrick Stolee
2024-08-28 12:07:42 -04:00
committed by Johannes Schindelin
parent 9b1f343258
commit fe91a8a193

t/perf/p5313-pack-objects.sh Executable file

@@ -0,0 +1,73 @@
#!/bin/sh
test_description='Tests pack performance with and without --full-name-hash'
. ./perf-lib.sh
GIT_TEST_PASSING_SANITIZE_LEAK=0
export GIT_TEST_PASSING_SANITIZE_LEAK
test_perf_large_repo
test_expect_success 'create rev input' '
	cat >in-thin <<-EOF &&
	$(git rev-parse HEAD)
	^$(git rev-parse HEAD~1)
	EOF

	cat >in-big <<-EOF
	$(git rev-parse HEAD)
	^$(git rev-parse HEAD~1000)
	EOF
'

test_perf 'thin pack' '
	git pack-objects --thin --stdout --revs --sparse <in-thin >out
'

test_size 'thin pack size' '
	test_file_size out
'

test_perf 'thin pack with --full-name-hash' '
	git pack-objects --thin --stdout --revs --sparse --full-name-hash <in-thin >out
'

test_size 'thin pack size with --full-name-hash' '
	test_file_size out
'

test_perf 'big pack' '
	git pack-objects --stdout --revs --sparse <in-big >out
'

test_size 'big pack size' '
	test_file_size out
'

test_perf 'big pack with --full-name-hash' '
	git pack-objects --stdout --revs --sparse --full-name-hash <in-big >out
'

test_size 'big pack size with --full-name-hash' '
	test_file_size out
'

test_perf 'repack' '
	git repack -adf
'

test_size 'repack size' '
	pack=$(ls .git/objects/pack/pack-*.pack) &&
	test_file_size "$pack"
'

test_perf 'repack with --full-name-hash' '
	git repack -adf --full-name-hash
'

test_size 'repack size with --full-name-hash' '
	pack=$(ls .git/objects/pack/pack-*.pack) &&
	test_file_size "$pack"
'

test_done