mirror of
https://github.com/git-for-windows/git.git
synced 2026-05-01 02:53:51 -05:00
Since 'git pack-objects' supports a --path-walk option, allow passing it
through in 'git repack'. This presents interesting testing opportunities for
comparing the different repacking strategies against each other.

Add the --path-walk option to the performance tests in p5313.

For the microsoft/fluentui repo [1] checked out at a specific commit [2],
the --path-walk tests in p5313 look like this:

    Test                                          this tree
    -------------------------------------------------------------------------
    5313.18: thin pack with --path-walk           0.08(0.06+0.02)
    5313.19: thin pack size with --path-walk      18.4K
    5313.20: big pack with --path-walk            2.10(7.80+0.26)
    5313.21: big pack size with --path-walk       19.8M
    5313.22: shallow fetch pack with --path-walk  1.62(3.38+0.17)
    5313.23: shallow pack size with --path-walk   33.6M
    5313.24: repack with --path-walk              81.29(96.08+0.71)
    5313.25: repack size with --path-walk         142.5M

[1] https://github.com/microsoft/fluentui
[2] e70848ebac1cd720875bccaa3026f4a9ed700e08

Along with the earlier tests in p5313, I'll instead reformat the
comparison as follows:

    Repack Method    Pack Size       Time
    ---------------------------------------
    Hash v1             439.4M     87.24s
    Hash v2             161.7M     21.51s
    Path Walk           142.5M     81.29s

There are a few things to notice here:

 1. The benefits of --name-hash-version=2 over --name-hash-version=1 are
    significant, but --path-walk still compresses better than that
    option.

 2. The --path-walk command is still using --name-hash-version=1 for the
    second pass of delta computation, using the increased name hash
    collisions as a potential method for opportunistic compression on
    top of the path-focused compression.

 3. The --path-walk algorithm is currently sequential and does not use
    multiple threads for delta compression. Threading will be
    implemented in a future change so the computation time will improve
    to better compete in this metric.

There are small benefits in size for my copy of the Git repository:

    Repack Method    Pack Size       Time
    ---------------------------------------
    Hash v1             248.8M     30.44s
    Hash v2             249.0M     30.15s
    Path Walk           213.2M    142.50s

As well as in the nodejs/node repository [3]:

    Repack Method    Pack Size       Time
    ---------------------------------------
    Hash v1             739.9M     71.18s
    Hash v2             764.6M     67.82s
    Path Walk           698.1M    208.10s

[3] https://github.com/nodejs/node

This benefit also repeats in my copy of the Linux kernel repository:

    Repack Method    Pack Size       Time
    ---------------------------------------
    Hash v1               2.5G    554.41s
    Hash v2               2.5G    549.62s
    Path Walk             2.2G   1562.36s

It is important to see that even when the repository shape does not have
many name-hash collisions, there is a slight space boost to be found
using this method.

As this repacking strategy was released in Git for Windows 2.47.0, some
users have reported cases where the --path-walk compression is slightly
worse than the --name-hash-version=2 option. In those cases, it may be
beneficial to combine the two options. However, there has not been a
released version of Git that has both options and I don't have access to
these repos for testing.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
290 lines
11 KiB
Plaintext
git-repack(1)
=============

NAME
----
git-repack - Pack unpacked objects in a repository


SYNOPSIS
--------
[verse]
'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]
	[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]
	[--write-midx] [--name-hash-version=<n>] [--path-walk]

DESCRIPTION
-----------

This command is used to combine all objects that do not currently
reside in a "pack", into a pack. It can also be used to re-organize
existing packs into a single, more efficient pack.

A pack is a collection of objects, individually compressed, with
delta compression applied, stored in a single file, with an
associated index file.

Packs are used to reduce the load on mirror systems, backup
engines, disk storage, etc.

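As a rough illustration of the description above, the following sketch builds a throwaway repository, lets loose objects accumulate, and then combines them into a single pack. The repository location, file name, and identity settings are assumptions made for the example only.

```shell
set -eu

# Hypothetical scratch repository; every name below is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Create a few commits so that loose objects accumulate.
for i in 1 2 3; do
	echo "revision $i" >file.txt
	git add file.txt
	git commit -q -m "commit $i"
done

# Before: "count" reports the number of loose objects.
git count-objects -v

# Combine everything reachable into one pack and remove redundant files.
git repack -a -d -q

# After: no loose objects remain and there is a single pack.
git count-objects -v
loose=$(git count-objects -v | sed -n 's/^count: //p')
packs=$(git count-objects -v | sed -n 's/^packs: //p')
```

Comparing the two `git count-objects -v` listings shows the loose objects moving into the pack.
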
OPTIONS
-------

-a::
Instead of incrementally packing the unpacked objects,
pack everything referenced into a single pack.
Especially useful when packing a repository that is used
for private development. Use
with `-d`. This will clean up the objects that `git prune`
leaves behind, but `git fsck --full --dangling` shows as
dangling.
+
Note that users fetching over dumb protocols will have to fetch the
whole new pack in order to get any contained object, no matter how many
other objects in that pack they already have locally.
+
Promisor packfiles are repacked separately: if there are packfiles that
have an associated ".promisor" file, these packfiles will be repacked
into another separate pack, and an empty ".promisor" file corresponding
to the new separate pack will be written.

-A::
Same as `-a`, unless `-d` is used. Then any unreachable
objects in a previous pack become loose, unpacked objects,
instead of being left in the old pack. Unreachable objects
are never intentionally added to a pack, even when repacking.
This option prevents unreachable objects from being immediately
deleted by way of being left in the old pack and then
removed. Instead, the loose unreachable objects
will be pruned according to normal expiry rules
with the next 'git gc' invocation. See linkgit:git-gc[1].

-d::
After packing, if the newly created packs make some
existing packs redundant, remove the redundant packs.
Also run 'git prune-packed' to remove redundant
loose object files.

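The interaction of `-A` and `-d` described above can be observed with a small experiment. This is only a sketch under assumed names (a scratch repository and a hypothetical `topic` branch): it packs a commit, makes that commit unreachable, and then checks that the unreachable objects come back as loose objects.

```shell
set -eu

# Hypothetical scratch repository; every name below is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo base >base.txt
git add base.txt
git commit -q -m "base"
main_branch=$(git symbolic-ref --short HEAD)

# A commit on a side branch that will be abandoned later.
git checkout -q -b topic
echo extra >extra.txt
git add extra.txt
git commit -q -m "to be abandoned"

# Pack everything while the side branch is still reachable.
git repack -a -d -q

# Make the side-branch commit unreachable.
git checkout -q "$main_branch"
git branch -q -D topic
git reflog expire --expire=now --all

# With -A -d, unreachable objects from the old pack become loose objects
# instead of disappearing together with the old pack.
git repack -A -d -q
loose=$(git count-objects -v | sed -n 's/^count: //p')
echo "loose objects after repack -A -d: $loose"
```

The loose objects left behind are then subject to the normal expiry rules of a later `git gc`.
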
--cruft::
Same as `-a`, unless `-d` is used. Then any unreachable objects
are packed into a separate cruft pack. Unreachable objects can
be pruned using the normal expiry rules with the next `git gc`
invocation (see linkgit:git-gc[1]). Incompatible with `-k`.

--cruft-expiration=<approxidate>::
Expire unreachable objects older than `<approxidate>`
immediately instead of waiting for the next `git gc` invocation.
Only useful with `--cruft -d`.

--max-cruft-size=<n>::
Repack cruft objects into packs as large as `<n>` bytes before
creating new packs. As long as there are enough cruft packs
smaller than `<n>`, repacking will cause a new cruft pack to
be created containing objects from any combined cruft packs,
along with any new unreachable objects. Cruft packs larger than
`<n>` will not be modified. When the new cruft pack is larger
than `<n>` bytes, it will be split into multiple packs, all of
which are guaranteed to be at most `<n>` bytes in size. Only
useful with `--cruft -d`.

--expire-to=<dir>::
Write a cruft pack containing pruned objects (if any) to the
directory `<dir>`. This option is useful for keeping a copy of
any pruned objects in a separate directory as a backup. Only
useful with `--cruft -d`.

-l::
Pass the `--local` option to 'git pack-objects'. See
linkgit:git-pack-objects[1].

-f::
Pass the `--no-reuse-delta` option to `git-pack-objects`, see
linkgit:git-pack-objects[1].

-F::
Pass the `--no-reuse-object` option to `git-pack-objects`, see
linkgit:git-pack-objects[1].

-q::
--quiet::
Show no progress over the standard error stream and pass the `-q`
option to 'git pack-objects'. See linkgit:git-pack-objects[1].

-n::
Do not update the server information with
'git update-server-info'. This option skips
updating local catalog files needed to publish
this repository (or a direct copy of it)
over HTTP or FTP. See linkgit:git-update-server-info[1].

--window=<n>::
--depth=<n>::
These two options affect how the objects contained in the pack are
stored using delta compression. The objects are first internally
sorted by type, size and optionally names and compared against the
other objects within `--window` to see if using delta compression saves
space. `--depth` limits the maximum delta depth; making it too deep
affects the performance on the unpacker side, because delta data needs
to be applied that many times to get to the necessary object.
+
The default value for `--window` is 10 and `--depth` is 50. The maximum
depth is 4095.

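A minimal sketch of tuning the two options above follows. The values 50 and 20 are arbitrary choices for illustration, not recommendations, and the scratch repository and file name are assumptions.

```shell
set -eu

# Hypothetical scratch repository; every name below is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Several revisions of a growing file give the delta search candidates.
for i in 1 2 3 4 5; do
	echo "line $i of the example file" >>data.txt
	git add data.txt
	git commit -q -m "revision $i"
done

# -f recomputes deltas instead of reusing existing ones, so the chosen
# window and depth apply to every object in the new pack.
git repack -a -d -f -q --window=50 --depth=20

# Print delta chain statistics for the resulting pack, if there are any.
git verify-pack -v .git/objects/pack/pack-*.idx | grep 'chain length' || true

# The repacked repository should still pass a full consistency check.
git fsck --full
packs=$(git count-objects -v | sed -n 's/^packs: //p')
```

A larger window usually costs more time during the repack, while a smaller depth keeps object access on the reading side cheaper.
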
--threads=<n>::
This option is passed through to `git pack-objects`.

--window-memory=<n>::
This option provides an additional limit on top of `--window`;
the window size will dynamically scale down so as to not take
up more than '<n>' bytes in memory. This is useful in
repositories with a mix of large and small objects to not run
out of memory with a large window, but still be able to take
advantage of the large window for the smaller objects. The
size can be suffixed with "k", "m", or "g".
`--window-memory=0` makes memory usage unlimited. The default
is taken from the `pack.windowMemory` configuration variable.
Note that the actual memory usage will be the limit multiplied
by the number of threads used by linkgit:git-pack-objects[1].

--max-pack-size=<n>::
Maximum size of each output pack file. The size can be suffixed with
"k", "m", or "g". The minimum size allowed is limited to 1 MiB.
If specified, multiple packfiles may be created, which also
prevents the creation of a bitmap index.
The default is unlimited, unless the config variable
`pack.packSizeLimit` is set. Note that this option may result in
a larger and slower repository; see the discussion in
`pack.packSizeLimit`.

--filter=<filter-spec>::
Remove objects matching the filter specification from the
resulting packfile and put them into a separate packfile. Note
that objects used in the working directory are not filtered
out. So for the split to fully work, it's best to perform it
in a bare repo and to use the `-a` and `-d` options along with
this option. Also, `--no-write-bitmap-index` (or the
`repack.writeBitmaps` config option set to `false`) should be
used; otherwise writing the bitmap index will fail, as it assumes
a single packfile containing all the objects. See
linkgit:git-rev-list[1] for valid `<filter-spec>` forms.

--filter-to=<dir>::
Write the pack containing filtered out objects to the
directory `<dir>`. Only useful with `--filter`. This can be
used for putting the pack on a separate object directory that
is accessed through the Git alternates mechanism. **WARNING:**
If the packfile containing the filtered out objects is not
accessible, the repo can become corrupt as it might not be
possible to access the objects in that packfile. See the
`objects` and `objects/info/alternates` sections of
linkgit:gitrepository-layout[5].

-b::
--write-bitmap-index::
Write a reachability bitmap index as part of the repack. This
only makes sense when used with `-a`, `-A` or `-m`, as the bitmaps
must be able to refer to all reachable objects. This option
overrides the setting of `repack.writeBitmaps`. This option
has no effect if multiple packfiles are created, unless writing a
MIDX (in which case a multi-pack bitmap is created).

--pack-kept-objects::
Include objects in `.keep` files when repacking. Note that we
still do not delete `.keep` packs after `pack-objects` finishes.
This means that we may duplicate objects, but this makes the
option safe to use when there are concurrent pushes or fetches.
This option is generally only useful if you are writing bitmaps
with `-b` or `repack.writeBitmaps`, as it ensures that the
bitmapped packfile has the necessary objects.

--keep-pack=<pack-name>::
Exclude the given pack from repacking. This is the equivalent
of having a `.keep` file on the pack. `<pack-name>` is the
pack file name without leading directory (e.g. `pack-123.pack`).
The option can be specified multiple times to keep multiple
packs.

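The following sketch shows one way `--keep-pack` might be used: an existing pack is named without its leading directory and is then left untouched by a full repack. The scratch repository and file names are assumptions for the example.

```shell
set -eu

# Hypothetical scratch repository; every name below is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo one >one.txt
git add one.txt
git commit -q -m "first"
git repack -a -d -q

# Name of the existing pack file without its leading directory.
kept=$(basename .git/objects/pack/pack-*.pack)

echo two >two.txt
git add two.txt
git commit -q -m "second"

# Repack everything except the objects stored in the kept pack.
git repack -a -d -q --keep-pack="$kept"

packs=$(git count-objects -v | sed -n 's/^packs: //p')
echo "kept pack: $kept, packs now: $packs"
```

The kept pack survives the `-d` cleanup, and the new pack holds only the objects that were not already in it.
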
--unpack-unreachable=<when>::
When loosening unreachable objects, do not bother loosening any
objects older than `<when>`. This can be used to optimize out
the write of any objects that would be immediately pruned by
a follow-up `git prune`.

-k::
--keep-unreachable::
When used with `-ad`, any unreachable objects from existing
packs will be appended to the end of the packfile instead of
being removed. In addition, any unreachable loose objects will
be packed (and their loose counterparts removed).

-i::
--delta-islands::
Pass the `--delta-islands` option to `git-pack-objects`, see
linkgit:git-pack-objects[1].

-g<factor>::
--geometric=<factor>::
Arrange resulting pack structure so that each successive pack
contains at least `<factor>` times the number of objects as the
next-largest pack.
+
`git repack` ensures this by determining a "cut" of packfiles that need
to be repacked into one in order to ensure a geometric progression. It
picks the smallest set of packfiles such that as many of the larger
packfiles (by count of objects contained in that pack) may be left
intact.
+
Unlike other repack modes, the set of objects to pack is determined
uniquely by the set of packs being "rolled-up"; in other words, the
packs determined to need to be combined in order to restore a geometric
progression.
+
Loose objects are implicitly included in this "roll-up", without respect to
their reachability. This is subject to change in the future.
+
When writing a multi-pack bitmap, `git repack` selects the largest resulting
pack as the preferred pack for object selection by the MIDX (see
linkgit:git-multi-pack-index[1]).

-m::
--write-midx::
Write a multi-pack index (see linkgit:git-multi-pack-index[1])
containing the non-redundant packs.

--name-hash-version=<n>::
Provide this argument to the underlying `git pack-objects` process.
See linkgit:git-pack-objects[1] for full details.

--path-walk::
Pass the `--path-walk` option to the underlying `git pack-objects`
process. See linkgit:git-pack-objects[1] for full details.

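Since `--path-walk` is only present in sufficiently recent Git builds, a script that wants to try it may need a fallback. The sketch below checks the usage text of `git repack` for the option and otherwise performs a plain full repack. The feature check, the repository layout, and the file names are assumptions for illustration, not a documented detection method.

```shell
set -eu

# Hypothetical scratch repository; every name below is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Files that share a base name in different directories, the kind of
# layout where grouping objects by path may differ from grouping by
# name hash.
mkdir -p src docs
for i in 1 2 3; do
	echo "source $i" >>src/main.txt
	echo "notes $i" >>docs/main.txt
	git add src/main.txt docs/main.txt
	git commit -q -m "commit $i"
done

# Use --path-walk only if this build of Git lists it in its usage text.
if git repack -h 2>&1 | grep -q -e 'path-walk'; then
	git repack -a -d -f -q --path-walk
else
	echo "--path-walk not available; doing a plain full repack" >&2
	git repack -a -d -f -q
fi

git fsck --full
packs=$(git count-objects -v | sed -n 's/^packs: //p')
```

On a real repository, comparing the pack size from this run with one made using `--name-hash-version=2` shows which strategy compresses better for that history.
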
CONFIGURATION
-------------

Various configuration variables affect packing, see
linkgit:git-config[1] (search for "pack" and "delta").

By default, the command passes the `--delta-base-offset` option to
'git pack-objects'; this typically results in slightly smaller packs,
but the generated packs are incompatible with versions of Git older than
version 1.4.4. If you need to share your repository with such ancient Git
versions, either directly or via the dumb http protocol, then you
need to set the configuration variable `repack.useDeltaBaseOffset` to
"false" and repack. Access from old Git versions over the native protocol
is unaffected by this option as the conversion is performed on the fly
as needed in that case.

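For the compatibility case described above, the configuration change and the follow-up repack might look like the sketch below. The scratch repository and file name are assumptions for the example.

```shell
set -eu

# Hypothetical scratch repository; every name below is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

for i in 1 2 3; do
	echo "line $i" >>notes.txt
	git add notes.txt
	git commit -q -m "commit $i"
done

# Stop passing --delta-base-offset to 'git pack-objects' ...
git config repack.useDeltaBaseOffset false

# ... and rewrite the existing objects into a new pack.
git repack -a -d -q

setting=$(git config --get repack.useDeltaBaseOffset)
packs=$(git count-objects -v | sed -n 's/^packs: //p')
echo "repack.useDeltaBaseOffset=$setting, packs: $packs"
```

The setting is stored in the repository's own configuration, so later repacks in the same repository keep honoring it.
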
Delta compression is not used on objects larger than the
`core.bigFileThreshold` configuration variable and on files with the
attribute `delta` set to false.

SEE ALSO
--------
linkgit:git-pack-objects[1]
linkgit:git-prune-packed[1]

GIT
---
Part of the linkgit:git[1] suite