Files
git/Documentation/git-backfill.adoc
Elijah Newren a1ad4a0fca backfill: default to grabbing edge blobs too
Commit 302aff0922 (backfill: accept revision arguments, 2026-03-26) added
support for accepting revision arguments to backfill.  This allows users
to do things like

   git backfill --remotes ^v2.3.0

and then run many commands without triggering on-demand downloads of
blobs.  However, if they have topics based on v2.3.0, they will likely
still trigger on-demand downloads.  Consider, for example, the command

   git log -p v2.3.0..topic

This would still trigger on-demand blob loadings after the backfill
command above, because the commit(s) with A as a parent will need to
diff against the blobs in A.  In fact, multiple commands need blobs from
the lower boundary of the revision range:

   * git log -p A..B                # After backfill A..B
   * git replay --onto TARGET A..B  # After backfill TARGET^! A..B
   * git checkout A && git merge B  # After backfill A...B

Add an extra --[no-]include-edges flag to allow grabbing blobs from
edge commits.  Since the point of backfill is to prevent on-demand blob
loading and these are common commands, default to --include-edges.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-04-15 20:32:29 -07:00

92 lines
3.3 KiB
Plaintext

git-backfill(1)
===============
NAME
----
git-backfill - Download missing objects in a partial clone
SYNOPSIS
--------
[synopsis]
git backfill [--min-batch-size=<n>] [--[no-]sparse] [--[no-]include-edges] [<revision-range>]
DESCRIPTION
-----------
Blobless partial clones are created using `git clone --filter=blob:none`
and then configure the local repository such that the Git client avoids
downloading blob objects unless they are required for a local operation.
This initially means that the clone and later fetches download reachable
commits and trees but no blobs. Later operations that change the `HEAD`
pointer, such as `git checkout` or `git merge`, may need to download
missing blobs in order to complete their operation.
In the worst cases, commands that compute blob diffs, such as `git blame`,
become very slow as they download the missing blobs in single-blob
requests to satisfy the missing object as the Git command needs it. This
leads to multiple download requests and no ability for the Git server to
provide delta compression across those objects.
The `git backfill` command provides a way for the user to request that
Git downloads the missing blobs (with optional filters) such that the
missing blobs representing historical versions of files can be downloaded
in batches. The `backfill` command attempts to optimize the request by
grouping blobs that appear at the same path, hopefully leading to good
delta compression in the packfile sent by the server.
In this way, `git backfill` provides a mechanism to break a large clone
into smaller chunks. Starting with a blobless partial clone with `git
clone --filter=blob:none` and then running `git backfill` in the local
repository provides a way to download all reachable objects in several
smaller network calls than downloading the entire repository at clone
time.
By default, `git backfill` downloads all blobs reachable from the `HEAD`
commit. This set can be restricted or expanded using various options below.
THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR MAY CHANGE IN THE FUTURE.
OPTIONS
-------
`--min-batch-size=<n>`::
Specify a minimum size for a batch of missing objects to request
from the server. This size may be exceeded by the last set of
blobs seen at a given path. The default minimum batch size is
50,000.
`--sparse`::
`--no-sparse`::
Only download objects if they appear at a path that matches the
current sparse-checkout. If the sparse-checkout feature is enabled,
then `--sparse` is assumed and can be disabled with `--no-sparse`.
`--include-edges`::
`--no-include-edges`::
Include blobs from boundary commits in the backfill. Useful in
preparation for commands like `git log -p A..B` or `git replay
--onto TARGET A..B`, where A..B normally excludes A but you need
the blobs from A as well. `--include-edges` is the default.
`<revision-range>`::
Backfill only blobs reachable from commits in the specified
revision range. When no _<revision-range>_ is specified, it
defaults to `HEAD` (i.e. the whole history leading to the
current commit). For a complete list of ways to spell
_<revision-range>_, see the "Specifying Ranges" section of
linkgit:gitrevisions[7].
+
You may also use commit-limiting options understood by
linkgit:git-rev-list[1] such as `--first-parent`, `--since`, or pathspecs.
SEE ALSO
--------
linkgit:git-clone[1],
linkgit:git-rev-list[1]
GIT
---
Part of the linkgit:git[1] suite