Mirror of https://github.com/git-for-windows/git.git (synced 2026-06-17 07:21:10 -05:00)
When we load a commit object from the commit graph (rather than reading
the object contents), we don't fill in its "maybe_tree" entry, but
rather wait to lazy-load it. This goes back to 7b8a21dba1 (commit-graph:
lazy-load trees for commits, 2018-04-06), and saves the work of
instantiating tree objects that nobody cares about.

But it creates a data dependency: now the commit struct depends on the
graph file to do that lazy load. This is a problem if we close the graph
file; now we have a commit struct that claims to be parsed but is
missing some of its data.

It's rare for this to be a problem in practice, because we don't tend to
close the graph files at all, and if we do we don't tend to look at
their commits afterward. But there is one case that is easy to trigger:
git-clone's --dissociate option will close the object database before
running the dissociate repack, and then afterwards still try to check
out the working tree. This will yield an error like:

  fatal: unable to parse commit b29edc0babef41810f7b1c9ee1d74058f22e4080
  warning: Clone succeeded, but checkout failed.

What happens is that we expect repo_get_commit_tree() to lazy-load the
tree, but commit_graph_position() returns COMMIT_NOT_FROM_GRAPH because
the position slab has gone away (and even if it hadn't, we don't have
the graph file itself available anymore).

Let's try harder to find the tree in repo_get_commit_tree() by actually
opening the commit object and parsing the tree line. This is extra work,
but no more than we'd have to go to if we hadn't done the initial graph
load in the first place.

It does mean that a corrupt commit (e.g., one that points to a non-tree
object for which we couldn't instantiate a struct) will repeatedly load
the object from disk, once for each call to repo_get_commit_tree(). But
such corruptions should be rare, and we don't tend to perform such calls
repeatedly (usually we'd abort the operation upon seeing corruption).

It also means we have to reimplement a bit of the commit parsing. We
can't just use parse_commit_buffer() here, because it expects an
unparsed struct and wants to load everything, including parent links.
But we don't know if the parent list has been munged during traversal,
so it's not safe for us to touch it. Fortunately, it's quite easy to
load just the tree, as it is always the first line of the commit object.

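
To make the "first line" point concrete, here is a minimal shell sketch of
reading only the tree line from a raw commit payload. It is an illustration
of the idea, not the C code in this patch: the helper names tree_of_commit
and sample_commit are hypothetical, and the author, committer and parent
values in the sample are placeholders.

```shell
#!/bin/sh

# Hypothetical helper (not part of git): print the tree ID named on the
# first line of a raw commit payload read from stdin. Everything after
# the first line (parents, author, message) is deliberately ignored.
tree_of_commit () {
	# Accept a final line without a trailing newline as well.
	IFS= read -r first_line || test -n "$first_line" || return 1
	case "$first_line" in
	"tree "*)
		printf '%s\n' "${first_line#tree }"
		;;
	*)
		# The payload does not start with a tree line.
		return 1
		;;
	esac
}

# Sample payload in the layout of a commit object body. The identity
# lines and the parent ID are placeholders for illustration only.
sample_commit () {
	printf '%s\n' \
		"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904" \
		"parent b29edc0babef41810f7b1c9ee1d74058f22e4080" \
		"author A U Thor <author@example.com> 1112911993 -0700" \
		"committer C O Mitter <committer@example.com> 1112911993 -0700" \
		"" \
		"foo"
}

sample_commit | tree_of_commit
```

In a real repository, the output of "git cat-file commit HEAD" should have
the same layout and could be piped into the helper instead of the sample.
Because only the first line is read, the parent lines are never touched,
which is the property the paragraph above relies on.
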
There is an alternative approach which I considered but rejected:
"complete" each graph-loaded commit struct when we close the graph file
by looking up and instantiating their trees at close time. This is the
most elegant solution in some sense, as it resolves the data dependency
at the moment it goes away. And it avoids ever opening the commit
objects at all, which can be more efficient.

But not always. The resolving effort scales with the number of
graph-loaded commits, even though we may only later access one or a few.
So the tradeoff depends on how many were loaded in total versus how many
will be later accessed.

And in most cases, we will not access any at all! Programs which close
the object database before exiting will then do a bunch of work for no
reason. This could be mitigated by requiring a separate function to
resolve the graph structs before closing the file. But now each close
call has to consider whether to call that resolving function. So we'd
fix this case in git-clone, but we don't know what other cases (if any)
are lurking.

Moreover, this strategy does nothing if we lose access to the graph file
unexpectedly (e.g., due to a system error). I'm not entirely sure this
is possible now (we mmap it, so I'd guess any error would turn into
SIGBUS anyway). But it feels like making the lazy-load more robust
(which this patch does) is the best way to handle a wide variety of
possible failure modes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
387 lines
10 KiB
Bash
Executable File
#!/bin/sh
#
# Copyright (C) 2006 Martin Waitz <tali@admingilde.org>
#

test_description='test clone --reference'
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME

. ./test-lib.sh

base_dir=$(pwd)

U=$base_dir/UPLOAD_LOG

# create a commit in repo $1 with name $2
commit_in () {
	(
		cd "$1" &&
		echo "$2" >"$2" &&
		git add "$2" &&
		git commit -m "$2"
	)
}

# check that there are $2 loose objects in repo $1
test_objcount () {
	echo "$2" >expect &&
	git -C "$1" count-objects >actual.raw &&
	cut -d' ' -f1 <actual.raw >actual &&
	test_cmp expect actual
}

test_expect_success 'preparing first repository' '
	test_create_repo A &&
	commit_in A file1
'

test_expect_success 'preparing second repository' '
	git clone A B &&
	commit_in B file2 &&
	git -C B repack -ad &&
	git -C B prune
'

test_expect_success 'cloning with reference (-l -s)' '
	git clone -l -s --reference B A C
'

test_expect_success 'existence of info/alternates' '
	test_line_count = 2 C/.git/objects/info/alternates
'

test_expect_success 'pulling from reference' '
	git -C C pull ../B main
'

test_expect_success 'that reference gets used' '
	test_objcount C 0
'

test_expect_success 'cloning with reference (no -l -s)' '
	GIT_TRACE_PACKET=$U.D git clone --reference B "file://$(pwd)/A" D
'

test_expect_success 'fetched no objects' '
	test -s "$U.D" &&
	! grep " want" "$U.D"
'

test_expect_success 'existence of info/alternates' '
	test_line_count = 1 D/.git/objects/info/alternates
'

test_expect_success 'pulling from reference' '
	git -C D pull ../B main
'

test_expect_success 'that reference gets used' '
	test_objcount D 0
'

test_expect_success 'updating origin' '
	commit_in A file3 &&
	git -C A repack -ad &&
	git -C A prune
'

test_expect_success 'pulling changes from origin' '
	git -C C pull --no-rebase origin
'

# the 2 local objects are commit and tree from the merge
test_expect_success 'that alternate to origin gets used' '
	test_objcount C 2
'

test_expect_success 'pulling changes from origin' '
	git -C D pull --no-rebase origin
'

# the 5 local objects are expected; file3 blob, commit in A to add it
# and its tree, and 2 are our tree and the merge commit.
test_expect_success 'check objects expected to exist locally' '
	test_objcount D 5
'

test_expect_success 'preparing alternate repository #1' '
	test_create_repo F &&
	commit_in F file1
'

test_expect_success 'cloning alternate repo #2 and adding changes to repo #1' '
	git clone F G &&
	commit_in F file2
'

test_expect_success 'cloning alternate repo #1, using #2 as reference' '
	git clone --reference G F H
'

test_expect_success 'cloning with reference being subset of source (-l -s)' '
	git clone -l -s --reference A B E
'

test_expect_success 'cloning with multiple references drops duplicates' '
	git clone -s --reference B --reference A --reference B A dups &&
	test_line_count = 2 dups/.git/objects/info/alternates
'

test_expect_success 'clone with reference from a tagged repository' '
	(
		cd A && git tag -a -m tagged foo
	) &&
	git clone --reference=A A I
'

test_expect_success 'prepare branched repository' '
	git clone A J &&
	(
		cd J &&
		git checkout -b other main^ &&
		echo other >otherfile &&
		git add otherfile &&
		git commit -m other &&
		git checkout main
	)
'

test_expect_success 'fetch with incomplete alternates' '
	git init K &&
	echo "$base_dir/A/.git/objects" >K/.git/objects/info/alternates &&
	(
		cd K &&
		git remote add J "file://$base_dir/J" &&
		GIT_TRACE_PACKET=$U.K git fetch J
	) &&
	main_object=$(git -C A rev-parse --verify refs/heads/main) &&
	test -s "$U.K" &&
	! grep " want $main_object" "$U.K" &&
	tag_object=$(git -C A rev-parse --verify refs/tags/foo) &&
	! grep " want $tag_object" "$U.K"
'

test_expect_success 'clone using repo with gitfile as a reference' '
	git clone --separate-git-dir=L A M &&
	git clone --reference=M A N &&
	echo "$base_dir/L/objects" >expected &&
	test_cmp expected "$base_dir/N/.git/objects/info/alternates"
'

test_expect_success 'clone using repo pointed at by gitfile as reference' '
	git clone --reference=M/.git A O &&
	echo "$base_dir/L/objects" >expected &&
	test_cmp expected "$base_dir/O/.git/objects/info/alternates"
'

test_expect_success 'clone and dissociate from reference' '
	git init P &&
	(
		cd P && test_commit one
	) &&
	git clone P Q &&
	(
		cd Q && test_commit two
	) &&
	git clone --no-local --reference=P Q R &&
	git clone --no-local --reference=P --dissociate Q S &&
	# removing the reference P would corrupt R but not S
	rm -fr P &&
	test_must_fail git -C R fsck &&
	git -C S fsck
'
test_expect_success 'clone, dissociate from partial reference and repack' '
	rm -fr P Q R &&
	git init P &&
	(
		cd P &&
		test_commit one &&
		git repack &&
		test_commit two &&
		git repack
	) &&
	git clone --bare P Q &&
	(
		cd P &&
		git checkout -b second &&
		test_commit three &&
		git repack
	) &&
	git clone --bare --dissociate --reference=P Q R &&
	ls R/objects/pack/*.pack >packs.txt &&
	test_line_count = 1 packs.txt
'

test_expect_success 'clone, dissociate from alternates' '
	rm -fr A B C &&
	test_create_repo A &&
	commit_in A file1 &&
	git clone --reference=A A B &&
	test_line_count = 1 B/.git/objects/info/alternates &&
	git clone --local --dissociate B C &&
	! test -f C/.git/objects/info/alternates &&
	( cd C && git fsck )
'

test_expect_success 'setup repo with garbage in objects/*' '
	git init S &&
	(
		cd S &&
		test_commit A &&

		cd .git/objects &&
		>.some-hidden-file &&
		>some-file &&
		mkdir .some-hidden-dir &&
		>.some-hidden-dir/some-file &&
		>.some-hidden-dir/.some-dot-file &&
		mkdir some-dir &&
		>some-dir/some-file &&
		>some-dir/.some-dot-file
	)
'

test_expect_success 'clone a repo with garbage in objects/*' '
	for option in --local --no-hardlinks --shared --dissociate
	do
		git clone $option S S$option || return 1 &&
		git -C S$option fsck || return 1
	done &&
	find S-* -name "*some*" | sort >actual &&
	cat >expected <<-EOF &&
	S--dissociate/.git/objects/.some-hidden-dir
	S--dissociate/.git/objects/.some-hidden-dir/.some-dot-file
	S--dissociate/.git/objects/.some-hidden-dir/some-file
	S--dissociate/.git/objects/.some-hidden-file
	S--dissociate/.git/objects/some-dir
	S--dissociate/.git/objects/some-dir/.some-dot-file
	S--dissociate/.git/objects/some-dir/some-file
	S--dissociate/.git/objects/some-file
	S--local/.git/objects/.some-hidden-dir
	S--local/.git/objects/.some-hidden-dir/.some-dot-file
	S--local/.git/objects/.some-hidden-dir/some-file
	S--local/.git/objects/.some-hidden-file
	S--local/.git/objects/some-dir
	S--local/.git/objects/some-dir/.some-dot-file
	S--local/.git/objects/some-dir/some-file
	S--local/.git/objects/some-file
	S--no-hardlinks/.git/objects/.some-hidden-dir
	S--no-hardlinks/.git/objects/.some-hidden-dir/.some-dot-file
	S--no-hardlinks/.git/objects/.some-hidden-dir/some-file
	S--no-hardlinks/.git/objects/.some-hidden-file
	S--no-hardlinks/.git/objects/some-dir
	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
	S--no-hardlinks/.git/objects/some-dir/some-file
	S--no-hardlinks/.git/objects/some-file
	EOF
	test_cmp expected actual
'

test_expect_success SYMLINKS 'setup repo with manually symlinked or unknown files at objects/' '
	git init T &&
	(
		cd T &&
		git config gc.auto 0 &&
		test_commit A &&
		git gc &&
		test_commit B &&

		cd .git/objects &&
		mv pack packs &&
		ln -s packs pack &&
		find ?? -type d >loose-dirs &&
		last_loose=$(tail -n 1 loose-dirs) &&
		mv $last_loose a-loose-dir &&
		ln -s a-loose-dir $last_loose &&
		first_loose=$(head -n 1 loose-dirs) &&
		rm -f loose-dirs &&

		cd $first_loose &&
		obj=$(ls *) &&
		mv $obj ../an-object &&
		ln -s ../an-object $obj &&

		cd ../ &&
		echo unknown_content >unknown_file
	) &&
	git -C T fsck &&
	git -C T rev-list --all --objects >T.objects
'


test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at objects/' '
	# None of these options work when cloning locally, since T has
	# symlinks in its `$GIT_DIR/objects` directory
	for option in --local --no-hardlinks --dissociate
	do
		test_must_fail git clone $option T T$option 2>err || return 1 &&
		test_grep "symlink.*exists" err || return 1
	done &&

	# But `--shared` clones should still work, even when specifying
	# a local path *and* that repository has symlinks present in its
	# `$GIT_DIR/objects` directory.
	git clone --shared T T--shared &&
	git -C T--shared fsck &&
	git -C T--shared rev-list --all --objects >T--shared.objects &&
	test_cmp T.objects T--shared.objects &&
	(
		cd T--shared/.git/objects &&
		find . -type f | sort >../../../T--shared.objects-files.raw &&
		find . -type l | sort >../../../T--shared.objects-symlinks.raw
	) &&

	for raw in $(ls T*.raw)
	do
		sed -e "s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" -e "/commit-graph/d" \
			-e "/multi-pack-index/d" -e "/rev/d" <$raw >$raw.de-sha-1 &&
		sort $raw.de-sha-1 >$raw.de-sha || return 1
	done &&

	echo ./info/alternates >expected-files &&
	test_cmp expected-files T--shared.objects-files.raw &&
	test_must_be_empty T--shared.objects-symlinks.raw
'

test_expect_success SYMLINKS 'clone repo with symlinked objects directory' '
	test_when_finished "rm -fr sensitive malicious" &&

	mkdir -p sensitive &&
	echo "secret" >sensitive/file &&

	git init malicious &&
	rm -fr malicious/.git/objects &&
	ln -s "$(pwd)/sensitive" ./malicious/.git/objects &&

	test_must_fail git clone --local malicious clone 2>err &&

	test_path_is_missing clone &&
	grep "is a symlink, refusing to clone with --local" err
'

test_expect_success 'dissociate from repo with commit graph' '
	git init orig &&
	# We are trying to make sure the dissociated repo can
	# find the tree of the tip commit, so the test could still
	# serve its purpose with an empty tree. But having actual
	# content future-proofs us against any kind of internal
	# empty-tree optimizations.
	echo content >orig/file &&
	git -C orig add . &&
	git -C orig commit -m foo &&

	# We will use graph.git as our "local" source to dissociate
	# from.
	git clone --bare orig graph.git &&
	git -C graph.git commit-graph write --reachable &&

	# And then finally clone orig, using graph.git to get our objects. This
	# must be non-bare so that we perform the checkout step, which will
	# need to access the tree of HEAD, which we will have originally loaded
	# via the commit graph.
	git clone --no-local --reference graph.git --dissociate orig clone
'

test_done