This topic branch re-adds the deprecated --stdin/-z options to `git
reset`. Those patches were overridden by a different set of options in
the upstream Git project before we could propose `--stdin`.
We offered this in MinGit to applications that wanted a safer way to
pass lots of pathspecs to Git, and these applications will need to be
adjusted.
Instead of `--stdin`, `--pathspec-from-file=-` should be used, and
instead of `-z`, `--pathspec-file-nul`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The `--stdin` option was a well-established paradigm in other commands,
therefore we implemented it in `git reset` for use by Visual Studio.
Unfortunately, upstream Git decided that it is time to introduce
`--pathspec-from-file` instead.
To keep backwards-compatibility for some grace period, we therefore
reinstate the `--stdin` option on top of the `--pathspec-from-file`
option, but mark it firmly as deprecated.
Helped-by: Victoria Dye <vdye@github.com>
Helped-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Windows, symbolic links actually have a type depending on the target:
it can be a file or a directory.
In certain circumstances, this poses problems, e.g. when a symbolic link
is supposed to point into a submodule that is not checked out, so there
is no way for Git to auto-detect the type.
To help with that, we will add support over the course of the next
commits to specify that symlink type via the Git attributes. This
requires an index_state, though, something that Git for Windows'
`symlink()` replacement cannot know about because the function signature
is defined by the POSIX standard and not ours to change.
So let's introduce a helper function to create symbolic links that
*does* know about the index_state.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Windows, git repositories may have extra files which need cleaned
(e.g., a build directory) that may be arbitrarily deep. Suggest using
`core.longPaths` if such situations are encountered.
Fixes: #2715
Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
The `git clean` command needs to enumerate plenty of files and
directories, and can therefore benefit from the FSCache.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Update enable_fscache() to take an optional initial size parameter which is
used to initialize the hashmap so that it can avoid having to rehash as
additional entries are added.
Add a separate disable_fscache() macro to make the code clearer and easier
to read.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
At the end of the status command, disable and free the fscache so that we
don't leak the memory and so that we can dump the fscache statistics.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
This is retry of #1419.
I added flush_fscache macro to flush cached stats after disk writing
with tests for regression reported in #1438 and #1442.
git checkout checks each file path in sorted order, so cache flushing does not
make performance worse unless we have large number of modified files in
a directory containing many files.
Using chromium repository, I tested `git checkout .` performance when I
delete 10 files in different directories.
With this patch:
TotalSeconds: 4.307272
TotalSeconds: 4.4863595
TotalSeconds: 4.2975562
Avg: 4.36372923333333
Without this patch:
TotalSeconds: 20.9705431
TotalSeconds: 22.4867685
TotalSeconds: 18.8968292
Avg: 20.7847136
I confirmed this patch passed all tests in t/ with core_fscache=1.
Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
Teach "add" to use preload-index and fscache features
to improve performance on very large repositories.
During an "add", a call is made to run_diff_files()
which calls check_remove() for each index-entry. This
calls lstat(). On Windows, the fscache code intercepts
the lstat() calls and builds a private cache using the
FindFirst/FindNext routines, which are much faster.
Somewhat independent of this, is the preload-index code
which distributes some of the start-up costs across
multiple threads.
We need to keep the call to read_cache() before parsing the
pathspecs (and hence cannot use the pathspecs to limit any preload)
because parse_pathspec() is using the index to determine whether a
pathspec is, in fact, in a submodule. If we would not read the index
first, parse_pathspec() would not error out on a path that is inside
a submodule, and t7400-submodule-basic.sh would fail with
not ok 47 - do not add files from a submodule
We still want the nice preload performance boost, though, so we simply
call read_cache_preload(&pathspecs) after parsing the pathspecs.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Add a macro to mark code sections that only read from the file system,
along with a config option and documentation.
This facilitates implementation of relatively simple file system level
caches without the need to synchronize with the file system.
Enable read-only sections for 'git status' and preload_index.
Signed-off-by: Karsten Blees <blees@dcon.de>
This introduces `git survey` to Git for Windows ahead of upstream for
the express purpose of getting the path-based analysis in the hands of
more folks.
The inspiration of this builtin is
[`git-sizer`](https://github.com/github/git-sizer), but since that
command relies on `git cat-file --batch` to get the contents of objects,
it has limits to how much information it can provide.
This is mostly a rewrite of the `git survey` builtin that was introduced
into the `microsoft/git` fork in microsoft/git#667. That version had a
lot more bells and whistles, including an analysis much closer to what
`git-sizer` provides.
The biggest difference in this version is that this one is focused on
using the path-walk API in order to visit batches of objects based on a
common path. This allows identifying, for instance, the path that is
contributing the most to the on-disk size across all versions at that
path.
For example, here are the top ten paths contributing to my local Git
repository (which includes `microsoft/git` and `gitster/git`):
```
TOP FILES BY DISK SIZE
============================================================================
Path | Count | Disk Size | Inflated Size
-----------------------------------------+-------+-----------+--------------
whats-cooking.txt | 1373 | 11637459 | 37226854
t/helper/test-gvfs-protocol | 2 | 6847105 | 17233072
git-rebase--helper | 1 | 6027849 | 15269664
compat/mingw.c | 6111 | 5194453 | 463466970
t/helper/test-parse-options | 1 | 3420385 | 8807968
t/helper/test-pkt-line | 1 | 3408661 | 8778960
t/helper/test-dump-untracked-cache | 1 | 3408645 | 8780816
t/helper/test-dump-fsmonitor | 1 | 3406639 | 8776656
po/vi.po | 104 | 1376337 | 51441603
po/de.po | 210 | 1360112 | 71198603
```
This kind of analysis has been helpful in identifying the reasons for
growth in a few internal monorepos. Those findings motivated the changes
in #5157 and #5171.
With this early version in Git for Windows, we can expand the reach of
the experimental tool in advance of it being contributed to the upstream
project.
Unfortunately, this will mean that in the next `microsoft/git` rebase,
Jeff Hostetler's version will need to be pulled out since there are
enough conflicts. These conflicts include how tables are stored and
generated, as the version in this PR is slightly more general to allow
for different kinds of data.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In 245670c (credential-cache: check for windows specific errors, 2021-09-14)
we concluded that on Windows we would always encounter ENETDOWN where we
would expect ECONNREFUSED on POSIX systems, when connecting to unix sockets.
As reported in [1], we do encounter ECONNREFUSED on Windows if the
socket file doesn't exist, but the containing directory does and ENETDOWN if
neither exists. We should handle this case like we do on non-windows systems.
[1] https://github.com/git-for-windows/git/pull/4762#issuecomment-2545498245
This fixes https://github.com/git-for-windows/git/issues/5314
Helped-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Matthias Aßhauer <mha1993@live.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
While this command is definitely something we _want_, chances are that
upstreaming this will require substantial changes.
We still want to be able to experiment with this before that, to focus
on what we need out of this command: To assist with diagnosing issues
with large repositories, as well as to help monitoring the growth and
the associated painpoints of such repositories.
To that end, we are about to integrate this command into
`microsoft/git`, to get the tool into the hands of users who need it
most, with the idea to iterate in close collaboration between these
users and the developers familar with Git's internals.
However, we will definitely want to avoid letting anybody have the
impression that this command, its exact inner workings, as well as its
output format, are anywhere close to stable. To make that fact utterly
clear (and thereby protect the freedom to iterate and innovate freely
before upstreaming the command), let's mark its output as experimental
in all-caps, as the first thing we do.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The 'git survey' builtin provides several detail tables, such as "top
files by on-disk size". The size of these tables defaults to 10,
currently.
Allow the user to specify this number via a new --top=<N> option or the
new survey.top config key.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Since we are already walking our reachable objects using the path-walk API,
let's now collect lists of the paths that contribute most to different
metrics. Specifically, we care about
* Number of versions.
* Total size on disk.
* Total inflated size (no delta or zlib compression).
This information can be critical to discovering which parts of the
repository are causing the most growth, especially on-disk size. Different
packing strategies might help compress data more efficiently, but the toal
inflated size is a representation of the raw size of all snapshots of those
paths. Even when stored efficiently on disk, that size represents how much
information must be processed to complete a command such as 'git blame'.
The exact disk size seems to be not quite robust enough for testing, as
could be seen by the `linux-musl-meson` job consistently failing, possibly
because of zlib-ng deflates differently: t8100.4(git survey
(default)) was failing with a symptom like this:
TOTAL OBJECT SIZES BY TYPE
===============================================
Object Type | Count | Disk Size | Inflated Size
------------+-------+-----------+--------------
- Commits | 10 | 1523 | 2153
+ Commits | 10 | 1528 | 2153
Trees | 10 | 495 | 1706
Blobs | 10 | 191 | 101
- Tags | 4 | 510 | 528
+ Tags | 4 | 547 | 528
This means: the disk size is unlikely something we can verify robustly.
Since zlib-ng seems to increase the disk size of the tags from 528 to
547, we cannot even assume that the disk size is always smaller than the
inflated size. We will most likely want to either skip verifying the
disk size altogether, or go for some kind of fuzzy matching, say, by
replacing `s/ 1[45][0-9][0-9] / ~1.5k /` and `s/ [45][0-9][0-9] / ~½k /`
or something like that.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In future changes, we will make use of these methods. The intention is to
keep track of the top contributors according to some metric. We don't want
to store all of the entries and do a sort at the end, so track a
constant-size table and remove rows that get pushed out depending on the
chosen sorting algorithm.
Co-authored-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by; Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Now that we have explored objects by count, we can expand that a bit more to
summarize the data for the on-disk and inflated size of those objects. This
information is helpful for diagnosing both why disk space (and perhaps
clone or fetch times) is growing but also why certain operations are slow
because the inflated size of the abstract objects that must be processed is
so large.
Note: zlib-ng is slightly more efficient even at those small sizes. Even
between zlib versions, there are slight differences in compression. To
accommodate for that in the tests, not the exact numbers but some rough
approximations are validated (the test should validate `git survey`,
after all, not zlib).
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
At the moment, nothing is obvious about the reason for the use of the
path-walk API, but this will become more prevelant in future iterations. For
now, use the path-walk API to sum up the counts of each kind of object.
For example, this is the reachable object summary output for my local repo:
REACHABLE OBJECT SUMMARY
========================
Object Type | Count
------------+-------
Tags | 1343
Commits | 179344
Trees | 314350
Blobs | 184030
Signed-off-by: Derrick Stolee <stolee@gmail.com>
When 'git survey' provides information to the user, this will be presented
in one of two formats: plaintext and JSON. The JSON implementation will be
delayed until the functionality is complete for the plaintext format.
The most important parts of the plaintext format are headers specifying the
different sections of the report and tables providing concreted data.
Create a custom table data structure that allows specifying a list of
strings for the row values. When printing the table, check each column for
the maximum width so we can create a table of the correct size from the
start.
The table structure is designed to be flexible to the different kinds of
output that will be implemented in future changes.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
By default we will scan all references in "refs/heads/", "refs/tags/"
and "refs/remotes/".
Add command line opts let the use ask for all refs or a subset of them
and to include a detached HEAD.
Signed-off-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Start work on a new 'git survey' command to scan the repository
for monorepo performance and scaling problems. The goal is to
measure the various known "dimensions of scale" and serve as a
foundation for adding additional measurements as we learn more
about Git monorepo scaling problems.
The initial goal is to complement the scanning and analysis performed
by the GO-based 'git-sizer' (https://github.com/github/git-sizer) tool.
It is hoped that by creating a builtin command, we may be able to take
advantage of internal Git data structures and code that is not
accessible from GO to gain further insight into potential scaling
problems.
Co-authored-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Windows' equivalent to "bind mounts", NTFS junction points, can be
unlinked without affecting the mount target. This is clearly what users
expect to happen when they call `git clean -dfx` in a worktree that
contains NTFS junction points: the junction should be removed, and the
target directory of said junction should be left alone (unless it is
inside the worktree).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
It seems to be not exactly rare on Windows to install NTFS junction
points (the equivalent of "bind mounts" on Linux/Unix) in worktrees,
e.g. to map some development tools into a subdirectory.
In such a scenario, it is pretty horrible if `git clean -dfx` traverses
into the mapped directory and starts to "clean up".
Let's just not do that. Let's make sure before we traverse into a
directory that it is not a mount point (or junction).
This addresses https://github.com/git-for-windows/git/issues/607
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Remove implicit reliance on the_repository global in the APIs
around tree objects and make it explicit which repository to work
in.
* rs/tree-wo-the-repository:
cocci: remove obsolete the_repository rules
cocci: convert parse_tree functions to repo_ variants
tree: stop using the_repository
tree: use repo_parse_tree()
path-walk: use repo_parse_tree_gently()
pack-bitmap-write: use repo_parse_tree()
delta-islands: use repo_parse_tree()
bloom: use repo_parse_tree()
add-interactive: use repo_parse_tree_indirect()
tree: add repo_parse_tree*()
environment: move access to core.maxTreeDepth into repo settings
"git repack --geometric" did not work with promisor packs, which
has been corrected.
* ps/geometric-repacking-with-promisor-remotes:
builtin/repack: handle promisor packs with geometric repacking
repack-promisor: extract function to remove redundant packs
repack-promisor: extract function to finalize repacking
repack-geometry: extract function to compute repacking split
builtin/pack-objects: exclude promisor objects with "--stdin-packs"
The object-info API has been cleaned up.
* ps/read-object-info-improvements:
packfile: drop repository parameter from `packed_object_info()`
packfile: skip unpacking object header for disk size requests
packfile: disentangle return value of `packed_object_info()`
packfile: always populate pack-specific info when reading object info
packfile: extend `is_delta` field to allow for "unknown" state
packfile: always declare object info to be OI_PACKED
object-file: always set OI_LOOSE when reading object info
The packfile_store data structure is moved from object store to odb
source.
* ps/packfile-store-in-odb-source:
packfile: move MIDX into packfile store
packfile: refactor `find_pack_entry()` to work on the packfile store
packfile: inline `find_kept_pack_entry()`
packfile: only prepare owning store in `packfile_store_prepare()`
packfile: only prepare owning store in `packfile_store_get_packs()`
packfile: move packfile store into object source
packfile: refactor misleading code when unusing pack windows
packfile: refactor kept-pack cache to work with packfile stores
packfile: pass source to `prepare_pack()`
packfile: create store via its owning source
Update code paths that check data integrity around refs subsystem.
cf. <CAOLa=ZShPP3BPXa=YnC-vuX4zF=pUTFdUidZwOdna8bfVTNM9w@mail.gmail.com>
* ps/ref-consistency-checks:
builtin/fsck: drop `fsck_head_link()`
builtin/fsck: move generic HEAD check into `refs_fsck()`
builtin/fsck: move generic object ID checks into `refs_fsck()`
refs/reftable: introduce generic checks for refs
refs/reftable: fix consistency checks with worktrees
refs/reftable: extract function to retrieve backend for worktree
refs/reftable: adapt includes to become consistent
refs/files: introduce function to perform normal ref checks
refs/files: extract generic symref target checks
fsck: drop unused fields from `struct fsck_ref_report`
refs/files: perform consistency checks for root refs
refs/files: improve error handling when verifying symrefs
refs/files: extract function to check single ref
refs/files: remove useless indirection
refs/files: remove `refs_check_dir` parameter
refs/files: move fsck functions into global scope
refs/files: simplify iterating through root refs
"git fsck" used inconsistent set of refs to show a confused
warning, which has been corrected.
* en/fsck-snapshot-ref-state:
fsck: snapshot default refs before object walk
"git patch-id" documentation updates.
* kh/doc-patch-id:
doc: patch-id: --verbatim locks in --stable
doc: patch-id: spell out the git-diff-tree(1) form
doc: patch-id: use definite article for the result
patch-id: use “patch ID” throughout
doc: patch-id: capitalize Git version
doc: patch-id: don’t use semicolon between bullet points
This reverts commit f406b8955295d01089ba2baf35eceadff2d11cae,
reversing changes made to 1627809eeff75e6ec936fc609e7be46d5eb2fa9e.
It seems to have caused a few regressions, two of the three known
ones we have proposed solutions for. Let's give ourselves a bit
more room to maneuver during the pre-release freeze period and
restart once the 2.53 ships.
Improve the error message when a bad argument is given to the
`--onto` option of "git replay". Test coverage of "git replay" has
been improved.
* kh/replay-invalid-onto-advance:
t3650: add more regression tests for failure conditions
replay: die if we cannot parse object
replay: improve code comment and die message
replay: die descriptively when invalid commit-ish is given
replay: find *onto only after testing for ref name
replay: remove dead code and rearrange
Miscellaneous fixes on object database layer.
* ps/odb-misc-fixes:
odb: properly close sources before freeing them
builtin/gc: fix condition for whether to write commit graphs
When performing a fetch with an object filter, we mark the resulting
packfile as a promisor pack. An object part of such a pack may miss any
of its referenced objects, and Git knows to handle this case by fetching
any such missing objects from the promisor remote.
The "promisor" property needs to be retained going forward. So every
time we pack a promisor object, the resulting pack must be marked as a
promisor pack. git-repack(1) does this already: when a repository has a
promisor remote, it knows to pass "--exclude-promisor-objects" to the
git-pack-objects(1) child process. Promisor packs are written separately
when doing an all-into-one repack via `repack_promisor_objects()`.
But we don't support promisor objects when doing a geometric repack yet.
Promisor packs do not get any special treatment there, as we simply
merge promisor and non-promisor packs. The resulting pack is not even
marked as a promisor pack, which essentially corrupts the repository.
This corruption couldn't happen in the real world though: we pass both
"--exclude-promisor-objects" and "--stdin-packs" to git-pack-objects(1)
if a repository has a promisor remote, but as those options are mutually
exclusive we always end up dying. And while we made those flags
compatible with one another in a preceding commit, we still end up dying
in case git-pack-objects(1) is asked to repack a promisor pack.
There's multiple ways to fix this:
- We can exclude promisor packs from the geometric progression
altogether. This would have the consequence that we never repack
promisor packs at all. But in a partial clone it is quite likely
that the user generates a bunch of promisor packs over time, as
every backfill fetch would create another one. So this doesn't
really feel like a sensible option.
- We can adapt git-pack-objects(1) to support repacking promisor packs
and include them in the normal geometric progression. But this would
mean that the set of promisor objects expands over time as the packs
are merged with normal packs.
- We can use a separate geometric progression to repack promisor
packs.
The first two options both have significant downsides, so they aren't
really feasible. But the third option fixes both of these downsides: we
make sure that promisor packs get merged, and at the same time we never
expand the set of promisor objects beyond the set of objects that are
already marked as promisor objects.
Implement this strategy so that geometric repacking works in partial
clones.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is currently not possible to combine "--exclude-promisor-objects"
with "--stdin-packs" because both flags want to set up a revision walk
to enumerate the objects to pack. In a subsequent commit though we want
to extend geometric repacks to support promisor objects, and for that we
need to handle the combination of both flags.
There are two cases we have to think about here:
- "--stdin-packs" asks us to pack exactly the objects part of the
specified packfiles. It is somewhat questionable what to do in the
case where the user asks us to exclude promisor objects, but at the
same time explicitly passes a promisor pack to us. For now, we
simply abort the request as it is self-contradicting. As we have
also been dying before this commit there is no regression here.
- "--stdin-packs=follow" does the same as the first flag, but it also
asks us to include all objects transitively reachable from any
object in the packs we are about to repack. This is done by doing
the revision walk mentioned further up. Luckily, fixing this case is
trivial: we only need to modify the revision walk to also set the
`exclude_promisor_objects` field.
Note that we do not support the "--exclude-promisor-objects-best-effort"
flag for now as we don't need it to support geometric repacking with
promisor objects.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `fsck_head_link()` was historically used to perform a
couple of consistency checks for refs. (Almost) all of these checks have
now been moved into the refs subsystem. There's only a single check
remaining that verifies whether `refs_resolve_ref_unsafe()` returns a
`NULL` pointer. This may happen in a couple of cases:
- When `refs_is_safe()` declares the ref to be unsafe. We already have
checks for this as we verify refnames with `check_refname_format()`.
- When the ref doesn't exist. A repository without "HEAD" is
completely broken though, and we would notice this error ahead of
time already.
- In case the caller passes `RESOLVE_REF_READING` and the ref is a
symref that doesn't resolve. We don't pass this flag though.
As such, this check doesn't cover anything anymore that isn't already
covered by `refs_fsck()`. Drop it, which also allows us to inline the
call to `refs_resolve_ref_unsafe()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the check that detects "HEAD" refs that do not point at a branch
into `refs_fsck()`. This follows the same motivation as the preceding
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While most of the logic that verifies the consistency of refs is
driven by `refs_fsck()`, we still have a small handful of checks in
`fsck_head_link()`. These checks don't use the git-fsck(1) reporting
infrastructure, and as such it's impossible to for example disable
some of those checks.
One such check detects refs that point to the all-zeroes object ID.
Extract this check into the generic `refs_fsck_ref()` function that is
used by both the "files" and "reftable" backends.
Note that this will cause us to not return an error code from
`fsck_head_link()` anymore in case this error was detected. This is fine
though: the only caller of this function does not check the error code
anyway. To demonstrate this, adapt the function to drop its return value
altogether. The function will be removed in a subsequent commit anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `packed_object_info()` takes a packfile and offset and
returns the object info for the corresponding object. Despite these two
parameters though it also takes a repository pointer. This is redundant
information though, as `struct packed_git` already has a repository
pointer that is always populated.
Drop the redundant parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up, unifying various hand-rolled "list of commit
objects" and use the commit_stack API.
* rs/commit-stack:
commit-reach: use commit_stack
commit-graph: use commit_stack
commit: add commit_stack_grow()
shallow: use commit_stack
pack-bitmap-write: use commit_stack
commit: add commit_stack_init()
test-reach: use commit_stack
remote: use commit_stack for src_commits
remote: use commit_stack for sent_tips
remote: use commit_stack for local_commits
name-rev: use commit_stack
midx: use commit_stack
log: use commit_stack
revision: export commit_stack
Add and apply a semantic patch to convert calls to parse_tree() and
friends to the corresponding variant that takes a repository argument,
to allow the functions that implicitly use the_repository to be retired
once all potential in-flight topics are settled and converted as well.
The changes in .c files were generated by Coccinelle, but I fixed a
whitespace bug it would have introduced to builtin/commit.c.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fsck has a race when operating on live repositories; consider the
following simple script that writes new commits as fsck runs:
#!/bin/bash
git fsck &
PID=$!
while ps -p $PID >/dev/null; do
sleep 3
git commit -q --allow-empty -m "Another commit"
done
Since fsck walks objects for connectivity and then reads the refs at the
end to check, this can cause fsck to get confused and think that the new
refs refer to missing commits and that new reflog entries are invalid.
Running the above script in a clone of git.git results in the following
(output ellipsized to remove additional errors of the same type):
$ ./fsck-while-writing.sh
Checking ref database: 100% (1/1), done.
Checking object directories: 100% (256/256), done.
warning in tag d6602ec5194c87b0fc87103ca4d67251c76f233a: missingTaggerEntry: invalid format - expected 'tagger' line
Checking objects: 100% (835091/835091), done.
error: HEAD: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
error: HEAD: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
error: HEAD: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
error: HEAD: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
[...]
error: HEAD: invalid reflog entry 87c8a5c2f6b79d9afa9e941590b9a097b6f7ac09
error: HEAD: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
error: HEAD: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
error: HEAD: invalid reflog entry 6724f2dfede88bfa9445a333e06e78536c0c6c0d
error: refs/heads/mybranch invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
error: refs/heads/mybranch: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
error: refs/heads/mybranch: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
error: refs/heads/mybranch: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
[...]
error: refs/heads/mybranch: invalid reflog entry 87c8a5c2f6b79d9afa9e941590b9a097b6f7ac09
error: refs/heads/mybranch: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
error: refs/heads/mybranch: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
error: refs/heads/mybranch: invalid reflog entry 6724f2dfede88bfa9445a333e06e78536c0c6c0d
Checking connectivity: 833846, done.
missing commit 6724f2dfede88bfa9445a333e06e78536c0c6c0d
Verifying commits in commit graph: 100% (242243/242243), done.
We can minimize the race opportunities by taking a snapshot of refs at
program invocation, doing the connectivity check, and then checking the
snapshotted refs afterward. This avoids races with regular refs between
fsck and adding objects to the database, though it still leaves a race
between a gc and fsck. We are less concerned about folks simultaneously
running gc with fsck; though, if it becomes an issue, we could lock fsck
during gc. We definitely do not want to lock fsck during operations
that may add objects to the object store; that would be problematic for
forges.
Note that refs aren't the only problem, though; reflog entries and index
entries could be problematic as well. For now we punt on index entries
just leaving a TODO comment, and for reflogs we use a coarse solution of
taking the time at the beginning of the program and ignoring reflog
entries newer than that time. That may be imperfect if dealing with a
network filesystem, so we leave TODO comment for those that want to
improve that handling as well.
As a high level overview:
* In addition to fsck_handle_ref(), which now is only a few lines long
to process a ref, there's also a snapshot_ref() which is called
early in the program for each ref and takes all the error checking
logic.
* The iterating over refs that used to be in get_default_heads() plus
a loop over the arguments now appears in shapshot_refs().
* There's a new process_refs() as well that kind of looks like the old
get_default_heads() though it is streamlined due to the work done by
snapshot_refs().
This combination of changes modifies the output of running the script
(from the beginning of this commit message) to:
$ ./fsck-while-writing.sh
Checking ref database: 100% (1/1), done.
Checking object directories: 100% (256/256), done.
warning in tag d6602ec5194c87b0fc87103ca4d67251c76f233a: missingTaggerEntry: invalid format - expected 'tagger' line
Checking objects: 100% (835091/835091), done.
Checking connectivity: 833846, done.
Verifying commits in commit graph: 100% (242243/242243), done.
While worries about live updates while running fsck is likely of most
interest for forge operators, it may also benefit those with
automated jobs (such as git maintenance) or even casual users who want
to do other work in their clone while fsck is running.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When calling `packfile_store_prepare()` we prepare not only the provided
packfile store, but also all those of all other sources part of the same
object database. This was required when the store was still sitting on
the object database level. But now that it sits on the source level it's
not anymore.
Refactor the code so that we only prepare the single packfile store
passed by the caller. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packfile store is a member of `struct object_database`, which means
that we have a single store per database. This doesn't really make much
sense though: each source connected to the database has its own set of
packfiles, so there is a conceptual mismatch here. This hasn't really
caused much of a problem in the past, but with the advent of pluggable
object databases this is becoming more of a problem because some of the
sources may not even use packfiles in the first place.
Move the packfile store down by one level from the object database into
the object database source. This ensures that each source now has its
own packfile store, and we can eventually start to abstract it away
entirely so that the caller doesn't even know what kind of store it
uses.
Note that we only need to adjust a relatively small number of callers,
way less than one might expect. This is because most callers are using
`repo_for_each_pack()`, which handles enumeration of all packfiles that
exist in the repository. So for now, none of these callers need to be
adapted. The remaining callers that iterate through the packfiles
directly and that need adjustment are those that are a bit more tangled
with packfiles. These will be adjusted over time.
Note that this patch only moves the packfile store, and there is still a
bunch of functions that seemingly operate on a packfile store but that
end up iterating over all sources. These will be adjusted in subsequent
commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The kept pack cache is a cache of packfiles that are marked as kept
either via an accompanying ".kept" file or via an in-memory flag. The
cache can be retrieved via `kept_pack_cache()`, where one needs to pass
in a repository.
Ultimately though the kept-pack cache is a property of the packfile
store, and this causes problems in a subsequent commit where we want to
move down the packfile store to be a per-object-source entity.
Prepare for this and refactor the kept-pack cache to work on top of a
packfile store instead. While at it, rename both the function and flags
specific to the kept-pack cache so that they can be properly attributed
to the respective subsystems.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The “Description” section decided to introduce and use the term “patch
ID” for the ID value itself. Let’s use the same term on the options as
well.
Also make to sure to use bare “ID” instead of “id”.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* rs/tag-wo-the-repository:
tag: stop using the_repository
tag: support arbitrary repositories in parse_tag()
tag: support arbitrary repositories in gpg_verify_tag()
tag: use algo of repo parameter in parse_tag_buffer()