This topic branch allows `add -p` and `add -i` with a large number of
files. It is kind of a hack that was never really meant to be
upstreamed. Let's see if we can do better in the built-in `add -p`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This topic branch teaches `git clean` to respect NTFS junctions and Unix
bind mounts: it will now stop at those boundaries.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This topic branch allows us to specify absolute paths without the drive
prefix e.g. when cloning.
Example:
C:\Users\me> git clone https://github.com/git/git \upstream-git
This will clone into a new directory C:\upstream-git, in line with how
Windows interprets absolute paths.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
To verify that the `clean` side of the `clean`/`smudge` filter code is
correct with regards to LLP64 (read: to ensure that `size_t` is used
instead of `unsigned long`), here is a test case using a trivial filter,
specifically _not_ writing anything to the object store to limit the
scope of the test case.
As in previous commits, the `big` file from previous test cases is
reused if available, to save setup time, otherwise re-generated.
Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
To complement the `--stdin` and `--literally` test cases that verify
that we can hash files larger than 4GB on 64-bit platforms using the
LLP64 data model, here is a test case that exercises `hash-object`
_without_ any options.
Just as before, we use the `big` file from the previous test case if it
exists to save on setup time, otherwise generate it.
Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Just like the `hash-object --literally` code path, the `--stdin` code
path also needs to use `size_t` instead of `unsigned long` to represent
memory sizes, otherwise it would cause problems on platforms using the
LLP64 data model (such as Windows).
To limit the scope of the test case, the object is explicitly not
written to the object store, nor are any filters applied.
The `big` file from the previous test case is reused to save setup time;
To avoid relying on that side effect, it is generated if it does not
exist (e.g. when running via `sh t1007-*.sh --long --run=1,41`).
Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continue walking the code path for the >4GB `hash-object --literally`
test to the hash algorithm step for LLP64 systems.
This patch lets the SHA1DC code use `size_t`, making it compatible with
LLP64 data models (as used e.g. by Windows).
The interested reader of this patch will note that we adjust the
signature of the `git_SHA1DCUpdate()` function without updating _any_
call site. This certainly puzzled at least one reviewer already, so here
is an explanation:
This function is never called directly, but always via the macro
`platform_SHA1_Update`, which is usually called via the macro
`git_SHA1_Update`. However, we never call `git_SHA1_Update()` directly
in `struct git_hash_algo`. Instead, we call `git_hash_sha1_update()`,
which is defined thusly:
static void git_hash_sha1_update(git_hash_ctx *ctx,
const void *data, size_t len)
{
git_SHA1_Update(&ctx->sha1, data, len);
}
i.e. it contains an implicit downcast from `size_t` to `unsigned long`
(before this here patch). With this patch, there is no downcast anymore.
With this patch, finally, the t1007-hash-object.sh "files over 4GB hash
literally" test case is fixed.
Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On LLP64 systems, such as Windows, the size of `long`, `int`, etc. is
only 32 bits (for backward compatibility). Git's use of `unsigned long`
for file memory sizes in many places, rather than size_t, limits the
handling of large files on LLP64 systems (commonly given as `>4GB`).
Provide a minimum test for handling a >4GB file. The `hash-object`
command, with the `--literally` and without `-w` option avoids
writing the object, either loose or packed. This avoids the code paths
hitting the `bigFileThreshold` config test code, the zlib code, and the
pack code.
Subsequent patches will walk the test's call chain, converting types to
`size_t` (which is larger in LLP64 data models) where appropriate.
Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
For some reason, this test case was indented with 4 spaces instead of 1
horizontal tab. The other test cases in the same test script are fine.
Signed-off-by: Jens Glathe <jens.glathe@oldschoolsolutions.biz>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This change enhances `git commit --cleanup=scissors` by detecting
scissors lines ending in either LF (UNIX-style) or CR/LF (DOS-style).
Regression tests are included to specifically test for trailing
comments after a CR/LF-terminated scissors line.
Signed-off-by: Luke Bonanomi <lbonanomi@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The convention in Git project's shell scripts is to have white-space
_before_, but not _after_ the `>` (or `<`).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
There is a Win32 API function to resolve symbolic links, and we can use
that instead of resolving them manually. Even better, this function also
resolves NTFS junction points (which are somewhat similar to bind
mounts).
This fixes https://github.com/git-for-windows/git/issues/2481.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When we commit the template directory as part of `make vcxproj`, the
`branches/` directory is not actually commited, as it is empty.
Two tests were not prepared for that situation.
This developer tried to get rid of the support for `.git/branches/` a
long time ago, but that effort did not bear fruit, so the best we can do
is work around in these here tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git for Windows wants to add `git.exe` to the users' `PATH`, without
cluttering the latter with unnecessary executables such as `wish.exe`.
To that end, it invented the concept of its "Git wrapper", i.e. a tiny
executable located in `C:\Program Files\Git\cmd\git.exe` (originally a
CMD script) whose sole purpose is to set up a couple of environment
variables and then spawn the _actual_ `git.exe` (which nowadays lives in
`C:\Program Files\Git\mingw64\bin\git.exe` for 64-bit, and the obvious
equivalent for 32-bit installations).
Currently, the following environment variables are set unless already
initialized:
- `MSYSTEM`, to make sure that the MSYS2 Bash and the MSYS2 Perl
interpreter behave as expected, and
- `PLINK_PROTOCOL`, to force PuTTY's `plink.exe` to use the SSH
protocol instead of Telnet,
- `PATH`, to make sure that the `bin` folder in the user's home
directory, as well as the `/mingw64/bin` and the `/usr/bin`
directories are included. The trick here is that the `/mingw64/bin/`
and `/usr/bin/` directories are relative to the top-level installation
directory of Git for Windows (which the included Bash interprets as
`/`, i.e. as the MSYS pseudo root directory).
Using the absence of `MSYSTEM` as a tell-tale, we can detect in
`git.exe` whether these environment variables have been initialized
properly. Therefore we can call `C:\Program Files\Git\mingw64\bin\git`
in-place after this change, without having to call Git through the Git
wrapper.
Obviously, above-mentioned directories must be _prepended_ to the `PATH`
variable, otherwise we risk picking up executables from unrelated Git
installations. We do that by constructing the new `PATH` value from
scratch, appending `$HOME/bin` (if `HOME` is set), then the MSYS2 system
directories, and then appending the original `PATH`.
Side note: this modification of the `PATH` variable is independent of
the modification necessary to reach the executables and scripts in
`/mingw64/libexec/git-core/`, i.e. the `GIT_EXEC_PATH`. That
modification is still performed by Git, elsewhere, long after making the
changes described above.
While we _still_ cannot simply hard-link `mingw64\bin\git.exe` to `cmd`
(because the former depends on a couple of `.dll` files that are only in
`mingw64\bin`, i.e. calling `...\cmd\git.exe` would fail to load due to
missing dependencies), at least we can now avoid that extra process of
running the Git wrapper (which then has to wait for the spawned
`git.exe` to finish) by calling `...\mingw64\bin\git.exe` directly, via
its absolute path.
Testing this is in Git's test suite tricky: we set up a "new" MSYS
pseudo-root and copy the `git.exe` file into the appropriate location,
then verify that `MSYSTEM` is set properly, and also that the `PATH` is
modified so that scripts can be found in `$HOME/bin`, `/mingw64/bin/`
and `/usr/bin/`.
This addresses https://github.com/git-for-windows/git/issues/2283
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
NTFS junctions are somewhat similar in spirit to Unix bind mounts: they
point to a different directory and are resolved by the filesystem
driver. As such, they appear to `lstat()` as if they are directories,
not as if they are symbolic links.
_Any_ user can create junctions, while symbolic links can only be
created by non-administrators in Developer Mode on Windows 10. Hence
NTFS junctions are much more common "in the wild" than NTFS symbolic
links.
It was reported in https://github.com/git-for-windows/git/issues/2481
that adding files via an absolute path that traverses an NTFS junction:
since 1e64d18 (mingw: do resolve symlinks in `getcwd()`), we resolve not
only symbolic links but also NTFS junctions when determining the
absolute path of the current directory. The same is not true for `git
add <file>`, where symbolic links are resolved in `<file>`, but not NTFS
junctions.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Windows' equivalent to "bind mounts", NTFS junction points, can be
unlinked without affecting the mount target. This is clearly what users
expect to happen when they call `git clean -dfx` in a worktree that
contains NTFS junction points: the junction should be removed, and the
target directory of said junction should be left alone (unless it is
inside the worktree).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When specifying an absolute path without a drive prefix, we convert that
path internally. Let's make sure that we handle that case properly, too
;-)
This fixes the command
git clone https://github.com/git-for-windows/git \G4W
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
It seems to be not exactly rare on Windows to install NTFS junction
points (the equivalent of "bind mounts" on Linux/Unix) in worktrees,
e.g. to map some development tools into a subdirectory.
In such a scenario, it is pretty horrible if `git clean -dfx` traverses
into the mapped directory and starts to "clean up".
Let's just not do that. Let's make sure before we traverse into a
directory that it is not a mount point (or junction).
This addresses https://github.com/git-for-windows/git/issues/607
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Windows, there are several categories of absolute paths. One such
category starts with a backslash and is implicitly relative to the
drive associated with the current working directory. Example:
c:
git clone https://github.com/git-for-windows/git \G4W
should clone into C:\G4W.
There is currently a problem with that, in that mingw_mktemp() does not
expect the _wmktemp() function to prefix the absolute path with the
drive prefix, and as a consequence, the resulting path does not fit into
the originally-passed string buffer. The symptom is a "Result too large"
error.
Reported by Juan Carlos Arevalo Baeza.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This happens only when the corresponding commits are not exported in
the current fast-export run. This can happen either when the relevant
commit is already marked, or when the commit is explicitly marked
as UNINTERESTING with a negative ref by another argument.
This breaks fast-export basec remote helpers.
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
"git rev-list --unpacked --objects" failed to exclude packed
non-commit objects, which has been corrected.
* tb/rev-list-unpacked-fix:
pack-bitmap: drop --unpacked non-commit objects from results
list-objects: drop --unpacked non-commit objects from results
Leakfix.
* ps/leakfixes:
setup: fix leaking repository format
setup: refactor `upgrade_repository_format()` to have common exit
shallow: fix memory leak when registering shallow roots
test-bloom: stop setting up Git directory twice
Another step to deprecate test_i18ngrep.
* jc/test-i18ngrep:
tests: teach callers of test_i18ngrep to use test_grep
test framework: further deprecate test_i18ngrep
"git merge-file" learns a mode to read three contents to be merged
from blob objects.
* bc/merge-file-object-input:
merge-file: add an option to process object IDs
git-merge-file doc: drop "-file" from argument placeholders
"git rev-list --missing" did not work for missing commit objects,
which has been corrected.
* kn/rev-list-missing-fix:
rev-list: add commit object support in `--missing` option
rev-list: move `show_commit()` to the bottom
revision: rename bit to `do_not_die_on_missing_objects`
Teach "git show-ref" a mode to check the existence of a ref.
* ps/show-ref:
t: use git-show-ref(1) to check for ref existence
builtin/show-ref: add new mode to check for reference existence
builtin/show-ref: explicitly spell out different modes in synopsis
builtin/show-ref: ensure mutual exclusiveness of subcommands
builtin/show-ref: refactor options for patterns subcommand
builtin/show-ref: stop using global vars for `show_one()`
builtin/show-ref: stop using global variable to count matches
builtin/show-ref: refactor `--exclude-existing` options
builtin/show-ref: fix dead code when passing patterns
builtin/show-ref: fix leaking string buffer
builtin/show-ref: split up different subcommands
builtin/show-ref: convert pattern to a local variable
The codepath to traverse the commit-graph learned to notice that a
commit is missing (e.g., corrupt repository lost an object), even
though it knows something about the commit (like its parents) from
what is in commit-graph.
* ps/do-not-trust-commit-graph-blindly-for-existence:
commit: detect commits that exist in commit-graph but not in the ODB
commit-graph: introduce envvar to disable commit existence checks
When performing revision queries with `--objects` and
`--use-bitmap-index`, the output may incorrectly contain objects which
are packed, even when the `--unpacked` option is given. This affects
traversals, but also other querying operations, like `--count`,
`--disk-usage`, etc.
Like in the previous commit, the fix is to exclude those objects from
the result set before they are shown to the user (or, in this case,
before the bitmap containing the result of the traversal is enumerated
and its objects listed).
This is performed by a new function in pack-bitmap.c, called
`filter_packed_objects_from_bitmap()`. Note that we do not have to
inspect individual bits in the result bitmap, since we know that the
first N (where N is the number of objects in the bitmap's pack/MIDX)
bits correspond to objects which packed by definition.
In other words, for an object to have a bitmap position (not in the
extended index), it must appear in either the bitmap's pack or one of
the packs in its MIDX.
This presents an appealing optimization to us, which is that we can
simply memset() the corresponding number of `eword_t`'s to zero,
provided that we handle any objects which spill into the next word (but
don't occupy all 64 bits of the word itself).
We only have to handle objects in the bitmap's extended index. These
objects may (or may not) appear in one or more pack(s). Since these
objects are known to not appear in either the bitmap's MIDX or pack,
they may be stored as loose, appear in other pack(s), or both.
Before returning a bitmap containing the result of the traversal back to
the caller, drop any bits from the extended index which appear in one or
more packs. This implements the correct behavior for rev-list operations
which use the bitmap index to compute their result.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git-rev-list(1), we describe the `--unpacked` option as:
Only useful with `--objects`; print the object IDs that are not in
packs.
This is true of commits, which we discard via get_commit_action(), but
not of the objects they reach. So if we ask for an --objects traversal
with --unpacked, we may get arbitrarily many objects which are indeed
packed.
I am nearly certain this behavior dates back to the introduction of
`--unpacked` via 12d2a18780 ("git rev-list --unpacked" shows only
unpacked commits, 2005-07-03), but I couldn't get that revision of Git
to compile for me. At least as early as v2.0.0 this has been subtly
broken:
$ git.compile --version
git version 2.0.0
$ git.compile rev-list --objects --all --unpacked
72791fe96c93f9ec5c311b8bc966ab349b3b5bbe
05713d991c18bbeef7e154f99660005311b5004d v1.0
153ed8b7719c6f5a68ce7ffc43133e95a6ac0fdb
8e4020bb5a8d8c873b25de15933e75cc0fc275df one
9200b628cf9dc883a85a7abc8d6e6730baee589c two
3e6b46e1b7e3b91acce99f6a823104c28aae0b58 unpacked.t
There, only the first, third, and sixth entries are loose, with the
remaining set of objects belonging to at least one pack.
The implications for this are relatively benign: bare 'git repack'
invocations which invoke pack-objects with --unpacked are impacted, and
at worst we'll store a few extra objects that should have been excluded.
Arguably changing this behavior is a backwards-incompatible change,
since it alters the set of objects emitted from rev-list queries with
`--objects` and `--unpacked`. But I argue that this change is still
sensible, since the existing implementation deviates from
clearly-written documentation.
The fix here is straightforward: avoid showing any non-commit objects
which are contained in packs by discarding them within list-objects.c,
before they are shown to the user. Note that similar treatment for
`list-objects.c::show_commit()` is not needed, since that case is
already handled by `revision.c::get_commit_action()`.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git bugreport" learned to complain when it received a command line
argument that it will not use.
* es/bugreport-no-extra-arg:
bugreport: reject positional arguments
t0091-bugreport: stop using i18ngrep
"git send-email" did not have certain pieces of data computed yet
when it tried to validate the outging messages and its recipient
addresses, which has been sorted out.
* ms/send-email-validate-fix:
send-email: move validation code below process_address_list
"git reflog expire --single-worktree" has been broken for the past
20 months or so, which has been corrected.
* rs/reflog-expire-single-worktree-fix:
reflog: fix expire --single-worktree
"cd sub && git grep -f patterns" tried to read "patterns" file at
the top level of the working tree; it has been corrected to read
"sub/patterns" instead.
* jc/grep-f-relative-to-cwd:
grep: -f <path> is relative to $cwd
When registering shallow roots, we unset the list of parents of the
to-be-registered commit if it's already been parsed. This causes us to
leak memory though because we never free this list. Fix this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're setting up the Git directory twice in the `test-tool bloom`
helper, once at the beginning of `cmd_bloom()` and once in the local
subcommand implementation `get_bloom_filter_for_commit()`. This can lead
to memory leaks as we'll overwrite variables of `the_repository` with
newly allocated data structures. On top of that it's simply unnecessary.
Fix this by only setting up the Git directory once.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The perl script introduced by 86b008ee61 (t: add library for munging
chunk-format files, 2023-10-09) uses pack("Q") and unpack("Q") to read
and write 64-bit values ("quadwords" in perl parlance) from the on-disk
chunk files. However, some builds of perl may not support 64-bit
integers at all, and throw an exception here. While some 32-bit
platforms may still support 64-bit integers in perl (such as our linux32
CI environment), others reportedly don't (the NonStop 32-bit builds).
We can work around this by treating the 64-bit values as two 32-bit
values. We can't ever combine them into a single 64-bit value, but in
practice this is OK. These are representing file offsets, and our files
are much smaller than 4GB. So the upper half of the 64-bit value will
always be 0.
We can just introduce a few helper functions which perform the
translation and double-check our assumptions.
Reported-by: Randall S. Becker <randall.becker@nexbridge.ca>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
They are equivalents and the former still exists, so as long as the
only change this commit makes are to rewrite test_i18ngrep to
test_grep, there won't be any new bug, even if there still are
callers of test_i18ngrep remaining in the tree, or when merged to
other topics that add new uses of test_i18ngrep.
This patch was produced more or less with
git grep -l -e 'test_i18ngrep ' 't/t[0-9][0-9][0-9][0-9]-*.sh' |
xargs perl -p -i -e 's/test_i18ngrep /test_grep /'
and a good way to sanity check the result yourself is to run the
above in a checkout of c4603c1c (test framework: further deprecate
test_i18ngrep, 2023-10-31) and compare the resulting working tree
contents with the result of applying this patch to the same commit.
You'll see that test_i18ngrep in a few t/lib-*.sh files corrected,
in addition to the manual reproduction.
Signed-off-by: Junio C Hamano <gitster@pobox.com>