This adds a trace point in start_command so we can see the full
command invocation without having to resort to strace/code inspection.
For example:
    $ GIT_TRACE=1 git test foo
    git.c:755 trace: exec: git-test foo
    run-command.c:657 trace: run_command: git-test foo
    run-command.c:657 trace: run_command: 'echo $*' foo
    run-command.c:749 trace: start_command: /bin/sh -c 'echo $* "$@"' 'echo $*' foo
Prior changes have made the documentation around the internals of the
alias command execution clearer, but I have still found this detailed
view of the aliased command being run helpful for debugging purposes.
A test case is added to ensure the full command output is present in
the execution flow.
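As a rough illustration of why the start_command trace line looks the way it does, the following C sketch rebuilds the argument vector for a shell alias. The helper name `build_shell_argv`, the metacharacter list, and the fixed `/bin/sh` path are assumptions made for this example; it is not Git's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Characters assumed to require a shell; this list is illustrative. */
#define SHELL_META "|&;<>()$`\\\"' \t\n*?[#~=%"

/*
 * Hypothetical helper: build the argv that would be executed for an alias
 * expansion argv[0] followed by extra arguments. `script` is scratch space
 * for the rewritten "-c" string. Returns the number of entries written
 * (not counting the NULL terminator), or -1 if something does not fit.
 */
int build_shell_argv(const char **argv, const char **out, size_t cap,
                     char *script, size_t script_len)
{
    size_t argc = 0, n = 0, i;

    assert(argv && out && script);
    while (argv[argc])
        argc++;
    if (!argc)
        return -1;

    if (strcspn(argv[0], SHELL_META) != strlen(argv[0])) {
        /* Needs room for: sh, -c, script, argv[0..argc-1], NULL. */
        if (cap < argc + 4)
            return -1;
        out[n++] = "/bin/sh";
        out[n++] = "-c";
        if (argc == 1) {
            out[n++] = argv[0];
        } else {
            /* Append "$@" so the extra arguments reach the script. */
            int len = snprintf(script, script_len, "%s \"$@\"", argv[0]);
            if (len < 0 || (size_t)len >= script_len)
                return -1;
            out[n++] = script;
        }
    } else if (cap < argc + 1) {
        return -1;
    }

    /* The original words follow; after "-c script" they become $0, $1, ... */
    for (i = 0; i < argc; i++)
        out[n++] = argv[i];
    out[n] = NULL;
    return (int)n;
}
```

With the expansion `echo $*` and the extra argument `foo`, the sketch yields the five words seen in the trace: the expansion is passed a second time so that it becomes `$0` of the script, and `foo` becomes `$1`.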
Signed-off-by: Ian Wienand <iwienand@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing inline shell for shell-expansion aliases (i.e. prefixed
with "!"), there are some caveats around argument parsing to be aware
of. This series of notes attempts to explain what is happening more
clearly.
Signed-off-by: Ian Wienand <iwienand@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use add_patterns() to read .gitignore, .git/info/exclude, etc, as
well as other pattern-like files like sparse-checkout. The parser for
these uses an "int" as an index, meaning that files over 2GB will
generally cause signed integer overflow and out-of-bounds access.
This is unlikely to happen in any real files, but we do read .gitignore
files from the tree. A malicious tree could cause an out-of-bounds read
and segfault (we also write NULs over newlines, so in theory it could be
an out-of-bounds write, too, but as we go char-by-char, the first thing
that happens is trying to read a negative 2GB offset).
We could fix the most obvious issue by replacing one "int" with a
"size_t". But there are tons of "int" sprinkled throughout this code for
things like pattern lengths, number of patterns, and so on. Since nobody
would actually want a 2GB .gitignore file, an easy defensive measure is
to just refuse to parse them.
The "int" in question is in add_patterns_from_buffer(), so we could
catch it there. But by putting the checks in its two callers, we can
produce more useful error messages.
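The defensive measure can be sketched as a size check performed by each caller before the buffer reaches the "int"-indexed parser. The function name and the warning text below are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical caller-side guard: return 0 if a buffer of `size` bytes can
 * be safely indexed with an "int", or -1 (after warning) if it cannot.
 */
int check_pattern_file_size(const char *name, size_t size)
{
    assert(name);
    if (size > INT_MAX) {
        fprintf(stderr, "warning: ignoring excessively large pattern file: %s\n",
                name);
        return -1;
    }
    return 0;
}
```

Because the check lives in the caller, the message can name the file (or blob) being rejected, which the shared parser would not know.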
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge down a handful of topics to adjust tests and CI to make them
work better, without changing Git itself, and a bit of developer
docs update:
* Tests that try to corrupt in-repository files in chunked format did
not work well on macOS due to its broken "mv", which has been
worked around.
* Unbreak CI jobs so that we do not attempt to use Python 2 that has
been removed from the platform.
* Git 2.43 started using the tree of HEAD as the source of attributes
in a bare repository, which has severe performance implications.
For now, revert the change, without ripping out a more explicit
support for the attr.tree configuration variable.
* Windows CI running in GitHub Actions started complaining about the
order of arguments given to calloc(); the imported regex code uses
the wrong order almost consistently, which has been corrected.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
CI fix.
* jk/ci-macos-gcc13-fix:
ci: stop installing "gcc-13" for osx-gcc
ci: avoid bare "gcc" for osx-gcc job
ci: drop mention of BREW_INSTALL_PACKAGES variable
The SubmittingPatches document now refers folks to the manpages
translation project.
* jc/doc-manpages-l10n:
SubmittingPatches: advertise git-manpages-l10n project a bit
Windows CI running in GitHub Actions started complaining about the
order of arguments given to calloc(); the imported regex code uses
the wrong order almost consistently, which has been corrected.
* jc/compat-regex-calloc-fix:
compat/regex: fix argument order to calloc(3)
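For reference, calloc(3) takes the number of elements first and the size of each element second. A minimal sketch of the expected order, using a hypothetical helper name:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Allocate a zero-filled array of `count` ints.
 * The element count comes first, the element size second.
 */
int *alloc_counters(size_t count)
{
    assert(count > 0);
    return calloc(count, sizeof(int));
}
```

Swapping the two arguments requests the same number of bytes, which is why the transposed form keeps working at run time; the complaint comes from compilers that diagnose the transposition.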
Git 2.43 started using the tree of HEAD as the source of attributes
in a bare repository, which has severe performance implications.
For now, revert the change, without ripping out a more explicit
support for the attr.tree configuration variable.
* jc/no-default-attr-tree-in-bare:
stop using HEAD for attributes in bare repository by default
Unbreak CI jobs so that we do not attempt to use Python 2 that has
been removed from the platform.
* ps/ci-python-2-deprecation:
ci: fix Python dependency on Ubuntu 24.04
Tests that try to corrupt in-repository files in chunked format did
not work well on macOS due to its broken "mv", which has been
worked around.
* jc/test-workaround-broken-mv:
t/lib-chunk: work around broken "mv" on some vintage of macOS
Under macOS, `git ls-files path` does not work (gives an error)
if the absolute 'path' contains characters in NFD (decomposed).
This happens when core.precomposeunicode is true, which is the
most common case. The bug report says:
    $ cd somewhere # some safe place, /tmp or ~/tmp etc.
    $ mkdir $'u\xcc\x88' # ü in NFD
    $ cd ü # or cd $'u\xcc\x88' or cd $'\xc3\xbc'
    $ git init
    $ git ls-files $'/somewhere/u\xcc\x88' # NFD
    fatal: /somewhere/ü: '/somewhere/ü' is outside repository at '/somewhere/ü'
    $ git ls-files $'/somewhere/\xc3\xbc' # NFC
    (the same error as above)
In the 'fatal:' error message, there are three ü;
the 1st and 2nd are in NFC, the 3rd is in NFD.
Add test cases that follow the bug report, with the simplification
that the 'ü' is replaced by an 'ä', which is already used as NFD and
NFC in t3910.
The solution is to add a call to precompose_string_if_needed()
to this code in setup.c:
`work_tree = precompose_string_if_needed(get_git_work_tree());`
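To see why an unconverted work tree leads to the "outside repository" error, consider a bytewise prefix check between a path in NFC and a work tree in NFD. The function below is a simplified stand-in written for this illustration, not the code in setup.c.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified stand-in: return 1 if `path` equals `work_tree` or lies
 * below it, comparing bytes only (no Unicode normalization).
 */
int is_inside_work_tree(const char *path, const char *work_tree)
{
    size_t len;

    assert(path && work_tree);
    len = strlen(work_tree);
    if (strncmp(path, work_tree, len))
        return 0;
    return path[len] == '\0' || path[len] == '/';
}
```

The NFD spelling "u\xcc\x88" and the NFC spelling "\xc3\xbc" render identically but differ byte for byte, so the check fails unless both sides are brought to the same form first.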
There is, however, a limitation with this very usage of Git:
The (repo) local .gitconfig file is not used, only the global
"core.precomposeunicode" is taken into account, if it is set (or not).
Setting it to true is a good recommendation anyway, and here is the
analysis from Jun T:
The problem is the_repository->config->hash_initialized
is set to 1 before the_repository->commondir is set to ".git".
Due to this, .git/config is never read, and precomposed_unicode
is never set to 1 (remains -1).
run_builtin() {
  setup_git_directory() {
    strbuf_getcwd() { # setup.c:1542
      precompose_{strbuf,string}_if_needed() {
        # precomposed_unicode is still -1
        git_config_get_bool("core.precomposeunicode") {
          git_config_check_init() {
            repo_read_config() {
              git_config_init() {
                # !!!
                the_repository->config->hash_initialized=1
                # !!!
              }
              # does not read .git/config since
              # the_repository->commondir is still NULL
            }
          }
        }
        returns without converting to NFC
      }
      returns cwd in NFD
    }
    setup_discovered_git_dir() {
      set_git_work_tree(".") {
        repo_set_worktree() {
          # this function indirectly calls strbuf_getcwd()
          # --> precompose_{strbuf,string}_if_needed() -->
          # {git,repo}_config_get_bool("core.precomposeunicode"),
          # but does not try to read .git/config since
          # the_repository->config->hash_initialized
          # is already set to 1 above. And it will not read
          # .git/config even if hash_initialized is 0
          # since the_repository->commondir is still NULL.
          the_repository->worktree = NFD
        }
      }
    }
    setup_git_env() {
      repo_setup_gitdir() {
        repo_set_commondir() {
          # finally commondir is set here
          the_repository->commondir = ".git"
        }
      }
    }
  } // END setup_git_directory
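The ordering problem in the trace can be reduced to a "latched too early" pattern. The following toy model uses invented names (`struct toy_repo`, `toy_get_precompose`) and pretends that the repository config always contains core.precomposeunicode=true; it only mirrors the shape of the issue, not Git's config machinery.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model with invented names. */
struct toy_repo {
    const char *commondir;  /* NULL until repository discovery finishes */
    int config_initialized; /* latched on the first lookup */
    int precompose;         /* -1 = unknown, 1 once read from config */
};

static void toy_read_config_once(struct toy_repo *r)
{
    if (r->config_initialized)
        return;
    r->config_initialized = 1; /* latched even if nothing could be read */
    if (r->commondir)
        r->precompose = 1;     /* pretend .git/config enables the setting */
}

int toy_get_precompose(struct toy_repo *r)
{
    assert(r);
    toy_read_config_once(r);
    return r->precompose;
}
```

A lookup made while `commondir` is still NULL latches the flag, so a later lookup made after `commondir` is set never revisits the config and the value stays at -1.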
Reported-by: Jun T <takimoto-j@kba.biglobe.ne.jp>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/leakfixes:
mv: replace src_dir with a strvec
mv: factor out empty src_dir removal
mv: move src_dir cleanup to end of cmd_mv()
t-strvec: mark variable-arg helper with LAST_ARG_MUST_BE_NULL
t-strvec: use va_end() to match va_start()
While most of the commands in the Git suite are designed to do useful
things in Git repositories, some commands are also usable outside
any repository. Building on top of the earlier work in abece6e9 (t1517:
test commands that are designed to be run outside repository,
2024-05-20) that adds tests for such commands, let's give coverage
to some more commands.
This patch covers commands whose code has hits for
    $ git grep setup_git_directory_gently
and that pass a pointer to a nongit_ok variable so that they can run
outside a Git repository. For the most part they are tested only to
see that they start up (as opposed to dying with a "not in a git
repository" complaint). We may want to update the tests to actually do
something useful later, but this at least helps us catch accidental
regressions.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/fix-2.45.1-and-friends-for-maint:
Revert "fsck: warn about symlink pointing inside a gitdir"
Revert "Add a helper function to compare file contents"
clone: drop the protections where hooks aren't run
tests: verify that `clone -c core.hooksPath=/dev/null` works again
Revert "core.hooksPath: add some protection while cloning"
init: use the correct path of the templates directory again
hook: plug a new memory leak
ci: stop installing "gcc-13" for osx-gcc
ci: avoid bare "gcc" for osx-gcc job
ci: drop mention of BREW_INSTALL_PACKAGES variable
send-email: avoid creating more than one Term::ReadLine object
send-email: drop FakeTerm hack
* fixes/2.45.1/2.44:
Revert "fsck: warn about symlink pointing inside a gitdir"
Revert "Add a helper function to compare file contents"
clone: drop the protections where hooks aren't run
tests: verify that `clone -c core.hooksPath=/dev/null` works again
Revert "core.hooksPath: add some protection while cloning"
init: use the correct path of the templates directory again
hook: plug a new memory leak
ci: stop installing "gcc-13" for osx-gcc
ci: avoid bare "gcc" for osx-gcc job
ci: drop mention of BREW_INSTALL_PACKAGES variable
send-email: avoid creating more than one Term::ReadLine object
send-email: drop FakeTerm hack
* fixes/2.45.1/2.43:
Revert "fsck: warn about symlink pointing inside a gitdir"
Revert "Add a helper function to compare file contents"
clone: drop the protections where hooks aren't run
tests: verify that `clone -c core.hooksPath=/dev/null` works again
Revert "core.hooksPath: add some protection while cloning"
init: use the correct path of the templates directory again
hook: plug a new memory leak
ci: stop installing "gcc-13" for osx-gcc
ci: avoid bare "gcc" for osx-gcc job
ci: drop mention of BREW_INSTALL_PACKAGES variable
send-email: avoid creating more than one Term::ReadLine object
send-email: drop FakeTerm hack
* fixes/2.45.1/2.42:
Revert "fsck: warn about symlink pointing inside a gitdir"
Revert "Add a helper function to compare file contents"
clone: drop the protections where hooks aren't run
tests: verify that `clone -c core.hooksPath=/dev/null` works again
Revert "core.hooksPath: add some protection while cloning"
init: use the correct path of the templates directory again
hook: plug a new memory leak
ci: stop installing "gcc-13" for osx-gcc
ci: avoid bare "gcc" for osx-gcc job
ci: drop mention of BREW_INSTALL_PACKAGES variable
send-email: avoid creating more than one Term::ReadLine object
send-email: drop FakeTerm hack
* fixes/2.45.1/2.41:
Revert "fsck: warn about symlink pointing inside a gitdir"
Revert "Add a helper function to compare file contents"
clone: drop the protections where hooks aren't run
tests: verify that `clone -c core.hooksPath=/dev/null` works again
Revert "core.hooksPath: add some protection while cloning"
init: use the correct path of the templates directory again
hook: plug a new memory leak
ci: stop installing "gcc-13" for osx-gcc
ci: avoid bare "gcc" for osx-gcc job
ci: drop mention of BREW_INSTALL_PACKAGES variable
send-email: avoid creating more than one Term::ReadLine object
send-email: drop FakeTerm hack
* fixes/2.45.1/2.40:
Revert "fsck: warn about symlink pointing inside a gitdir"
Revert "Add a helper function to compare file contents"
clone: drop the protections where hooks aren't run
tests: verify that `clone -c core.hooksPath=/dev/null` works again
Revert "core.hooksPath: add some protection while cloning"
init: use the correct path of the templates directory again
hook: plug a new memory leak
ci: stop installing "gcc-13" for osx-gcc
ci: avoid bare "gcc" for osx-gcc job
ci: drop mention of BREW_INSTALL_PACKAGES variable
send-email: avoid creating more than one Term::ReadLine object
send-email: drop FakeTerm hack
* jc/fix-2.45.1-and-friends-for-2.39:
Revert "fsck: warn about symlink pointing inside a gitdir"
Revert "Add a helper function to compare file contents"
clone: drop the protections where hooks aren't run
tests: verify that `clone -c core.hooksPath=/dev/null` works again
Revert "core.hooksPath: add some protection while cloning"
init: use the correct path of the templates directory again
hook: plug a new memory leak
ci: stop installing "gcc-13" for osx-gcc
ci: avoid bare "gcc" for osx-gcc job
ci: drop mention of BREW_INSTALL_PACKAGES variable
send-email: avoid creating more than one Term::ReadLine object
send-email: drop FakeTerm hack
Adjust jc/fix-2.45.1-and-friends-for-2.39 for more recent
maintenance track.
* jc/fix-2.45.1-and-friends-for-maint:
Revert "fsck: warn about symlink pointing inside a gitdir"
Revert "Add a helper function to compare file contents"
clone: drop the protections where hooks aren't run
tests: verify that `clone -c core.hooksPath=/dev/null` works again
Revert "core.hooksPath: add some protection while cloning"
init: use the correct path of the templates directory again
hook: plug a new memory leak
ci: stop installing "gcc-13" for osx-gcc
ci: avoid bare "gcc" for osx-gcc job
ci: drop mention of BREW_INSTALL_PACKAGES variable
send-email: avoid creating more than one Term::ReadLine object
send-email: drop FakeTerm hack
"git add -p" learned to complain when an answer with more than one
letter is given to a prompt that expects a single letter answer.
* jc/add-patch-enforce-single-letter-input:
add-patch: enforce only one-letter response to prompts
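The stricter input handling amounts to rejecting any response that is not exactly one character long. A minimal sketch, assuming a hypothetical helper and ignoring details such as case folding:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return the answer letter if `answer` is exactly one
 * character long, or 0 so the caller can complain and prompt again.
 */
int parse_single_letter(const char *answer)
{
    assert(answer);
    if (strlen(answer) != 1)
        return 0;
    return (unsigned char)answer[0];
}
```

With this shape, an answer such as "yes" is no longer silently treated as "y"; the caller sees 0 and can report the problem.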
The strcmp-offset tests have been rewritten using the unit test
framework.
* gt/unit-test-strcmp-offset:
t/: port helper/test-strcmp-offset.c to unit-tests/t-strcmp-offset.c
The chainlint script (invoked during "make test") did nothing when
it failed to detect the number of available CPUs. It now falls
back to 1 CPU to avoid the problem.
* es/chainlint-ncores-fix:
chainlint.pl: latch CPU count directly reported by /proc/cpuinfo
chainlint.pl: fix incorrect CPU count on Linux SPARC
chainlint.pl: make CPU count computation more robust
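The fallback itself is simple: any detection result that is not a positive number is replaced by 1. chainlint.pl is a Perl script that inspects platform-specific sources; the C sketch below swaps in sysconf() as the detection step purely for illustration, and both function names are invented.

```c
#include <assert.h>
#include <unistd.h>

/* Treat any non-positive detection result as "one CPU". */
long sanitize_cpu_count(long detected)
{
    return detected > 0 ? detected : 1;
}

/* Illustrative detection via sysconf(), where the platform offers it. */
long ncores(void)
{
#ifdef _SC_NPROCESSORS_ONLN
    return sanitize_cpu_count(sysconf(_SC_NPROCESSORS_ONLN));
#else
    return 1;
#endif
}
```

Keeping the clamp separate from the detection means every detection path, however it fails, ends up with a usable worker count.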
The base topic started to make it an error for a command to leave
the hash algorithm unspecified, which revealed a few commands that
were not ready for the change. Give users a knob to revert back to
the "default is sha-1" behaviour as an escape hatch, and start
fixing these breakages.
* jc/undecided-is-not-necessarily-sha1-fix:
apply: fix uninitialized hash function
builtin/hash-object: fix uninitialized hash function
builtin/patch-id: fix uninitialized hash function
t1517: test commands that are designed to be run outside repository
setup: add an escape hatch for "no more default hash algorithm" change
Further clean-up the refs subsystem to stop relying on
the_repository, and instead use the repository associated to the
ref_store object.
* ps/refs-without-the-repository-updates:
refs/packed: remove references to `the_hash_algo`
refs/files: remove references to `the_hash_algo`
refs/files: use correct repository
refs: remove `dwim_log()`
refs: drop `git_default_branch_name()`
refs: pass repo when peeling objects
refs: move object peeling into "object.c"
refs: pass ref store when detecting dangling symrefs
refs: convert iteration over replace refs to accept ref store
refs: retrieve worktree ref stores via associated repository
refs: refactor `resolve_gitlink_ref()` to accept a repository
refs: pass repo when retrieving submodule ref store
refs: track ref stores via strmap
refs: implement releasing ref storages
refs: rename `init_db` callback to avoid confusion
refs: adjust names for `init` and `init_db` callbacks
Code clean-up to make the reftable iterator closer to being reusable.
* ps/reftable-reusable-iterator:
reftable/merged: adapt interface to allow reuse of iterators
reftable/stack: provide convenience functions to create iterators
reftable/reader: adapt interface to allow reuse of iterators
reftable/generic: adapt interface to allow reuse of iterators
reftable/generic: move seeking of records into the iterator
reftable/merged: simplify indices for subiterators
reftable/merged: split up initialization and seeking of records
reftable/reader: set up the reader when initializing table iterator
reftable/reader: inline `reader_seek_internal()`
reftable/reader: separate concerns of table iter and reftable reader
reftable/reader: unify indexed and linear seeking
reftable/reader: avoid copying index iterator
reftable/block: use `size_t` to track restart point index
The knobs to tweak how reftable files are written have been made
available as configuration variables.
* ps/reftable-write-options:
refs/reftable: allow configuring geometric factor
reftable: make the compaction factor configurable
refs/reftable: allow disabling writing the object index
refs/reftable: allow configuring restart interval
reftable: use `uint16_t` to track restart interval
refs/reftable: allow configuring block size
reftable/dump: support dumping a table's block structure
reftable/writer: improve error when passed an invalid block size
reftable/writer: drop static variable used to initialize strbuf
reftable: pass opts as constant pointer
reftable: consistently refer to `reftable_write_options` as `opts`
Before discovering the repository details, we used to assume SHA-1
as the "default" hash function, which has been corrected. Hopefully
this will smoke out codepaths that rely on such unwarranted
assumptions.
* ps/undecided-is-not-necessarily-sha1:
repository: stop setting SHA1 as the default object hash
oss-fuzz/commit-graph: set up hash algorithm
builtin/shortlog: don't set up revisions without repo
builtin/diff: explicitly set hash algo when there is no repo
builtin/bundle: abort "verify" early when there is no repository
builtin/blame: don't access potentially unitialized `the_hash_algo`
builtin/rev-parse: allow shortening to more than 40 hex characters
remote-curl: fix parsing of detached SHA256 heads
attr: fix BUG() when parsing attrs outside of repo
attr: don't recompute default attribute source
parse-options-cb: only abbreviate hashes when hash algo is known
path: move `validate_headref()` to its only user
path: harden validation of HEAD with non-standard hashes
Now that we have the `get_midx_filename_ext()` helper, we can
reimplement the `midx_bitmap_filename()` function in terms of it.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit f894081dea (pack-revindex: read multi-pack reverse indexes,
2021-03-30) introduced the `get_midx_rev_filename()` helper (later
modified by commit 60980aed78 (midx.c: write MIDX filenames to
strbuf, 2021-10-26)).
This function returns the location of the classic ".rev" files we used
to write for MIDXs (prior to 95e8383bac (midx.c: make changing the
preferred pack safe, 2022-01-25)), which is always of the form:
    $GIT_DIR/objects/pack/multi-pack-index-$HASH.rev
Replace this function with a generic helper that populates a strbuf with
the above form, replacing the ".rev" extension with a caller-provided
argument.
This will allow us to remove a similarly-defined function in the
pack-bitmap code (used to determine the location of a MIDX .bitmap file)
by reimplementing it in terms of `get_midx_filename_ext()`.
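The shape of such a generic helper, and of a bitmap-specific wrapper built on it, can be sketched as follows. This version writes into a plain character buffer rather than a strbuf and takes the hash as a hex string; the names, signatures, and the slash check are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative helper: write "<object_dir>/pack/multi-pack-index-<hash>.<ext>"
 * into `out`. Returns 0 on success, -1 on truncation or a bad extension.
 */
int midx_filename_ext(char *out, size_t out_len, const char *object_dir,
                      const char *hash_hex, const char *ext)
{
    int n;

    assert(out && object_dir && hash_hex && ext);
    if (!*ext || strchr(ext, '/'))
        return -1; /* an extension must not be empty or contain a slash */
    n = snprintf(out, out_len, "%s/pack/multi-pack-index-%s.%s",
                 object_dir, hash_hex, ext);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* The bitmap variant becomes a one-line wrapper. */
int midx_bitmap_filename(char *out, size_t out_len, const char *object_dir,
                         const char *hash_hex)
{
    return midx_filename_ext(out, out_len, object_dir, hash_hex, "bitmap");
}
```

Passing "rev" as the extension reproduces the classic form shown above, while the wrapper covers the .bitmap case without duplicating the path logic.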
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid unconditionally copying all packs from an existing MIDX into a new
MIDX by checking that packs added via `fill_packs_from_midx()` don't
appear in the `to_include` set, if one was provided.
Do so by calling `should_include_pack()` from both `add_pack_to_midx()`
and `fill_packs_from_midx()`.
In order to make this work, teach `should_include_pack()` a new
"exclude_from_midx" parameter, which allows skipping the first check.
This is done so that the caller in `fill_packs_from_midx()` doesn't
reject all of the packs it provided since they appear in an existing
MIDX by definition.
The sum total of this change is that we are now able to read and
reference objects in an existing MIDX even when given a non-NULL
`packs_to_include`. This is a prerequisite step for incremental MIDXs,
which need to load any existing MIDX (if one is present) in order to
determine whether or not an object already appears in an earlier portion
of the MIDX to avoid duplicating it across multiple portions.
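A self-contained sketch of the two checks and the new flag is shown below. The struct layouts and the linear list lookup are stand-ins invented for this example, and the second check assumes that a pack missing from the `to_include` list is the one that gets skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in types, invented for this example. */
struct name_list {
    const char **names;
    size_t nr;
};

struct toy_midx_ctx {
    const struct name_list *existing_midx; /* packs in the loaded MIDX, or NULL */
    const struct name_list *to_include;    /* requested packs, or NULL */
};

static int list_has(const struct name_list *l, const char *name)
{
    for (size_t i = 0; i < l->nr; i++)
        if (!strcmp(l->names[i], name))
            return 1;
    return 0;
}

int should_include_pack(const struct toy_midx_ctx *ctx, const char *pack_name,
                        int exclude_from_midx)
{
    assert(ctx && pack_name);
    /* First check: optional, skipped when reusing packs from the MIDX. */
    if (exclude_from_midx && ctx->existing_midx &&
        list_has(ctx->existing_midx, pack_name))
        return 0;
    /* Second check: honor the include list whenever one was given. */
    if (ctx->to_include && !list_has(ctx->to_include, pack_name))
        return 0;
    return 1;
}
```

A caller scanning the pack directory would pass 1 for `exclude_from_midx`, while the code that copies packs forward from the existing MIDX would pass 0, since every pack it offers is in that MIDX by definition.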
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When write_midx_internal() loads an existing MIDX, all packs are copied
forward into the new MIDX. Improve the readability of
write_midx_internal() by extracting this functionality out into a
separate function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The add_pack_to_midx() callback used via for_each_file_in_pack_dir() is
used to add packs with .idx files to the MIDX being written.
Within this function, we have a pair of checks that discards packs
which:
- appear in an existing MIDX, if we successfully read an existing MIDX
from disk
- or, do not appear in the "to_include" list, if invoking the MIDX write
machinery with the `--stdin-packs` command-line argument.
A future commit will want to call a slight variant of these checks from
the code that reuses all packs from an existing MIDX, as well as the
current location via add_pack_to_midx(). The latter will be modified in
subsequent commits to only reuse packs which appear in the to_include
list, if one was given.
Prepare for that step by extracting these checks as a subroutine that
may be called from both places.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `compute_sorted_entries()` is broadly responsible for
building an array of the objects to be written into a MIDX based on the
provided list of packs.
If we have loaded an existing MIDX, however, we may not use all of its
packs, despite loading them into the ctx->info array.
The existing implementation simply skips past the first
ctx->m->num_packs (if ctx->m is non-NULL, indicating that we loaded an
existing MIDX). This is because we read objects in packs from an
existing MIDX via the MIDX itself, rather than from the pack-level
fanout to guarantee a de-duplicated result (see: a40498a126 (midx: use
existing midx when writing new one, 2018-07-12)).
Future changes (outside the scope of this patch series) to the MIDX code
will require us to skip *at most* that number[^1].
We could tag each pack with a bit that indicates the pack's contents
should be included in the MIDX. But we can just as easily determine the
number of packs to skip by passing in the number of packs we learned
about after processing an existing MIDX.
[^1]: Kind of. The real number will be bounded by the number of packs in
a MIDX layer, and the number of packs in its base layer(s), but that
concept hasn't been fully defined yet.
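Passing the skip count explicitly can be pictured with a small loop that starts at a caller-supplied index instead of at a value derived from the loaded MIDX. The types and names below are invented for illustration and only count objects rather than building entries.

```c
#include <assert.h>
#include <stddef.h>

/* Invented stand-in for per-pack bookkeeping. */
struct toy_pack {
    unsigned long num_objects;
};

/*
 * Count the objects that would be read at the pack level, skipping the
 * first `start_pack` packs because their objects come from the MIDX itself.
 */
unsigned long count_pack_level_objects(const struct toy_pack *packs,
                                       size_t nr, size_t start_pack)
{
    unsigned long total = 0;

    assert(packs || !nr);
    for (size_t i = start_pack; i < nr; i++)
        total += packs[i].num_objects;
    return total;
}
```

Because the starting index is a parameter, a caller is free to pass a smaller number than the pack count of the existing MIDX, which no per-pack flag is needed to express.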
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `midx-write.c::get_sorted_entries()` is responsible for
constructing the array of OIDs from a given list of packs which will
comprise the MIDX being written.
The sole call site for this function looks something like:
    ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr,
                                     &ctx.entries_nr,
                                     ctx.preferred_pack_idx);
This function has five formal arguments, all of which are members of the
shared `struct write_midx_context` used to track various pieces of
information about the MIDX being written.
The function `get_sorted_entries()` dates back to fe1ed56f5e (midx:
sort and deduplicate objects from packfiles, 2018-07-12), which came
shortly after 396f257018 (multi-pack-index: read packfile list,
2018-07-12). The latter patch introduced the `pack_list` structure,
which was a precursor to the structure we now know as
`write_midx_context` (cf. 577dc49696 (midx: rename pack_info to
write_midx_context, 2021-02-18)).
At the time, `get_sorted_entries()` likely could have used the pack_list
structure introduced earlier in 396f257018, but understandably did not
since the structure only contained three fields (only two of which were
relevant to `get_sorted_entries()`) at the time.
Simplify the declaration of this function by taking a single pointer to
the whole `struct write_midx_context` instead of various members within
it. Since this function is now computing the entire result (populating
both `ctx->entries`, and `ctx->entries_nr`), rename it to something that
doesn't start with "get_" to make clear that this function has a
side-effect.
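The effect of the refactoring on the function's shape can be illustrated with a toy context that holds plain integers instead of object IDs. Everything below (`struct toy_write_ctx`, its fields, the integer payload) is invented for the example; only the pattern of taking one context pointer and filling in the result fields comes from the description above.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_entry {
    unsigned id;
};

/* Invented context: inputs and outputs live side by side. */
struct toy_write_ctx {
    const unsigned *ids;       /* input: unsorted, possibly duplicated */
    size_t nr;                 /* input: number of ids */
    struct toy_entry *entries; /* output: sorted, de-duplicated */
    size_t entries_nr;         /* output: number of entries */
};

static int toy_entry_cmp(const void *a, const void *b)
{
    const struct toy_entry *x = a, *y = b;
    return (x->id > y->id) - (x->id < y->id);
}

/* Populates ctx->entries and ctx->entries_nr as a side effect. */
void compute_sorted_entries(struct toy_write_ctx *ctx)
{
    size_t i, out = 0;

    assert(ctx);
    ctx->entries = NULL;
    ctx->entries_nr = 0;
    if (!ctx->nr)
        return;
    ctx->entries = malloc(ctx->nr * sizeof(*ctx->entries));
    if (!ctx->entries)
        return;
    for (i = 0; i < ctx->nr; i++)
        ctx->entries[i].id = ctx->ids[i];
    qsort(ctx->entries, ctx->nr, sizeof(*ctx->entries), toy_entry_cmp);
    /* Drop adjacent duplicates in place. */
    for (i = 0; i < ctx->nr; i++)
        if (!out || ctx->entries[out - 1].id != ctx->entries[i].id)
            ctx->entries[out++] = ctx->entries[i];
    ctx->entries_nr = out;
}
```

The call site shrinks to `compute_sorted_entries(&ctx);`, and the name no longer suggests a pure getter.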
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>