An earlier attempt to optimize "git subtree" discarded too much
relevant histories, which has been corrected.
* cs/subtree-split-fixes:
contrib/subtree: process out-of-prefix subtrees
contrib/subtree: test history depth
contrib/subtree: capture additional test-cases
This change simplifies the code somewhat from its original
implementation.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When run in a worktree, the GIT_DIR directory is set in a different way
than in a typical repository. Show this by updating t0068 to include a
worktree and add a test that runs from that worktree. This requires
moving the repo.key config into a global config instead of the base test
repository's local config (demonstrating that it worked with
non-worktree Git repositories).
We need to be careful to unset the local Git environment variables and
let the child process rediscover them, while also reinstating those
variables in the parent process afterwards. Update run_command_on_repo()
to use the new sanitize_repo_env() helper method to erase these
environment variables.
During review of this bug fix, there were several incorrect patches
demonstrating different bad behaviors. Most of these are covered by
tests, when it is not too expensive to set it up. One case that would be
expensive to set up is the GIT_NO_REPLACE_OBJECTS environment variable,
but we trust that using sanitize_repo_env() will be sufficient to
capture these uncovered cases by using the common code for resetting
environment variables.
Reported-by: Matthew Gabeler-Lee <fastcat@gmail.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current prepare_other_repo_env() does two distinct things:
1. Strip certain known environment variables that should be set by a
child process based on a different repository.
2. Set the GIT_DIR variable to avoid repository discovery.
The second item is valuable for child processes that operate on
submodules, where the repo discovery could be mistaken for the parent
repository.
In the next change, we will see an important case where only the first
item is required as the GIT_DIR discovery should happen naturally from
the '-C' parameter in the child process.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git for-each-repo' tool is frequently run outside of a repo context
in the real world. For example, it powers background maintenance.
Despite this typical case, we have not been testing it without a local
repository.
Update t0068 to stop creating a test repo and to use global config
everywhere. This has some subtle changes to test across the file.
This was noticed because an earlier attempt to remove the_repository
from builtin/for-each-repo.c did not catch a segmentation fault since
the passed 'repo' is NULL. This use of the_repository will need to stay
until we have a better way to handle config queries outside of a repo
context. Similar use still exists in builtin/config.c for the same
reason.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add missing list continuation marks ('+') after code blocks and shell examples
so paragraphs render correctly as part of the preceding list item.
Signed-off-by: Jonatan Holmgren <jonatan@jontes.page>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After a diff algorithm has been run, the compaction phase
(xdl_change_compact()) shifts and merges change groups to produce a
cleaner output. However, this shifting could create a new matched group
where both sides now have matching lines. This results in a
wrong-looking diff output which contains redundant lines that are the
same on both files.
Fix this by detecting this situation, and re-diff the texts on each side
to find similar lines, using the fall-back Myer's diff. Only do this for
histogram diff as it's the only algorithm where this is relevant. Below
contains an example, and more details.
For an example, consider two files below:
file1:
A
A
A
A
A
A
A
file2:
A
A
x
A
A
A
A
When using Myer's diff, the algorithm finds that only the "x" has been
changed, and produces a final diff result (these are line diffs, but
using word-diff syntax for ease of presentation):
A A[-A-]{+x+}A AAA
When using histogram diff, the algorithm first discovers the LCS "A
AAA", which it uses as anchor, then produces an intermediate diff:
{+A Ax+}A AAA[- AAA-].
This is a longer diff than Myer's, but it's still self-consistent.
However, the compaction phase attempts to shift the first file's diff
group upwards (note that this shift crosses the anchor that histogram
had used), leading to the final results for histogram diff:
[-A AA-]{+A Ax+}A AAA
This is a technically correct patch but looks clearly redundant to a
human as the first 3 lines should not be in the diff.
The fix would detect that a shift has caused matching to a new group,
and re-diff the "A AA" and "A Ax" parts, which results in "A A"
correctly re-marked as unchanged. This creates the now correct histogram
diff:
A A[-A-]{+x+}A AAA
This issue is not applicable to Myer's diff algorithm as it already
generates a minimal diff, which means a shift cannot result in a smaller
diff output (the default Myer's diff in xdiff is not guaranteed to be
minimal for performance reasons, but it typically does a good enough
job).
It's also not applicable to patience diff, because it uses only unique
lines as anchor for its splits, and falls back to Myer's diff within
each split. Shifting requires both ends having the same lines, and
therefore cannot cross the unique line boundaries established by the
patience algorithm. In contrast histogram diff uses non-unique lines as
anchors, and therefore shifting can cross over them.
This issue is rare in a normal repository. Below is a table of
repositories (`git log --no-merges -p --histogram -1000`), showing how
many times a re-diff was done and how many times it resulted in finding
matching lines (therefore addressing this issue) with the fix. In
general it is fewer than 1% of diff's that exhibit this offending
behavior:
| Repo (1k commits) | Re-diff | Found matching lines |
|--------------------|---------|----------------------|
| llvm-project | 45 | 11 |
| vim | 110 | 9 |
| git | 18 | 2 |
| WebKit | 168 | 1 |
| ripgrep | 22 | 1 |
| cpython | 32 | 0 |
| vscode | 13 | 0 |
Signed-off-by: Yee Cheng Chin <ychin.git@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Logically separate the introductory sentence from the first transport
description to improve readability and structural clarity.
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* jt/object-file-use-container-of:
object-file.c: avoid container_of() of a NULL container
object-file: use `container_of()` to convert from base types
The code to accept shallow "git push" has been optimized.
* ps/receive-pack-shallow-optim:
commit: use commit graph in `lookup_commit_reference_gently()`
commit: make `repo_parse_commit_no_graph()` more robust
commit: avoid parsing non-commits in `lookup_commit_reference_gently()`
A couple of bugs in use of flag bits around odb API has been
corrected, and the flag bits reordered.
* ps/object-info-bits-cleanup:
odb: convert `odb_has_object()` flags into an enum
odb: convert object info flags into an enum
odb: drop gaps in object info flag values
builtin/fsck: fix flags passed to `odb_has_object()`
builtin/backfill: fix flags passed to `odb_has_object()`
Additional tests were introduced to see the interaction with netrc
auth with auth failure on the http transport.
* ag/http-netrc-tests:
t5550: add netrc tests for http 401/403
"git subtree split --prefix=P <commit>" now checks the prefix P
against the tree of the (potentially quite different from the
current working tree) given commit.
* ps/validate-prefix-in-subtree-split:
subtree: validate --prefix against commit in split
A signature on a commit that was GPG signed long time ago ought to
be still valid after the key that was used to sign it has expired,
but we showed them in alarming red.
* uk/signature-is-good-after-key-expires:
gpg-interface: signatures by expired keys are fine
Revamp object enumeration API around odb.
* ps/odb-for-each-object:
odb: drop unused `for_each_{loose,packed}_object()` functions
reachable: convert to use `odb_for_each_object()`
builtin/pack-objects: use `packfile_store_for_each_object()`
odb: introduce mtime fields for object info requests
treewide: drop uses of `for_each_{loose,packed}_object()`
treewide: enumerate promisor objects via `odb_for_each_object()`
builtin/fsck: refactor to use `odb_for_each_object()`
odb: introduce `odb_for_each_object()`
packfile: introduce function to iterate through objects
packfile: extract function to iterate through objects of a store
object-file: introduce function to iterate through objects
object-file: extract function to read object info from path
odb: fix flags parameter to be unsigned
odb: rename `FOR_EACH_OBJECT_*` flags
Exit early if the hooks do not exist, to avoid spinning up/down
sideband async threads which no-op.
It is important to call the hook_exists() API provided by hook.[ch]
because it covers both config-defined hooks and the "traditional"
hooks from the hookdir. find_hook() only covers the hookdir hooks.
The regression happened because the no-op async threads add some
additional overhead which can be measured with the receive-refs test
of the benchmarks suite [1].
Reproduced using:
cd benchmarks/receive-refs && \
./run --revisions /path/to/git \
fc148b146ad41be71a7852c4867f0773cbfe1ff9~,fc148b146ad41be71a7852c4867f0773cbfe1ff9 \
--parameter-list refformat reftable --parameter-list refcount 10000
1: https://gitlab.com/gitlab-org/data-access/git/benchmarks
Fixes: fc148b146a ("receive-pack: convert update hooks to new API")
Reported-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
[jc: avoid duplicated hardcoded hook names]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The size of a tree object usually corresponds with the number of entries
it has. While iterating through objects in the repository for
git-repo-structure, identify the tree with the most entries and display
it in the output.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Complex merge events may produce an octopus merge where the resulting
merge commit has more than two parents. While iterating through objects
in the repository for git-repo-structure, identify the commit with the
most parents and display it in the output.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "structure" output for git-repo(1) does not show the corresponding
OIDs for the largest objects in its "table" output. Update the output to
include a list of OID annotations with an index to the corresponding row
in the table.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "structure" output for git-repo(1) shows the total inflated and disk
sizes of reachable objects in the repository, but doesn't show the size
of the largest individual objects. Since an individual object may be a
large contributor to the overall repository size, it is useful for users
to know the maximum size of individual objects.
While interating across objects, record the size and OID of the largest
objects encountered for each object type to provide as output. Note that
the default "table" output format only displays size information and not
the corresponding OID. In a subsequent commit, the table format is
updated to add table annotations that mention the OID.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The machine-parsable formats for the git-repo(1) "structure" subcommand
print output in keyvalue pairs. Introduce the helper function
`print_keyvalue()` to remove some code duplication and improve
readability.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When walking reachable objects in the repository, `count_objects()`
processes a set of objects and updates the `struct object_stats`. In
preparation for more granular statistics being collected, update the
`struct object_stats` for each individual object instead.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the comments of lib-unicode-nfc-nfd.sh, "that that" was used
unintentionally. Remove the redundant "that" to improve clarity.
Signed-off-by: Siddharth Shrimali <r.siddharth.shrimali@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace old-style path assertions with modern helpers that
provide clearer diagnostic messages on failure. When test -f
fails, the output gives no indication of what went wrong.
These instances were found using:
git grep "test -[efd]" t/
as suggested in the microproject ideas.
Signed-off-by: Francesco Paparatto <francescopaparatto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have started to see the following assert happen in our GitLab CI
pipelines for jobs that use Windows with Meson:
assertion "bc_ctl.arg_max >= LINE_MAX" failed: file "xargs.c", line 512, function: main
The assert in question verifies that we have enough room available to
pass at least `LINE_MAX` many bytes via the command line. The xargs(1)
binary in those jobs comes from Git for Windows, which in turn sources
the binaries from MSYS2, and has the following limits in place:
$ & "C:/Program Files/Git/usr/bin/bash.exe" -l -c 'xargs --show-limits </dev/null'
Your environment variables take up 17373 bytes
POSIX upper limit on argument length (this system): 12579
POSIX smallest allowable upper limit on argument length (all systems): 4096
Maximum length of command we could actually use: 18446744073709546822
Size of command buffer we are actually using: 12579
Maximum parallelism (--max-procs must be no greater): 2147483647
What's interesting to see is the limit of 16 exabits for the maximum
command line length. This value might seem a bit high, and it is indeed
the result of an underflow: our environment is larger than the POSIX
upper limit on argument length, and the value is computed by subtracting
the former from the latter. So what we get is the result of `2^64 -
(17373 - 12579)`.
This makes it clear that the problem here is the size of our environment
variables. A listing sorted by length yields the following result:
$ Get-ChildItem "Env:" |
Sort-Object { $_.Value.Length } -Descending |
Select-Object Name, @{Name="Length"; Expression={$_.Value.Length}}
Name Length
---- ------
GITLAB_FEATURES 6386
Path 706
PSModulePath 229
The GITLAB_FEATURES environment variable makes up for roughly a third of
the complete environment. This variable is a comma-separated list of
features available for the GitLab instance, and seemingly it has been
growing over time as GitLab added more and more features.
Fix the issue by unsetting the environment variable in "ci/lib.sh". This
ensures that the environment variables are now smaller than the upper
limit on argument length again, and that in turn fixes the assert in
xargs(1).
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We cannot split single words like what we did in the previous
commit. That is because the doc translations are processed in
bigger chunks.
Instead write the two paragraphs with the only variations being this
configuration variable.
Reported-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For SMTP servers that do "mutual certificate verification", the mail
client is required to present its own TLS certificate as well. This
patch adds --smtp-ssl-client-cert and --smtp-ssl-client-key for such
servers.
The problem of which private key for the certificate is chosen arises
when there are private keys in both the certificate and private key
file. According to the documentation of IO::Socket::SSL(link supplied),
the behaviour(the private key chosen) depends on the format of the
certificate. In a nutshell,
- PKCS12: the key in the cert always takes the precedence
- PEM: if the key file is not given, it will "try" to read one
from the cert PEM file
Many users may find this discrepancy unintuitive.
In terms of client certificate, git-send-email is implemented in a way
that what's possible with perl's SSL library is exposed to the user as
much as possible. In this instance, the user may choose to use a PEM
file that contains both certificate and private key should be
at their discretion despite the implications.
Link: https://metacpan.org/pod/IO::Socket::SSL#SSL_cert_file-%7C-SSL_cert-%7C-SSL_key_file-%7C-SSL_key
Link: https://lore.kernel.org/all/319bf98c-52df-4bf9-b157-e4bc2bf087d6@dev.snart.me/
Signed-off-by: David Timber <dxdt@dev.snart.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git diff --find-object=<oid>" is run outside a git repository,
the option parsing callback eagerly resolves the OID via
repo_get_oid(), which reaches get_main_ref_store() and hits a BUG()
assertion because no repository has been set up.
Check startup_info->have_repository before attempting to resolve the
OID, and return a user-friendly error instead.
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The is_work_tree_watched() function in fsmonitor-watchman.sample has
two bugs:
1. Wrong variable in error check: After calling watchman_clock(), the
result is stored in $o, but the code checks $output->{error} instead
of $o->{error}. This means errors from the clock command are silently
ignored.
2. Double output violates protocol: When the retry path triggers (the
directory wasn't initially watched), output_result() is called with
the "/" flag, then launch_watchman() is called recursively which
calls output_result() again. This outputs two clock tokens to stdout,
but git's fsmonitor v2 protocol expects exactly one response.
Fix#1 by checking $o->{error} after watchman_clock().
Fix#2 by removing the recursive launch_watchman() call. The "/"
"everything is dirty" flag already tells git to do a full scan, and
git will call the hook again on the next invocation with a valid clock
token.
With the recursive call removed, the $retry guard is no longer needed
since it only existed to prevent infinite recursion. Remove it.
Apply the same fixes to the test helper scripts in t/t7519/.
Signed-off-by: Paul Tarjan <github@paulisageek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a non-ASCII character is detected in the body or subject of the email
the user is prompted with,
Which 8bit encoding should I declare [UTF-8]? foo
After this the input string is validated by the regex, based on the fact
that the charset string will be minimum 4 characters [1]. If the string is
more than 4 letters the email is sent, if not then a second prompt to
confirm is asked to the user,
Are you sure you want to use <foo> [y/N]? y
This relies on a length based regex heuristic check to validate the user
input, and can allow clearly invalid charset names to pass if the input is
greater than 4 characters.
Add a semantic validation of the charset name using the
Encode::find_encoding() which is a bundled module of perl. If the encoding
is not recognized, warn the user and ask for confirmation before proceeding.
After this validation the lenght based validation becomes redundant and also
breaks flow, so change the regex of valid input to any non blank string.
Make the encoding warning logic specific to the 8bit prompt, also add a
unique confirmation prompt which reduces the load on ask(), and improves
maintainability.
Additionally, the wording of the first prompt can confuse the user if not
read properly or under any default assumptions for a yes/no prompt. Change
the wording to make it explicitly clear to the user that the prompt needs a
string input, UTF-8 being the default.
The intended flow is,
Declare which 8bit encoding to use [default: UTF-8]? foobar
'foobar' does not appear to be a valid charset name. Use it anyway [y/N]?
[1]- 852a15d748
Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already check for duplicate short names. Check for and report
duplicate long names and numerical options as well.
Perform the slightly expensive string duplicate check only when showing
the usage to keep the cost of normal invocations low. t0012-help.sh
covers it.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a dark-theme user, I use the Preferences dialog to set colors
for gitk. The only color I cannot change via that dialog is the
link foreground color, which leads to using the default link color
on a dark background that makes it hard to read.
Make the link foreground color also configurable in the Gitk
Preferences dialog's Color tab, so users won't need to dig into
the code/manual to check if it is configurable and can simply set
the color there.
Signed-off-by: Wang Zichong <wangzichong@deepin.org>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
"git format-patch --from=<me>" did not honor the command line
option when writing out the cover letter, which has been corrected.
* mf/format-patch-honor-from-for-cover-letter:
format-patch: fix From header in cover letter
Extend the alias configuration syntax to allow aliases using
characters outside ASCII alphanumeric (plus '-').
* jh/alias-i18n:
completion: fix zsh alias listing for subsection aliases
alias: support non-alphanumeric names via subsection syntax
alias: prepare for subsection aliases
help: use list_aliases() for alias listing
Some tests assumed "iconv" is available without honoring ICONV
prerequisite, which has been corrected.
* ps/tests-wo-iconv-fixes:
t6006: don't use iconv(1) without ICONV prereq
t5550: add ICONV prereq to tests that use "$HTTPD_URL/error"
t4205: improve handling of ICONV prerequisite
t40xx: don't use iconv(1) without ICONV prereq
t: don't set ICONV prereq when iconv(1) is missing