In add_patterns_from_input(), we may read lines from a file with a loop
like this:
while (!strbuf_getline(&line, file)) {
...
strbuf_to_cone_pattern(&line, pl);
}
/* we don't strbuf_release(&line) here! */
This generally is OK because strbuf_to_cone_pattern() consumes the
buffer via strbuf_detach(). But we can leak in a few cases:
1. We don't always consume the buffer! If the line ends up empty after
trimming, we leave strbuf_to_cone_pattern() without detaching. In
most cases this is OK, because a subsequent getline() call will use
the same buffer. But if you had an empty line at the end of file,
for example, it would leak.
2. Even if strbuf_to_cone_pattern() always consumed the buffer,
there's a subtle issue with strbuf_getline(). As we saw in
94e2aa555e (strbuf: fix leak when `appendwholeline()` fails with
EOF, 2024-05-27), it's possible for it to return EOF with an
allocated buffer (e.g., if the underlying getdelim() call saw an
error). So we should always strbuf_release() after finishing a read
loop like this.
Note that even the code to read patterns from argv has the same problem.
Because that also uses strbuf_to_cone_pattern(), we stuff each argv
entry into a strbuf. It uses the same "line" strbuf as the getline code,
but we should position the strbuf_release() to cover both code paths.
This fixes at least 9 leaks found in t1091.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we read patterns from --stdin, we loop on strbuf_getline(), and
detach each line we read to pass into add_pattern(). This used to be
necessary because add_pattern() required that the pattern strings remain
valid while the pattern_list was in use. But it also created a leak,
since we didn't record the detached buffers anywhere else.
Now that add_pattern() has been modified to make its own copy of the
strings, we can stop detaching and fix the leak. This fixes 4 leaks
detected in t1091.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The add_pattern() function has a subtle and undocumented gotcha: the
pattern string you pass in must remain valid as long as the pattern_list
is in use (and nor do we take ownership of it). This is easy to get
wrong, causing either subtle bugs (because you free or reuse the string
buffer) or leaks (because you copy the string, but don't track ownership
separately).
All of this "pattern" code was originally the "exclude" mechanism. So
this _usually_ works OK because you add entries in one of two ways:
1. From the command-line (e.g., "--exclude"), in which case we're
pointing to an argv entry which remains valid for the lifetime of
the program.
2. From a file (e.g., ".gitignore"), in which case we read the whole
file into a buffer, attach it to the pattern_list's "filebuf"
entry, then parse the buffer in-place (adding NULs). The strings
point into the filebuf, which is cleaned up when the whole
pattern_list goes away.
But other code, like sparse-checkout, reads individual lines from stdin
and passes them one by one to add_pattern(), leaking each. We could fix
this by refactoring it to take in the whole buffer at once, like (2)
above, and stuff it in "filebuf". But given how subtle the interface is,
let's just fix it to always copy the string.
That seems at first like we'd be wasting extra memory, but we can
mitigate that:
a. The path_pattern struct already uses a FLEXPTR, since we sometimes
make a copy (when we see "foo/", we strip off the trailing slash,
requiring a modifiable copy of the string).
Since we'll now always embed the string inside the struct, we can
switch to the regular FLEX_ARRAY pattern, saving us 8 bytes of
pointer. So patterns with a trailing slash and ones under 8 bytes
actually get smaller.
b. Now that we don't need the original string to hang around, we can
get rid of the "filebuf" mechanism entirely, and just free the file
contents after parsing. Since files are the sources we'd expect to
have the largest pattern sets, we should mostly break even on
stuffing the same data into the individual structs.
This patch just adjusts the add_pattern() interface; it doesn't fix any
leaky callers yet.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a2bc523e1e (dir.c: skip .gitignore, etc larger than INT_MAX,
2024-05-31) we put capped the size of some files whose parsing code and
data structures used ints. Setting the limit to INT_MAX was a natural
spot, since we know the parsing code would misbehave above that.
But it also leaves the possibility of overflow errors when we multiply
that limit to allocate memory. For instance, a file consisting only of
"a\na\n..." could have INT_MAX/2 entries. Allocating an array of
pointers for each would need INT_MAX*4 bytes on a 64-bit system, enough
to overflow a 32-bit int.
So let's give ourselves a bit more safety margin by giving a much
smaller limit. The size 100MB is somewhat arbitrary, but is based on the
similar value for attribute files added by 3c50032ff5 (attr: ignore
overly large gitattributes files, 2022-12-01).
There's no particular reason these have to be the same, but the idea is
that they are in the ballpark of "so huge that nobody would care, but
small enough to avoid malicious overflow". So lacking a better guess, it
makes sense to use the same value. The implementation here doesn't share
the same constant, but we could change that later (or even give it a
runtime config knob, though nobody has complained yet about the
attribute limit).
And likewise, let's add a few tests that exercise the limits, based on
the attr ones. In this case, though, we never read .gitignore from the
index; the blob code is exercised only for sparse filters. So we'll
trigger it that way.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We call the tips of branches "heads", but this command calls the
option to show only branches "--heads", which confuses the branches
themselves and the tips of branches.
Straighten the terminology by introducing "--branches" option that
limits the output to branches, and deprecate "--heads" option used
that way.
We do not plan to remove "--heads" or "-h" yet; we may want to do so
at Git 3.0, in which case, we may need to start advertising upcoming
removal with an extra warning when they are used.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We call the tips of branches "heads", but this command calls the
option to show only branches "--heads", which confuses the branches
themselves and the tips of branches.
Straighten the terminology by introducing "--branches" option that
limits the output to branches, and deprecate "--heads" option used
that way.
We do not plan to remove "--heads" or "-h" yet; we may want to do so
at Git 3.0, in which case, we may need to start advertising upcoming
removal with an extra warning when they are used.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These things in refs/heads/ hierarchy are called "branches" in human
parlance. Replace REF_HEADS with REF_BRANCHES to make it clearer.
No end-user visible change intended at this step.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
EVen with the minimum "no-op" invocation t1517 makes, "git imap-send"
leaks an empty strbuf it used to read a 0-byte string into.
There are a few other topics cooking in 'next' that plugs many
other leaks in this program, so let's minimally fix this one, barely
enough to make CI pass, leaving the rest for the other topic.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In add_pattern_to_hashsets(), we remove entries from the
recursive_hashmap when adding similar ones to the parent_hashmap. I
won't pretend to understand all of what's going on here, but there's an
obvious leak: whatever we removed from recursive_hashmap is not
referenced anywhere else, and is never free()d.
We can easily fix this by asking the hashmap to return a pointer to the
old entry. This makes t7002 now completely leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In sparse_checkout_init(), we first try to load patterns from an
existing file. If we found any, we return immediately, but end up
leaking the patterns we parsed. Fixing this reduces the number of leaks
in t7002 from 9 down to 5.
Note that there are two other exits from the function, but they don't
need the same treatment:
- if we can't resolve HEAD, we write out a hard-coded sparse file and
return. But we know the pattern list is empty there, since we didn't
find any in the on-disk file and we haven't yet added any of our
own.
- otherwise, we do populate the list and then tail-call into
write_patterns_and_update(). But that function frees the
pattern_list itself, so we don't need to.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pattern_list structs used for cone-mode sparse lookups use a few
extra hashmaps. These store pattern_entry structs, each of which has its
own heap-allocated pattern string. When we clean up the hashmaps, we
free the individual pattern_entry structs, but forget to clean up the
embedded strings, causing memory leaks.
We can fix this by iterating over the hashmaps to free the extra
strings. This reduces the numbers of leaks in t7002 from 22 to 9.
One alternative here would be to make the string a FLEX_ARRAY member of
the pattern_entry. Then there's no extra free() required, and as a bonus
it would be a little more efficient. However, some of the refactoring
gets awkward, as we are often assigning strings allocated by helper
functions. So let's just fix the leak for now, and we can explore bigger
refactoring separately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The add_pattern() function takes a pattern string, but neither makes a
copy of it nor takes ownership of the memory. So it is the caller's
responsibility to make sure the string hangs around as long as the
pattern_list which references it.
There are a few cases in sparse-checkout where we use string literal
patterns by stuffing them into a strbuf, detaching the buffer, and then
passing the result into add_pattern(). This creates a leak when the
pattern_list is eventually cleared, since we don't retain a copy of the
detached buffer to free.
But we can observe that the whole strbuf dance is unnecessary. The point
was presumably[1] to satisfy the lifetime requirement of the string. But
string literals have static duration; we can count on them lasting for
the whole program.
So we can fix the leak by just passing them directly. And as a bonus,
that simplifies the code. The leaks can be seen in t7002, which drops
from 25 leaks to 22 with this patch. It also makes t3602 and t1090
leak-free.
In the long run, we will also want to clean up this (undocumented!)
memory lifetime requirement of add_pattern(). But that can come in a
later patch; passing the string literals directly will be the right
thing either way.
[1] The code in question comes from 416adc8711 (sparse-checkout: update
working directory in-process for 'init', 2019-11-21) and 99dfa6f970
(sparse-checkout: use in-process update for disable subcommand,
2019-11-21), but I didn't see anything in their commit messages or
on the list explaining the strbufs.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use a string list to hold sorted and de-duped patterns, but don't
free it before leaving the function, causing a leak.
This drops the number of leaks found in t7002 from 27 to 25.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git push" notices that the commit at the tip of the ref on
the other side it is about to overwrite does not exist locally, it
used to first try fetching it if the local repository is a partial
clone. The command has been taught not to do so and immediately
fail instead.
* th/push-local-ff-check-without-lazy-fetch:
push: don't fetch commit object when checking existence
"git init" in an already created directory, when the user
configuration has includeif.onbranch, started to fail recently,
which has been corrected.
* ps/fix-reinit-includeif-onbranch:
setup: fix bug with "includeIf.onbranch" when initializing dir
* ps/leakfixes:
builtin/mv: fix leaks for submodule gitfile paths
builtin/mv: refactor to use `struct strvec`
builtin/mv duplicate string list memory
builtin/mv: refactor `add_slash()` to always return allocated strings
strvec: add functions to replace and remove strings
submodule: fix leaking memory for submodule entries
commit-reach: fix memory leak in `ahead_behind()`
builtin/credential: clear credential before exit
config: plug various memory leaks
config: clarify memory ownership in `git_config_string()`
builtin/log: stop using globals for format config
builtin/log: stop using globals for log config
convert: refactor code to clarify ownership of check_roundtrip_encoding
diff: refactor code to clarify memory ownership of prefixes
config: clarify memory ownership in `git_config_pathname()`
http: refactor code to clarify memory ownership
checkout: clarify memory ownership in `unique_tracking_name()`
strbuf: fix leak when `appendwholeline()` fails with EOF
transport-helper: fix leaking helper name
The GCC v14.1 upgrade broke the build of `git-artifacts`. Partially,
this has been fixed upstream already, but the i686 build requires a
separate fix.
This addresses #4953.
The `__MINGW64__` constant is defined, surprise, surprise, only when
building for a 64-bit CPU architecture.
Therefore using it as a guard to define `_POSIX_C_SOURCE` (so that
`localtime_r()` is declared, among other functions) is not enough, we
also need to check `__MINGW32__`.
Technically, the latter constant is defined even for 64-bit builds. But
let's make things a bit easier to understand by testing for both
constants.
Making it so fixes this compile warning (turned error in GCC v14.1):
archive-zip.c: In function 'dos_time':
archive-zip.c:612:9: error: implicit declaration of function 'localtime_r';
did you mean 'localtime_s'? [-Wimplicit-function-declaration]
612 | localtime_r(&time, &tm);
| ^~~~~~~~~~~
| localtime_s
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
* jc/compat-regex-calloc-fix:
compat/regex: fix argument order to calloc(3)
This is a backport of 077c4e1dcc9 (Merge branch
'jc/compat-regex-calloc-fix' into next, 2024-05-13) to fix compile
errors in Git for Windows' SDK since GCC was upgraded to v14.1.
A long time ago, we decided to run tests in Git for Windows' SDK with
the default `winsymlinks` mode: copying instead of linking. This is
still the default mode of MSYS2 to this day.
However, this is not how most users run Git for Windows: As the majority
of Git for Windows' users seem to be on Windows 10 and newer, likely
having enabled Developer Mode (which allows creating symbolic links
without administrator privileges), they will run with symlink support
enabled.
This is the reason why it is crucial to get the fixes for CVE-2024-? to
the users, and also why it is crucial to ensure that the test suite
exercises the related test cases. This commit ensures the latter.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Originally introduced as `core.useBuiltinFSMonitor` in Git for Windows
and developed, improved and stabilized there, the built-in FSMonitor
only made it into upstream Git (after unnecessarily long hemming and
hawing and throwing overly perfectionist style review sticks into the
spokes) as `core.fsmonitor = true`.
In Git for Windows, with this topic branch, we re-introduce the
now-obsolete config setting, with warnings suggesting to existing users
how to switch to the new config setting, with the intention to
ultimately drop the patch at some stage.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This topic branch re-adds the deprecated --stdin/-z options to `git
reset`. Those patches were overridden by a different set of options in
the upstream Git project before we could propose `--stdin`.
We offered this in MinGit to applications that wanted a safer way to
pass lots of pathspecs to Git, and these applications will need to be
adjusted.
Instead of `--stdin`, `--pathspec-from-file=-` should be used, and
instead of `-z`, `--pathspec-file-nul`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
A fix for calling `vim` in Windows Terminal caused a regression and was
reverted. We partially un-revert this, to get the fix again.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This patch introduces support to set special NTFS attributes that are
interpreted by the Windows Subsystem for Linux as file mode bits, UID
and GID.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
With this patch, Git for Windows works as intended on mounted APFS
volumes (where renaming read-only files would fail).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
These are Git for Windows' Git GUI and gitk patches. We will have to
decide at some point what to do about them, but that's a little lower
priority (as Git GUI seems to be unmaintained for the time being, and
the gitk maintainer keeps a very low profile on the Git mailing list,
too).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This is the recommended way on GitHub to describe policies revolving around
security issues and about supported versions.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git documentation refers to $HOME and $XDG_CONFIG_HOME often, but does not specify how or where these values come from on Windows where neither is set by default. The new documentation reflects the behavior of setup_windows_environment() in compat/mingw.c.
Signed-off-by: Alejandro Barreto <alejandro.barreto@ni.com>
Git for Windows accepts pull requests; Core Git does not. Therefore we
need to adjust the template (because it only matches core Git's
project management style, not ours).
Also: direct Git for Windows enhancements to their contributions page,
space out the text for easy reading, and clarify that the mailing list
is plain text, not HTML.
Signed-off-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Getting started contributing to Git can be difficult on a Windows
machine. CONTRIBUTING.md contains a guide to getting started, including
detailed steps for setting up build tools, running tests, and
submitting patches to upstream.
[includes an example by Pratik Karki how to submit v2, v3, v4, etc.]
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
The Git project followed Git for Windows' lead and added their Code of
Conduct, based on the Contributor Covenant v1.4, later updated to v2.0.
We adapt it slightly to Git for Windows.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The Git for Windows project has grown quite complex over the years,
certainly much more complex than during the first years where the
`msysgit.git` repository was abusing Git for package management purposes
and the `git/git` fork was called `4msysgit.git`.
Let's describe the status quo in a thorough way.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>