The in-memory source stores its objects in a simple array that we grow as
needed. This has a couple of downsides:
- The object lookup is O(n). This doesn't matter in practice because
we only store a small number of objects.
- We don't have an easy way to iterate over all objects in
lexicographic order.
- We don't have an easy way to compute unique object ID prefixes.
Refactor the code to use an oidtree instead. This is the same data
structure used by our loose object source, and thus it means we get a
bunch of functionality for free.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The oidtree data structure is currently only used to store object IDs,
without any associated data. So consequently, it can only really be used
to track which object IDs exist, and we can use the tree structure to
efficiently operate on OID prefixes.
But there are valid use cases where we want to both:
- Store object IDs in a sorted order.
- Associated arbitrary data with them.
Refactor the oidtree interface so that it allows us to store arbitrary
payloads within the respective nodes. This will be used in the next
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cbtree subsystem allows the user to store arbitrary data in a
prefix-free set of strings. This is used by us to store object IDs in a
way that we can easily iterate through them in lexicograph order, and so
that we can easily perform lookups with shortened object IDs.
In its current form, it is not easily possible to store arbitrary data
with the tree nodes. There are a couple of approaches such a caller
could try to use, but none of them really work:
- One may embed the `struct cb_node` in a custom structure. This does
not work though as `struct cb_node` contains a flex array, and
embedding such a struct in another struct is forbidden.
- One may use a `union` over `struct cb_node` and ones own data type,
which _is_ allowed even if the struct contains a flex array. This
does not work though, as the compiler may align members of the
struct so that the node key would not immediately start where the
flex array starts.
- One may allocate `struct cb_node` such that it has room for both its
key and the custom data. This has the downside though that if the
custom data is itself a pointer to allocated memory, then the leak
checker will not consider the pointer to be alive anymore.
Refactor the cbtree to drop the flex array and instead take in an
explicit offset for where to find the key, which allows the caller to
embed `struct cb_node` is a wrapper struct.
Note that this change has the downside that we now have a bit of padding
in our structure, which grows the size from 60 to 64 bytes on a 64 bit
system. On the other hand though, it allows us to get rid of the memory
copies that we previously had to do to ensure proper alignment. This
seems like a reasonable tradeoff.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement the `write_object_stream()` callback function for the in-memory
source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement the `write_object()` callback function for the in-memory
source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement the `write_object()` callback function for the in-memory
source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement the `read_object_stream()` callback function for the in-memory
source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement the `read_object_info()` callback function for the in-memory
source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `odb_pretend_object()` writes an object into the in-memory
object database source. The effect of this is that the object will now
become readable, but it won't ever be persisted to disk.
Before storing the object, we first verify whether the object already
exists. This is done by calling `odb_has_object()` to check all sources,
followed by `find_cached_object()` to check whether we have already
stored the object in our in-memory source.
This is unnecessary though, as `odb_has_object()` already checks the
in-memory source transitively via:
- `odb_has_object()`
- `odb_read_object_info_extended()`
- `do_oid_object_info_extended()`
- `find_cached_object()`
Drop the explicit call to `find_cached_object()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement the `free()` callback function for the "in-memory" source.
Note that this requires us to define `struct cached_object_entry` in
"odb/source-inmemory.h", as it is accessed in both "odb.c" and
"odb/source-inmemory.c" now. This will be fixed in subsequent commits
though.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Next to our typical object database sources, each object database also
has an implicit source of "cached" objects. These cached objects only
exist in memory and some use cases:
- They contain evergreen objects that we expect to always exist, like
for example the empty tree.
- They can be used to store temporary objects that we don't want to
persist to disk, which is used by git-blame(1) to create a fake
worktree commit.
Overall, their use is somewhat restricted though. For example, we don't
provide the ability to use it as a temporary object database source that
allows the user to write objects, but discard them after Git exists. So
while these cached objects behave almost like a source, they aren't used
as one.
This is about to change over the following commits, where we will turn
cached objects into a new "in-memory" source. This will allow us to use
it exactly the same as any other source by providing the same common
interface as the "files" source.
For now, the in-memory source only hosts the cached objects and doesn't
provide any logic yet. This will change with subsequent commits, where
we move respective functionality into the source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit ec0becacc9 (run-command: add stdin callback for
parallelization, 2026-01-28), we taught run_processes_parallel() to
ignore SIGPIPE, since we wouldn't want a write() to a broken pipe of one
of the children to take down the whole process.
But there's a subtle ordering issue. After we ignore SIGPIPE, we call
pp_init(), which installs its own cleanup handler for multiple signals
using sigchain_push_common(), which includes SIGPIPE. So if we receive
SIGPIPE while writing to a child, we'll trigger that handler first, pop
it off the stack, and then re-raise (which is then ignored because of
the SIG_IGN we pushed first).
But what does that handler do? It tries to clean up all of the child
processes, under the assumption that when we re-raise the signal we'll
be exiting the process!
So a hook that exits without reading all of its input will cause us to
get SIGPIPE, which will put us in a signal handler that then tries to
kill() that same child.
This seems to be mostly harmless on Linux. The process has already
exited by this point, and though kill() does not complain (since the
process has not been reaped with a wait() call), it does not affect the
exit status of the process.
However, this seems not to be true on all platforms. This case is
triggered by t5401.13, "pre-receive hook that forgets to read its
input". This test fails on NonStop since that hook was converted to the
run_processes_parallel() API.
We can fix it by reordering the code a bit. We should run pp_init()
first, and then push our SIG_IGN onto the stack afterwards, so that it
is truly ignored while feeding the sub-processes.
Note that we also reorder the popping at the end of the function, too.
This is not technically necessary, as we are doing two pops either way,
but now the pops will correctly match their pushes.
This also fixes a related case that we can't test yet. If we did have
more than one process to run, then one child causing SIGPIPE would cause
us to kill() all of the children (which might still actually be
running). But the hook API is the only user of the new feed_pipe
feature, and it does not yet support parallel hook execution. So for now
we'll always execute the processes sequentially. Once parallel hook
execution exists, we'll be able to add a test which covers this.
Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During Git 2.52 timeframe, we broke streaming computation of object
hash outside a repository, which has been corrected.
* jt/index-fd-wo-repo-regression-fix:
During Git 2.52 timeframe, we broke streaming computation of object
hash outside a repository, which has been corrected.
* jt/index-fd-wo-repo-regression-fix-maint:
object-file: avoid ODB transaction when not writing objects
The experimental `git replay` command learned the `--ref=<ref>` option
to allow specifying which ref to update, overriding the default behavior.
* tc/replay-ref:
replay: allow to specify a ref with option --ref
replay: use stuck form in documentation and help message
builtin/replay: mark options as not negatable
add_files_to_cache() used diff_files() to detect only the paths that
are different between the index and the working tree and add them,
which does not need rename detection, which interfered with unnecessary
conflicts.
* ng/add-files-to-cache-wo-rename:
read-cache: disable renames in add_files_to_cache
Update reftable library part with what is used in libgit2 to improve
portability to different target codebases and platforms.
* ps/reftable-portability:
reftable/system: add abstraction to mmap files
reftable/system: add abstraction to retrieve time in milliseconds
reftable/fsck: use REFTABLE_UNUSED instead of UNUSED
reftable/stack: provide fsync(3p) via system header
reftable: introduce "reftable-system.h" header
Various code clean-up around odb subsystem.
* ps/odb-cleanup:
odb: drop unneeded headers and forward decls
odb: rename `odb_has_object()` flags
odb: use enum for `odb_write_object` flags
odb: rename `odb_write_object()` flags
treewide: use enum for `odb_for_each_object()` flags
CodingGuidelines: document our style for flags
Add the missing &&'s so we properly propagate failures
between commands in the hook helper functions.
Also add a missing mkdir -p arg (found by adding the &&).
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In ce1661f9da (odb: add transaction interface, 2025-09-16), existing
ODB transaction logic is adapted to create a transaction interface
at the ODB layer. The intent here is for the ODB transaction
interface to eventually provide an object source agnostic means to
manage transactions.
An unintended consequence of this change though is that
`object-file.c:index_fd()` may enter the ODB transaction path even
when no object write is requested. In non-repository contexts, this
can result in a NULL dereference and segfault. One such case occurs
when running git-diff(1) outside of a repository with
"core.bigFileThreshold" forcing the streaming path in `index_fd()`:
$ echo foo >foo
$ echo bar >bar
$ git -c core.bigFileThreshold=1 diff -- foo bar
In this scenario, the caller only needs to compute the object ID. Object
hashing does not require an ODB, so starting a transaction is both
unnecessary and invalid.
Fix the bug by avoiding the use of ODB transactions in `index_fd()` when
callers are only interested in computing the object hash.
Reported-by: Luca Stefani <luca.stefani.ge1@gmail.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
[jc: adjusted to fd13909e (Merge branch 'jt/odb-transaction', 2025-10-02)]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git backfill" is capable of auto-detecting a sparsely checked out
working tree, which was broken.
* th/backfill-auto-detect-sparseness-fix:
backfill: auto-detect sparse-checkout from config
The check in "receive-pack" to prevent a checked out branch from
getting updated via updateInstead mechanism has been corrected.
* ps/receive-pack-updateinstead-in-worktree:
receive-pack: use worktree HEAD for updateInstead
t5516: clean up cloned and new-wt in denyCurrentBranch and worktrees test
t5516: test updateInstead with worktree and unborn bare HEAD
Handling of signed commits and tags in fast-import has been made more
configurable.
* jt/fast-import-signed-modes:
fast-import: add 'abort-if-invalid' mode to '--signed-tags=<mode>'
fast-import: add 'sign-if-invalid' mode to '--signed-tags=<mode>'
fast-import: add 'strip-if-invalid' mode to '--signed-tags=<mode>'
fast-import: add 'abort-if-invalid' mode to '--signed-commits=<mode>'
fast-export: check for unsupported signing modes earlier
The way the "git log -L<range>:<file>" feature is bolted onto the
log/diff machinery is being reworked a bit to make the feature
compatible with more diff options, like -S/G.
* mm/line-log-use-standard-diff-output:
doc: note that -L supports patch formatting and pickaxe options
t4211: add tests for -L with standard diff options
line-log: route -L output through the standard diff pipeline
line-log: fix crash when combined with pickaxe options
Reduce dependency on `the_repository` in add-patch.c file.
* sp/add-patch-with-fewer-the-repository:
add-patch: use repository instance from add_i_state instead of the_repository
Internals of "git fsck" have been refactored to not depend on the
global `the_repository` variable.
* ps/fsck-wo-the-repository:
builtin/fsck: stop using `the_repository` in error reporting
builtin/fsck: stop using `the_repository` when marking objects
builtin/fsck: stop using `the_repository` when checking packed objects
builtin/fsck: stop using `the_repository` with loose objects
builtin/fsck: stop using `the_repository` when checking reflogs
builtin/fsck: stop using `the_repository` when checking refs
builtin/fsck: stop using `the_repository` when snapshotting refs
builtin/fsck: fix trivial dependence on `the_repository`
fsck: drop USE_THE_REPOSITORY
fsck: store repository in fsck options
fsck: initialize fsck options via a function
fetch-pack: move fsck options into function scope
The value of a wrong pointer variable was referenced in an error
message that reported that it shouldn't be NULL.
* yc/path-walk-fix-error-reporting:
path-walk: fix NULL pointer dereference in error message
Fix a regression in writing the commit-graph where commits with dates
exceeding 34 bits (beyond year 2514) could cause an underflow and
crash Git during the generation data overflow chunk writing.
* ps/commit-graph-overflow-fix:
commit-graph: fix writing generations with dates exceeding 34 bits
A handful of inappropriate uses of the_repository have been
rewritten to use the right repository structure instance in the
read-cache.c codepath.
* jd/read-cache-trace-wo-the-repository:
read-cache: use istate->repo for trace2 logging
Adjust the codebase for C23 that changes functions like strchr()
that discarded constness when they return a pointer into a const
string to preserve constness.
* jk/c23-const-preserving-fixes:
config: store allocated string in non-const pointer
rev-parse: avoid writing to const string for parent marks
revision: avoid writing to const string for parent marks
rev-parse: simplify dotdot parsing
revision: make handle_dotdot() interface less confusing
A few code paths that spawned child processes for network
connection weren't wait(2)ing for their children and letting "init"
reap them instead; they have been tightened.
* aa/reap-transport-child-processes:
transport-helper, connect: use clean_on_exit to reap children on abnormal exit
pack-objects's --stdin-packs=follow mode learns to handle
excluded-but-open packs.
* tb/stdin-packs-excluded-but-open:
repack: mark non-MIDX packs above the split as excluded-open
pack-objects: support excluded-open packs with --stdin-packs
t7704: demonstrate failure with once-cruft objects above the geometric split
pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
pack-objects: plug leak in `read_stdin_packs()`
Object name handling (disambiguation and abbreviation) has been
refactored to be backend-generic, moving logic into the respective
object database backends.
* ps/odb-generic-object-name-handling:
odb: introduce generic `odb_find_abbrev_len()`
object-file: move logic to compute packed abbreviation length
object-name: move logic to compute loose abbreviation length
object-name: simplify computing common prefixes
object-name: abbreviate loose object names without `disambiguate_state`
object-name: merge `update_candidates()` and `match_prefix()`
object-name: backend-generic `get_short_oid()`
object-name: backend-generic `repo_collect_ambiguous()`
object-name: extract function to parse object ID prefixes
object-name: move logic to iterate through packed prefixed objects
object-name: move logic to iterate through loose prefixed objects
odb: introduce `struct odb_for_each_object_options`
oidtree: extend iteration to allow for arbitrary return codes
oidtree: modernize the code a bit
object-file: fix sparse 'plain integer as NULL pointer' error
"print" is not a valid argument for --update-refs. List both valid
alternatives literally in the argh string, consistent with documentation
and usage string.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
1edeb9a (Win32: warn if the console font doesn't support Unicode,
2014-06-10) introduced both code to detect the current console font on
Windows Vista and newer and a fallback for older systems to detect the
default console font and issue a warning if that font doesn't support
unicode.
Since we haven't supported any Windows older than Vista in almost a
decade, we don't need to keep the workaround.
Signed-off-by: Matthias Aßhauer <mha1993@live.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>