git/AGENTS.md
Johannes Schindelin 63f8b952bb fixup! Add an AGENTS.md file to help with AI-assisted debugging/development
AGENTS: document learnings from split-index + fsmonitor investigation

While investigating a CI failure in the `linux-TEST-vars` job caused by
the interaction between the `pt/fsmonitor-linux` and
`hn/git-checkout-m-with-stash` topics in `seen`, several debugging
techniques proved essential and were not previously documented.

The investigation required bisecting the first-parent history of `seen`
while temporarily merging the fsmonitor topic at each step. This
revealed that `GIT_TEST_SPLIT_INDEX=yes` corrupts the bisect
machinery's own index operations unless it is unset before cleanup
checkouts. It also revealed that `fprintf(stderr, ...)` instrumentation
in Git's C code is swallowed by the test framework, making Trace2 the
correct instrumentation approach.

A key insight was that the bug appeared Linux-specific only because
`linux-TEST-vars` is the sole CI job setting `GIT_TEST_SPLIT_INDEX=yes`;
there is no macOS or Windows equivalent. The actual root cause (the
`index.skipHash=true` + split-index interaction producing a null
`base_oid` in the shared index) is platform-independent.

Add four documentation sections capturing these learnings: bisecting
`seen` interactions, reproducing with exact CI variables, verifying CI
platform coverage before concluding platform-specificity, and using
Trace2 for instrumentation inside the test framework.

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2026-04-21 18:07:54 +02:00


# Git for Windows - Development Guide
## Background
Git for Windows is a fork of upstream Git that provides the necessary
adaptations to make Git work well on Windows. While the primary target is
Windows, the project also maintains working builds on other platforms (Linux,
macOS) because cross-platform builds often catch mistakes that might be missed
when testing only on Windows.
There are downstream projects that build on Git for Windows, such as Microsoft
Git, which adds features for large monorepos hosted on Azure DevOps.
## Overview
This document provides guidance for developing and debugging in
Git for Windows.
## Repository Structure
### Branch Naming Patterns
Based on actual repository usage:
- `main` - The primary development branch
- Feature branches use descriptive topic names, targeting the main branch
## Building and Testing
### Build
```bash
make -j$(nproc)
```
On Windows (in a Git for Windows SDK shell):
```bash
make -j15
```
### Run Specific Tests
```bash
cd t && sh t0001-init.sh # Run normally
cd t && sh t0001-init.sh -v # Verbose
cd t && sh t0001-init.sh -ivx # verbose, trace, fail-fast
```
Some tests are expensive and skipped by default. When a test exits immediately
with "skip all", check the test script header for `test_bool_env GIT_TEST_*`
to find which environment variable enables it.
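One way to list those gates is to search the script for `test_bool_env` calls. The helper below is a minimal sketch; `list_test_gates` is a hypothetical name and not part of Git's test suite.

```bash
# Hypothetical helper: print the GIT_TEST_* variables that a test script
# checks via test_bool_env, one per line, without duplicates.
list_test_gates () {
    grep -o 'test_bool_env GIT_TEST_[A-Z0-9_]*' "$1" |
    sed 's/^test_bool_env //' |
    sort -u
}

# Usage (from the top of a Git checkout; the script name is only an example):
#   list_test_gates t/t0001-init.sh
```

Setting one of the printed variables to `true` for a single run is usually enough to see whether the skipped tests are relevant to the investigation.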
## Git Source Code Structure
This section provides a bird's eye view of Git's source code layout. For
more details, see "A birds-eye view of Git's source code" in
`Documentation/user-manual.adoc`.
### Key Directories
| Directory | Purpose |
|------------------|----------------------------------------------------|
| `builtin/` | Built-in command implementations (`cmd_<name>()`) |
| `xdiff/` | Low-level diff algorithms (libxdiff) |
| `t/` | Test suite (shell scripts, helpers, libraries) |
| `Documentation/` | Man pages, guides, technical docs (AsciiDoc) |
| `contrib/` | Optional extras, not part of core Git |
| `compat/` | Platform compatibility shims |
| `refs/` | Reference backends (files, reftable) |
| `reftable/` | Reftable format implementation |
### Built-in Commands
Built-in commands are implemented in `builtin/<name>.c` with a function
`cmd_<name>()`. To add a new built-in:
1. Create `builtin/<name>.c` implementing `cmd_<name>()`
2. Add entry to the `commands[]` array in `git.c`:
```c
{ "<name>", cmd_<name>, RUN_SETUP },
```
3. Add to `BUILTIN_OBJS` in `Makefile`
4. Add to `command-list.txt` with appropriate category
5. Run `make check-builtins` to verify consistency
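Before running the full consistency check, a quick look at the registration points can catch a forgotten step. The sketch below assumes the usual spellings (`cmd_<name>` in `git.c`, `builtin/<name>.o` in `Makefile`, a line starting with `git-<name>` in `command-list.txt`); `check_builtin_wiring` is a hypothetical helper, and `make check-builtins` remains the authoritative check.

```bash
# Hypothetical helper: report which registration points do not mention a
# built-in yet. Run from the top of a Git checkout.
check_builtin_wiring () {
    name=$1
    # cat-file is implemented by cmd_cat_file(), so map '-' to '_'.
    func=cmd_$(printf '%s' "$name" | tr - _)
    missing=0
    test -f "builtin/$name.c" ||
        { echo "missing: builtin/$name.c"; missing=1; }
    grep -qw "$func" git.c ||
        { echo "missing: $func in git.c"; missing=1; }
    grep -qF "builtin/$name.o" Makefile ||
        { echo "missing: builtin/$name.o in Makefile"; missing=1; }
    grep -q "^git-$name[[:space:]]" command-list.txt ||
        { echo "missing: git-$name in command-list.txt"; missing=1; }
    return "$missing"
}

# Usage:
#   check_builtin_wiring add && echo "all registration points found"
```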
### Object Data Model
Git stores four types of objects, defined in `object.h`:
```c
enum object_type {
	OBJ_COMMIT = 1, /* Points to tree, has parent commits, metadata */
	OBJ_TREE = 2,   /* Directory listing: names -> blob/tree OIDs */
	OBJ_BLOB = 3,   /* File contents */
	OBJ_TAG = 4,    /* Annotated tag pointing to another object */
};
```
Objects are addressed by the hash of their content (the object ID, or OID)
and stored in the Object Database.
### Object Database (ODB)
The ODB is defined in `odb.h` and implemented in `odb.c`:
- **`struct object_database`**: Top-level container, owned by a repository
  - `sources`: Linked list of `odb_source` (primary + alternates)
  - `replace_map`: Object replacements (see `git-replace(1)`)
  - `commit_graph`: Commit-graph cache for faster traversal
- **`struct odb_source`**: A single object store location
  - `path`: Directory (e.g., `.git/objects` or an alternate)
  - `loose`: Loose object cache
  - `packfiles`: Packfile store (idx + pack files)
Key functions:
- `odb_read_object()`: Read an object by OID
- `odb_write_object()`: Write an object, returns OID
- `odb_read_object_info()`: Get object type/size without reading content
### Documentation
Documentation lives in `Documentation/` as AsciiDoc (`.adoc`) files:
- `git-<cmd>.adoc` - Man pages for commands
- `config/<name>.adoc` - Config option documentation (included by others)
- `technical/` - Technical specifications and internals
To build documentation:
```bash
make -C Documentation html # Build HTML docs
make -C Documentation man # Build man pages
```
To add documentation for a new config option, add it to the appropriate
file in `Documentation/config/`. These are included by other docs.
To lint documentation:
```bash
make -C Documentation lint-docs
```
## Debugging Techniques
### Debugging Philosophy
Debugging is not about guessing fixes and seeing if they work. It is about
building a complete understanding of the problem before attempting any fix.
The goal is not speed to a "fix" but confidence that you understand and have
addressed the root cause.
**Respect turnaround time.** If seeing the result of an attempted fix takes
7-10 minutes (e.g., a CI workflow run), you cannot afford to guess. Each
iteration costs human time and attention. Before pushing any change:
1. Ask: "What information am I missing to competently assess this situation?"
2. Add diagnostic output that will provide that information if the fix fails.
3. Consider whether you can reproduce the issue locally where turnaround is
seconds, not minutes.
**Understand before acting.** Before attempting any fix:
1. When investigating a regression between two versions, start by examining
the code diff. Analyze what actually changed before running any tests.
Tests confirm hypotheses; reading the diff gives you the hypothesis.
2. Trace the code flow completely. Read the relevant Makefiles, scripts, and
source files. Understand what each component does and how they interact.
3. Identify all changes that could have contributed: upstream commits,
downstream patches, infrastructure changes (CI runner updates, dependency
upgrades).
4. For each potential cause, find the specific commit, its date, its intent,
and how it interacts with other components.
5. Build a hypothesis. Then ask: "How would I confirm or disprove this?"
**Do not assume root cause from symptoms.** A symptom appearing on one
platform does not mean the bug is platform-specific. The cause may be in
shared code that manifests differently across platforms. Similarly, a passing
test on one platform when it fails on another is data to investigate, not
grounds to conclude "works for me."
**When a fix does not work, investigate why.** If you expected a fix to work
and it did not, that is valuable information. Do not abandon that line of
thinking and try something else. Instead:
1. Ask: "Why didn't that work? What does this tell me about my understanding?"
2. Add more targeted diagnostics to understand the discrepancy.
3. Re-examine your assumptions. Something you believed to be true is false.
**Add diagnostics proactively.** Before pushing a fix attempt, add diagnostic
output that will:
1. Confirm the state you expect to see if the fix works.
2. Reveal the actual state if it does not.
3. Provide enough context to understand the next step without another round
trip.
For build failures, this might include: library paths, compiler flags,
architecture information, symbol tables, file existence checks, environment
variables.
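As an illustration, a diagnostics step for a build job could look like the sketch below. The selection of values is an assumption and should follow the hypotheses at hand; `print_build_diagnostics` is a hypothetical name.

```bash
# Sketch of a diagnostics step for a CI build job. Which values matter
# depends on the failure; this selection is only an example.
print_build_diagnostics () {
    echo "arch: $(uname -m)"
    echo "kernel: $(uname -s) $(uname -r)"
    echo "cc: $(${CC:-cc} --version 2>/dev/null | head -n 1)"
    echo "CFLAGS: ${CFLAGS:-<unset>}"
    echo "LDFLAGS: ${LDFLAGS:-<unset>}"
    echo "PATH: $PATH"
}

# Usage (e.g. as the first command of the failing CI step):
#   print_build_diagnostics
```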
**Build confidence before pushing.** A fix should not be a guess. You should
be able to explain:
1. What was the root cause?
2. Why does this fix address it?
3. What other ways could this problem be solved?
4. Am I choosing the "most correct" or "most effective" approach?
5. What evidence confirms your understanding?
6. What could still go wrong, and how would you detect it?
### Searching the Codebase
When debugging a failure that printed an error message, it is often useful
to search the code for that message. Parts of the message that vary from run
to run (e.g. commit OIDs) are not hard-coded, so the search needs to
accommodate them by using regular expressions or prefix matches.
Use `git grep` for fast code searches:
```bash
git grep -n -i "pattern" # Case-insensitive search with line numbers
git grep -n -w "word" # Whole-word matches only
git grep -n -i "pattern" -- "*.c" # Search only C files
```
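To search for a message that contains an OID, the variable part can be turned into a wildcard first. The following is a minimal sketch: `error_to_regex` is a hypothetical helper, the error message is a made-up example, and other regex metacharacters in the message are not escaped.

```bash
# Hypothetical helper: turn a concrete error message into a search regex by
# replacing anything that looks like an abbreviated or full hex OID with
# '.*' (the source has a format specifier there, not the OID itself).
error_to_regex () {
    printf '%s\n' "$1" | sed -E 's/[0-9a-f]{7,64}/.*/g'
}

# Usage (inside a Git checkout):
#   git grep -n -E "$(error_to_regex 'fatal: bad object 63f8b952bb')" -- '*.c'
```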
### Trace2
Enable tracing to see command execution patterns:
```bash
GIT_TRACE2_EVENT=/path/to/trace.txt git <command>
```
### Instrumenting Git Internals During Tests
When adding debug output to Git's C code during test investigation,
`fprintf(stderr, ...)` from git subprocesses spawned by the test framework
is typically swallowed (redirected or discarded by the test harness). Use
Trace2 instead:
```c
trace2_data_intmax("index", NULL, "my_debug/cache_nr", istate->cache_nr);
trace2_data_string("index", NULL, "my_debug/state", some_string);
```
Then run the test with `GIT_TRACE2_EVENT` or `GIT_TRACE2_PERF` pointing to
a file, and grep the output. This integrates with Git's existing tracing
infrastructure and survives the test framework's output management.
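Filtering the resulting file for the debug keys could look like this. The sketch assumes the `GIT_TRACE2_EVENT` format of one JSON object per line with adjacent `"key"` and `"value"` fields; `trace2_values` is a hypothetical helper, and lines with a different layout are printed unchanged.

```bash
# Hypothetical helper: print "key=value" for every Trace2 data event whose
# key starts with the given prefix.
trace2_values () {
    grep -F "\"key\":\"$1" "$2" |
    sed -E 's/.*"key":"([^"]*)","value":"([^"]*)".*/\1=\2/'
}

# Usage (the trace path must be absolute):
#   GIT_TRACE2_EVENT=/path/to/trace.txt sh t0001-init.sh
#   trace2_values my_debug/ /path/to/trace.txt
```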
As a last resort (e.g. when Trace2 is not initialized yet at the point you
need to instrument), write to a fixed file path:
```c
FILE *f = fopen("/tmp/debug.log", "a");
if (f) { fprintf(f, "state: %u\n", value); fclose(f); }
```
### Comparing Branches After Rebase
```bash
# See what patches exist in a new branch but not old
git log --oneline old-branch..new-branch
# or
git range-diff -s --right-only old-branch...new-branch
# Compare specific files between branches
git diff old-branch..new-branch -- path/to/file.c
# or
git log -p old-branch..new-branch -- path/to/file.c
# or even
git log -L start-line,end-line:path/to/file.c old-branch..new-branch --
# Find upstream changes between tags
git log --oneline --first-parent v2.52.0..v2.53.0
```
### Test Failure Investigation
1. **Reproduce with tracing**: Run test with `-ivx` flags
2. **Check timestamps**: Look at `t_abs` in trace to understand ordering
3. **Compare with working version**: Build and test the previous version
4. **Bisect if needed**: Use `git bisect` to find the breaking commit
Bisecting failures introduced by upstream commits requires some stunts to
apply the downstream changes at every bisection step. This can be done by
squashing all downstream changes into one throw-away commit and then
cherry-picking that commit at each step. Merge conflicts typically become
more likely the farther the cherry-pick target is from the original branch
point, so it often makes sense to squash both the old and the new downstream
changes, and then to "interpolate" between them when encountering merge
conflicts.
### Bisecting Failures in `seen`
When a topic passes on its own but fails after being merged to `seen`, the
failure is caused by interaction with another in-flight topic. To identify
the culprit:
1. Fetch the exact `seen` commit from the failing CI run (get the SHA from
the workflow run metadata via the GitHub API).
2. Use a worktree checked out at that `seen` commit.
3. Bisect the first-parent history between `upstream/master` and `seen~1`
(excluding the topic's own merge). At each bisection step, merge the
topic in temporarily, build, run the test, then undo the merge.
4. Write a `git bisect run` script that automates this. Key pitfalls:
   - The script must `unset` test environment variables (especially
     `GIT_TEST_SPLIT_INDEX`) before cleanup operations like
     `git checkout -f`, otherwise the worktree's own index can get
     corrupted.
   - Use `git checkout -f "$ORIG"` (not `git reset --hard`) to undo the
     temporary merge, since `reset --hard` under split-index can corrupt
     the index.
   - Save the current commit OID at the start (`ORIG=$(git rev-parse HEAD)`)
     because `ORIG_HEAD` is unreliable during bisect.
   - On merge conflict, run `git merge --abort` and return 125 (skip).
5. Store the command that runs the test with the full set of CI test
   variables as a repository-local alias (this avoids repeating the long
   export list and lets the user approve the tool call once).
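The pitfalls above can be sketched as a single bisect step. This is an illustration under assumptions, not the script used in the investigation: `TOPIC` and `TEST_SCRIPT` are placeholders, the CI variable list is reduced to one variable, and treating a build failure as "skip" and a failed cleanup as "abort" (exit code 128) are choices made here.

```bash
# Sketch of a `git bisect run` step for a `seen` interaction.
TOPIC=${TOPIC:-my-topic}                    # placeholder: topic under test
TEST_SCRIPT=${TEST_SCRIPT:-t0000-basic.sh}  # placeholder: failing test

bisect_step () {
    # ORIG_HEAD is unreliable during bisect, so remember the commit here.
    ORIG=$(git rev-parse HEAD) || return 125

    # Temporarily merge the topic; skip this commit on conflicts.
    if ! git merge --no-edit "$TOPIC"
    then
        git merge --abort
        return 125
    fi

    # Build and test in a subshell so the test variables do not leak.
    result=0
    (
        make || exit 125
        cd t || exit 125
        GIT_TEST_SPLIT_INDEX=yes sh "$TEST_SCRIPT" || exit 1
    ) || result=$?

    # Clean up without test variables in the environment, using
    # `checkout -f` rather than `reset --hard`.
    unset GIT_TEST_SPLIT_INDEX
    git checkout -f "$ORIG" || return 128

    return "$result"
}

# When saved as a script (say, bisect-step.sh), end it with
#   bisect_step; exit $?
# and drive it with:
#   git bisect start --first-parent seen~1 upstream/master
#   git bisect run ./bisect-step.sh
```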
### CI/Workflow Failure Investigation
When a CI workflow fails, the debugging process has a high cost per iteration.
Approach these failures methodically:
**1. Establish what changed.** Before looking at the error, identify:
- What was the last successful run? What version/commit was it based on?
- What changed between then and now? (upstream commits, downstream patches,
runner image updates, dependency changes)
- Use the GitHub API to retrieve run metadata and compare.
**2. Analyze the error deeply.** Read the full error message and surrounding
context. Understand:
- What command failed?
- What were its inputs (flags, environment, paths)?
- What did it expect vs. what did it get?
**3. Trace the code flow locally.** Before making any CI changes:
- Read the workflow YAML, Makefiles, and scripts involved.
- Understand how variables flow from one to another.
- Identify where the failing values come from.
**4. Reproduce locally if possible.** Many CI failures can be reproduced
locally with faster turnaround:
- For build failures: replicate the build environment and commands.
- For macOS issues: if you lack a Mac, at least trace the Makefile logic
to understand what flags should be set and why.
- For test failures that only appear in specific CI jobs (like
`linux-TEST-vars`): reproduce with the _exact_ set of environment
variables that job sets. Check `ci/run-build-and-tests.sh` for the
job's variable block. Do not assume a single variable (e.g.
`GIT_TEST_SPLIT_INDEX`) is sufficient; other variables may contribute
to the failure path.
- When a test fails in `seen` but not on the topic branch alone, check
out the exact `seen` commit from the failing CI run (get the SHA from
the workflow run metadata) and reproduce against that. The interaction
with other in-flight topics is the likely cause.
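To copy a job's variable block rather than retyping it, the lines can be extracted from the CI script. The sketch below assumes that the job's settings live in a shell `case` arm labelled with the job name and terminated by `;;`; `ci_job_block` is a hypothetical helper, and the layout of `ci/run-build-and-tests.sh` should be checked before relying on it.

```bash
# Hypothetical helper: print the body of the `case` arm for a CI job, with
# leading whitespace removed, so it can be reviewed and evaluated locally.
ci_job_block () {
    awk -v job="$1" '
        !in_block && index($0, job ")") { in_block = 1; next }
        in_block && /;;/ { exit }
        in_block { sub(/^[ \t]+/, ""); if ($0 != "") print }
    ' "$2"
}

# Usage (from the top of a Git checkout):
#   ci_job_block linux-TEST-vars ci/run-build-and-tests.sh
```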
**5. Do not assume CI coverage from platform support.** When asking "why
does platform X not see this bug?", verify whether CI actually tests that
combination on that platform. For example, `GIT_TEST_SPLIT_INDEX=yes` is
only set by `linux-TEST-vars`; there is no equivalent `osx-TEST-vars` or
`windows-TEST-vars` job. A bug that only manifests under split-index
testing may be present on all platforms but only caught on Linux.
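Checking which CI definitions set a variable takes one search. A minimal sketch, using plain `grep` so that it also works outside a repository (inside a checkout, `git grep -n` with the same pattern is the faster equivalent); `find_var_assignments` is a hypothetical name.

```bash
# Hypothetical helper: list the files and lines under the given directories
# that assign a test variable.
find_var_assignments () {
    var=$1
    shift
    grep -rn -- "$var=" "$@"
}

# Usage (from the top of a Git checkout):
#   find_var_assignments GIT_TEST_SPLIT_INDEX ci .github
```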
**6. Add comprehensive diagnostics on first attempt.** If you must push to
CI to test, make that push count:
- Add diagnostic output for every hypothesis you have.
- Print the values of key variables, paths, flags.
- Show the state before and after key operations.
- Design diagnostics to distinguish between your hypotheses.
**7. Do not remove diagnostics until the problem is solved.** Keep them in
"drop!" commits so they can be easily removed later but provide information
if subsequent fixes also fail.
**8. When a fix fails, treat it as data.** The failure tells you something.
Your mental model was wrong. Figure out what before trying again.
## Git Workflow
This repository is a shared development environment, not a sandbox. Exercise
caution with all Git operations.
### Committing Changes
Never use `git add -A` or `git add .` - these commands will stage untracked
build artifacts, editor swap files, and other detritus that should not be
committed. Always specify pathspecs explicitly:
```bash
# Good: stage and commit specific files
git commit -sm "your message here" path/to/file.c other/file.h
# Bad: stages everything, including untracked garbage
git add -A && git commit -m "message"
```
The `-s` flag adds a Signed-off-by trailer, which is required for this
project.
When AI assistance is used to author or co-author a commit, add a
Co-authored-by trailer identifying the model:
```bash
git commit -s --trailer "Co-authored-by: <model-name>" -m "message" file.c
```
### Pushing Changes
Never push without explicit user permission. The user controls when and
where changes are pushed. This is especially critical because:
- The repository has multiple remotes with different purposes
- Force-pushing to the wrong remote can cause significant damage
- Tags require special handling (`git push --tags` or explicit tag pushes)
Wait for the user to push, or ask explicitly before pushing.
### Making Code Changes
**Minimal, surgical changes.** Make the smallest possible change to achieve
the goal. Do not rewrite entire files or functions when a targeted edit
suffices. When removing functionality:
1. Remove the code paths that invoke the unwanted functionality
2. Compile to identify what is now unused
3. Remove the unused functions one at a time
4. Repeat until clean
**No fly-by changes.** Do not make changes that were not requested, even if
they seem like improvements (renaming variables, reformatting untouched code,
"fixing" things not part of the task). If you believe a change would be
beneficial but it was not requested, ask for permission first.
**The human is the driver.** Execute what is asked. If you think something
should be done differently, ask---do not just do it.
### Commit Message Quality
Good commit messages use flowing English prose, not bullet points. They
clearly state:
- **Context**: What situation prompted this change? Include URLs to failing
CI runs, issue numbers, or other references that future readers will need.
- **Intent**: What is this change trying to accomplish?
- **Justification**: Why is this the right approach? What alternatives were
considered? When choosing between approaches based on performance,
include measured timings so future readers understand the tradeoffs.
- **Implementation**: How does the change work? (Only for non-obvious parts;
don't describe what's clear from the diff.)
Include exact error messages rather than vague descriptions. If a build
failed with `Undefined symbols for architecture arm64: "_iconv"`, put that
in the commit message - don't just say "fixed a linker error."
Wrap commit messages at 76 columns per line.
### Commit Prefixes for Rebase Workflows
This repository uses interactive rebase with autosquash. Commit prefixes
signal intent:
- **`fixup! <original title>`**: Will be squashed into the referenced commit
during rebase. The title after `fixup!` must match the original commit's
title exactly.
- **`drop!`**: Indicates a commit that should be dropped before the final
merge. Used for debugging, temporary workarounds, or experiments.
To find the correct title for a fixup commit:
```bash
git log --oneline path/to/changed/file | head -10
```
Then use the exact title:
```bash
git commit -sm "fixup! release: add Mac OSX installer build" path/to/file
```
## Rebasing Workflow
Rebases are the bread and butter of Git for Windows: topic branches are
rebased every time upstream Git releases a new version. This section covers
the workflow for managing downstream patches through repeated rebases.
### Merging-Rebases
Git for Windows uses "merging-rebases" to maintain downstream patches. Unlike
a flat series of commits, the downstream changes are organized as topic
branches merged together, preserving the logical grouping of related changes.
Each integration branch (`main`, `shears/next`, `shears/seen`) contains a
marker commit with the message "Start the merging-rebase to \<version\>". This
commit separates upstream history from downstream patches. Reference it with:
```bash
# Find the marker commit
git log --oneline --grep="Start the merging-rebase" -1
# Reference it using commit message search syntax
origin/main^{/Start.the.merging-rebase}
```
When working with merging-rebases:
- **Downstream patches start after the marker**: Use
`origin/main^{/Start.the.merging-rebase}..origin/main` to see all
downstream commits
- **Topic branches are merged, not rebased flat**: Each logical feature or
fix is a branch merged into the integration branch
- **Merge commits are preserved**: The rebase recreates the merge structure
on top of the new upstream base
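Since the same range expression recurs in many commands, it can be wrapped in a small helper. This is only a sketch; `downstream_range` is a hypothetical name and `origin/main` is merely the default.

```bash
# Hypothetical helper: print the revision range covering the downstream
# commits of an integration branch (marker commit excluded).
downstream_range () {
    branch=${1:-origin/main}
    printf '%s..%s\n' "$branch^{/Start.the.merging-rebase}" "$branch"
}

# Usage:
#   git rev-list --count "$(downstream_range)"
#   git log --oneline --first-parent "$(downstream_range origin/main)"
```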
To compare downstream patches before and after a rebase:
```bash
# Compare the old and new downstream patch series
git range-diff \
old-base^{/Start.the.merging-rebase}..old-branch \
new-base^{/Start.the.merging-rebase}..new-branch
```
### Starting a Merging-Rebase
To rebase the downstream patches onto a new upstream version, create a marker
commit and use it as the base for an interactive rebase:
```bash
# Variables for the commit message
tag=v2.53.0
# The previous marker - this becomes the exclusion point for --onto
previousMergeOid=$(git rev-parse origin/main^{/Start.the.merging-rebase})
tagOid=$(git rev-parse "$tag")
tipOid=$(git rev-parse origin/main)
# Create the marker commit with two parents: the tag and the current tip
markerOid=$(git commit-tree "$tag^{tree}" -p "$tag" -p "$tipOid" -m "Start the merging-rebase to $tag

This commit starts the rebase of $previousMergeOid to $tagOid")
# Graft the marker to appear as if it has only the tag as parent
git replace --graft "$markerOid" "$tag"
# Use the marker as the base for rebasing (only commits after previousMergeOid)
git rebase -r --onto "$markerOid" "$previousMergeOid" origin/main
# After the rebase completes, delete the replace ref
git replace -d "$markerOid"
```
The marker commit is created with two parents: the upstream tag and the
current branch tip. The `git replace --graft` makes Git see only the tag as
parent during the rebase, allowing the downstream commits to be cleanly
rebased onto the new upstream. After the rebase completes, the replace ref
is deleted to clean up.
#### The shears/* Branches
Upstream Git has four integration branches: `seen`, `next`, `master`, and
`maint`. Git for Windows maintains a corresponding `shears/*` branch for each
(`shears/seen`, `shears/next`, `shears/master`, `shears/maint`) that
continuously rebases Git for Windows' `main` onto the respective upstream
branch.
These branches are updated incrementally rather than from scratch, avoiding
re-resolution of merge conflicts. The update process leverages reachability:
1. **Integrate new downstream commits**: If `origin/main` has commits not yet
in the shears branch, rebase them on top (using `-r` to preserve branch
structure). Update the marker commit's message and second parent.
2. **Integrate new upstream commits**: If the upstream branch has commits not
yet integrated, rebase onto the new upstream tip. Update the marker commit
accordingly.
The marker commit's second parent always points to the current `origin/main`
tip, making it trivial to identify what downstream commits are included.
Similarly, the marker's first parent (the upstream base) shows exactly which
upstream version is integrated.
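Reading both parents of the marker commit can be sketched as follows. `marker_parents` is a hypothetical helper; it assumes that no replace ref is currently hiding the marker's second parent.

```bash
# Hypothetical helper: show which upstream commit and which downstream tip
# a branch integrates, by reading the marker commit's two parents.
marker_parents () {
    marker=$(git rev-parse --verify -q "$1^{/Start.the.merging-rebase}") ||
        return 1
    echo "upstream base:  $(git rev-parse "$marker^1")"
    echo "downstream tip: $(git rev-parse "$marker^2")"
}

# Usage (assumes a ref of that name exists locally):
#   marker_parents shears/next
```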
### When to Skip a Patch
Use `git rebase --skip` when the patch is already in the new base:
- **Upstreamed**: The patch was accepted upstream and is now in `seen`
- **Backported**: A fix we backported is now included in the upstream base
- **Superseded**: HEAD already contains evolved code that includes this
change
Signs to skip rather than resolve: HEAD has the functionality, the
conflict would discard the patch entirely, or `git range-diff` shows
the downstream and upstream patches are equivalent.
To find the corresponding upstream commit for a conflicting patch:
```bash
git range-diff --left-only REBASE_HEAD^! REBASE_HEAD..
```
### Resolving Merge Conflicts
When resolving merge conflicts during a rebase (especially when squashing
fixups), the goal is to **apply the minimal surgical change** that the
patch intended, not to reconstruct entire functions or add duplicate code.
#### 1. Understand What the Patch Wants
First, examine the patch being applied:
```bash
git show REBASE_HEAD
```
Look at the actual changes (lines starting with `-` and `+`):
- What lines are being removed?
- What lines are being added?
- What is the context (function name, nearby code)?
**Key insight**: The patch shows the *intent*---a specific small change to
make. Focus on this, not on the conflict markers' content.
**Code movement detection**: If the patch shows large changes, check with
`--ignore-space-change`:
```bash
git show <conflicted-commit> --ignore-space-change
```
This reveals whether the commit is primarily **moving code** (lots of
whitespace changes) or making **logic changes** (actual code modifications).
When code was moved and re-indented, focus only on the non-whitespace
changes when resolving the conflict.
#### 2. Understand Where the Code Is Now
The conflict occurred because the code moved or changed since the patch was
created. Find where that code actually exists now:
```bash
# If the patch was changing a specific pattern, find all occurrences
git grep -n "pattern from patch"
# View the conflicted file around those locations
```
**Common mistake**: Assuming the conflict markers show you what to do. They
do not---they just show where Git got confused.
#### 3. Apply the Surgical Change
Make **only** the change the patch intended, but in the current location:
- If the patch adds `--abbrev=12` to a range-diff call, find where that
range-diff call is NOW and add it there
- If the patch changes a `.split()` pattern, find where that pattern is NOW
and change it
- Do not copy entire functions from the conflict markers
- Do not create duplicates
#### 4. Remove ALL Conflict Markers
Conflict markers make the file invalid code:
```
<<<<<<< HEAD
=======
>>>>>>> commit-hash
```
**All three types of markers must be completely removed.**
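A quick scan for leftovers before staging could look like the sketch below. `find_conflict_markers` is a hypothetical helper; a line consisting only of `=======` can also occur legitimately (e.g. as a heading underline), so hits need a second look.

```bash
# Hypothetical helper: list leftover conflict markers in the given files.
# Exits with 0 when markers are found, 1 when the files are clean.
find_conflict_markers () {
    grep -nE '^(<<<<<<<|=======|>>>>>>>)( |$)' "$@"
}

# Usage:
#   find_conflict_markers path/to/file.c && echo "markers remain"
```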
#### 5. Verify the Resolution
**Critical**: After staging your resolution, verify it matches the patch
intent:
```bash
# Compare your staged changes to the original patch
git diff --cached
git rebase --show-current-patch
# Or more directly, compare to REBASE_HEAD
git diff --cached
git show REBASE_HEAD
# For code that was moved/re-indented, ignore whitespace
git diff --cached --ignore-space-change
git show REBASE_HEAD --ignore-space-change
```
**Verify, verify, verify**: The output of `git diff --cached` should
correspond closely to the diff in `git show REBASE_HEAD`. The line numbers
and context will differ (because code moved), but the actual changes (the
`-` and `+` lines) should match the patch intent.
**After completing a rebase**, always verify the final result:
```bash
# Compare tree before and after rebase
git diff @{1}
# Shows what changed in each rebased commit
git range-diff @{1}...
```
If the rebase was onto the same base commit (e.g., squashing fixups), the
`git diff @{1}` should be empty---this proves the rebase only reorganized
commits without changing the end result. If the rebase was onto a new base
commit (e.g., rebasing onto a new upstream release), the diff should match
the difference between the old and new base commits, modulo any changes
from upstreamed or backported patches. The `git range-diff @{1}...` output
shows whether the intended amendments (like adding `--abbrev=12`) were
correctly applied to each commit.
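For the same-base case, the "empty diff" check can be scripted by comparing tree OIDs. A minimal sketch, with `same_tree` as a hypothetical helper name:

```bash
# Hypothetical helper: succeed if two revisions have identical trees, i.e.
# the rebase only reorganized commits without changing the end result.
same_tree () {
    old_tree=$(git rev-parse --verify -q "$1^{tree}") &&
    new_tree=$(git rev-parse --verify -q "$2^{tree}") &&
    test "$old_tree" = "$new_tree"
}

# Usage, right after a rebase that only squashed fixups:
#   same_tree '@{1}' HEAD && echo "end result unchanged"
```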
### Conflict Resolution Red Flags
These indicate you are doing it wrong:
- Your diff adds hundreds of lines when the patch only changed 3
- Conflict markers remain in the file
- Functions appear twice in the file
- You added `<<<<<<< HEAD` or `=======` to the staged changes
- Syntax check fails after resolution
### Key Conflict Resolution Lessons
1. **Context changes, intent does not** - The patch's line numbers are
wrong, but the change is right
2. **Conflict markers lie** - They show you where Git got confused, not
what you should do
3. **One change at a time** - If the patch adds one line, your resolution
should add one line
4. **Verify, verify, verify** - `git diff --cached` should match
`git show REBASE_HEAD` (modulo context)
5. **Post-rebase verification** - `git diff @{1}` (empty) and
`git range-diff @{1}...` (shows amendments)
6. **Ignore whitespace for code moves** - Use `--ignore-space-change` to
see the actual logic changes when code was moved and re-indented
7. **When in doubt, look at the range-diff** - `git range-diff` shows if
you matched the intent
### Useful Rebase Tools
- `git rebase --show-current-patch` - See what change is being applied
- `git show REBASE_HEAD` - Alternative to above, works better with
`--ignore-space-change`
- `git show <commit> --ignore-space-change` - See only logic changes, not
whitespace/indentation
- `git grep -n "pattern"` - Find where code moved to
- `git log -L <start>,<end>:<file> REBASE_HEAD..HEAD` - See how upstream
modified a line range since the original patch; invaluable for
understanding how conflicting lines changed
- `git diff --cached` - After staging resolution, verify it matches
REBASE_HEAD
- `git diff @{1}` - After rebase, compare tree before/after
- `git range-diff @{1}...` - After rebase, verify intended changes were made
- `git range-diff A^! B^!` - Compare original patch to your resolution
### Leveraging Rerere
Git's "reuse recorded resolution" (`rerere`) feature automatically records
how you resolve conflicts and replays those resolutions when the same
conflict recurs. This is invaluable for repeated rebases where the same
downstream patches conflict with similar upstream changes.
When you see `Staged 'file' using previous resolution`, Git has applied a
previously recorded resolution. Always verify these auto-resolutions are
still correct---upstream context may have changed enough that the old
resolution no longer applies cleanly.
To enable rerere:
```bash
git config --global rerere.enabled true
```
### Automation Tips
When running rebases in automated or scripted contexts, disable the pager
to avoid hangs:
```bash
GIT_PAGER=cat git range-diff ...
# or
git --no-pager log ...
```
### Non-interactive "Interactive" Rebases
AI agents cannot drive interactive editors reliably. Instead, insert a
`break` as the first todo command so the rebase stops immediately, then
edit the todo file directly:
```bash
# Start the rebase, stopping before any picks execute
GIT_SEQUENCE_EDITOR='sed -i 1ib' git rebase -ir <base>
# Find and edit the todo file with the view/edit tools
git rev-parse --git-path rebase-merge/git-rebase-todo
# After editing the todo, continue (GIT_EDITOR=true suppresses the
# editor that fixup -C and amend! commands would otherwise open)
GIT_EDITOR=true git rebase --continue
```
### Scripted Hunk Staging
`git add -p` is interactive by default, but its prompts follow a
predictable protocol. To stage the first hunk of a file without
human interaction:
```bash
printf '%s\n' s y q | git add -p <file>
```
The `s` splits a large hunk, `y` stages the first sub-hunk, and `q`
quits. Adjust the sequence for different hunk selections (e.g.,
`y y n q` to stage the first two hunks but skip the third).
### Finding Which Commit to Amend
When a working-tree change belongs in an earlier commit (an `hg absorb`
workflow), use `git log -L` to find which commit last touched the
relevant lines:
```bash
git log -L <start>,+<count>:<file>
```
This shows the full history of a line range, making it easy to identify
the commit whose title you need for a `fixup!` commit. This is far more
surgical than grepping through full diffs.
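A minimal sketch of how the lookup can feed directly into a `fixup!`
commit. The helper name `last_commit_for_lines` is hypothetical (it is
not part of Git), and the file name and line numbers in the usage
comment are placeholders. Note that `git log -L` interprets the line
numbers against the file as it exists in `HEAD`, not in the working tree.

```bash
# Hypothetical helper (not part of Git): print the OID of the most
# recent commit that touched <count> lines starting at line <start>
# of <file>, as the file exists in HEAD.
last_commit_for_lines () {
	file=$1 start=$2 count=$3
	# -s suppresses the patch; only the first OID is of interest
	git log -s --format=%H -L "$start,+$count:$file" | head -n 1
}

# Example (placeholder file and line numbers):
#   git add -p wrapper.c
#   git commit --fixup="$(last_commit_for_lines wrapper.c 120 5)"
```

If the range was last touched by an upstream commit rather than a
downstream patch, the helper reports that upstream commit, so check the
result before committing.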
### Fixup Commits
Downstream patches sometimes require adjustment due to changes in the
environment they operate in. These changes may come from:
- **Upstream code changes**: API modifications, struct field moves,
declarations relocating between headers, or semantic changes in functions
that downstream code depends on.
- **External environment changes**: CI runner image updates, toolchain
upgrades, dependency version changes, or platform behavior shifts.
In both cases, create a `fixup!` commit that will be squashed into the
original downstream patch during the next interactive rebase. The commit
message body must precisely document the change that necessitated the fix:
- For upstream changes: reference the specific upstream commit (by OID or
title) and explain what it changed.
- For external changes: include URLs to failing CI runs, document what
changed in the environment (e.g., "GitHub Actions macos-latest runner
upgraded from macOS 14 to macOS 15"), and note the exact error message.
This documentation is essential because the fixup will be squashed away:
any context that is not recorded in its commit message is lost by the
time it is squashed into the original patch.
Run affected tests before finalizing.
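As a sketch of the mechanics, `git commit --fixup` writes the
`fixup! <subject>` line and `-m` supplies the explanatory body. The
helper name `commit_documented_fixup` and the message in the usage
comment are hypothetical.

```bash
# Hypothetical helper (not part of Git): commit the staged changes as a
# fixup of <target>, recording <reason> in the message body.
commit_documented_fixup () {
	target=$1 reason=$2
	# --fixup writes the "fixup! <subject>" line; -m appends the body
	git commit --fixup="$target" -m "$reason"
}

# Example (placeholder commit and wording):
#   git add -p
#   commit_documented_fixup <downstream-commit> \
#     "Upstream commit <oid> moved the field to another struct."
```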
### Common Adaptation Patterns
**Struct field moves**: When upstream moves fields between structs, update
all downstream code that accesses those fields.
**API changes**: When upstream changes function signatures, update callers
and verify semantics are preserved.
**New abstractions**: When upstream introduces new layers, ensure downstream
code uses the correct instance.
## Coding Conventions
The Git project maintains a charmingly old-school, Unix-greybeard aesthetic
when it comes to text encoding. In the spirit of the PDP-11 and Bell Labs
terminal sessions of yore:
- **ASCII only**: Avoid Unicode characters in source code, comments, and
documentation. Use `->` instead of a Unicode arrow, `--` instead of an
em dash, and so on.
To verify your changes contain no non-ASCII characters (tabs are allowed,
so the pattern must not flag them):
```
git diff | LC_ALL=C grep '[^[:print:][:blank:]]'
```
- **80 columns per line**: The mailing list veterans will "kindly" remind you
that lines should not exceed 80 characters (they do mean columns, but
let's not split beards or hairs about wide glyphs).
First, check for whitespace errors (trailing whitespace, mid-line tabs, etc.):
```
git diff --check
```
Once that passes, you know tabs only appear at line beginnings, so each
tab equals exactly 8 columns. To find lines exceeding 80 columns:
```
git diff --no-color | grep '^+' | sed 's/\t/        /g' | grep '.\{82\}'
```
(The pattern uses 82 because the `+` that diff output prefixes to added
lines counts as one character: 81 columns of content plus the prefix.)
- **Tabs for indentation**: The codebase uses tabs, not spaces.
- **No trailing whitespace**: Clean up your lines.
**Pre-commit checklist.** Run all three checks before every commit:
```bash
git diff --check
git diff --no-color | LC_ALL=C grep '[^[:print:][:blank:]]' &&
	echo "ERROR: non-ASCII characters found"
git diff --no-color | grep '^+' | sed 's/\t/        /g' |
	grep '.\{82\}' &&
	echo "ERROR: lines exceed 80 columns"
```
The first command catches whitespace errors. If either of the latter
two produces output, fix the offending lines before committing. Note
that these checks apply to commit messages as well (wrap at 76 columns
for messages, 80 for code).
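For commit messages, the same two checks can be sketched as a filter
that reads the message on standard input. The helper name
`check_message_text` is hypothetical; the 76-column limit is the one
stated above.

```bash
# Hypothetical helper (not part of Git): read a commit message on
# stdin, print offending lines, and return non-zero if any were found.
check_message_text () {
	msg=$(cat)
	ret=0
	# Anything that is neither printable ASCII nor a blank is non-ASCII
	if printf '%s\n' "$msg" | LC_ALL=C grep -n '[^[:print:][:blank:]]'
	then
		echo "ERROR: non-ASCII characters found" >&2
		ret=1
	fi
	# 77 or more characters means the line exceeds 76 columns
	if printf '%s\n' "$msg" | grep -n '.\{77\}'
	then
		echo "ERROR: lines exceed 76 columns" >&2
		ret=1
	fi
	return $ret
}

# Example: check the message of the most recent commit.
#   git log -1 --format=%B | check_message_text
```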
See `Documentation/CodingGuidelines` for the full set of conventions.
### strbuf patterns
Use `strbuf_addf()` with string continuation for multi-line content instead
of multiple `strbuf_addstr()` calls:
```c
/* Good */
strbuf_addf(&buf,
"tree %s\n"
"author %s\n"
"committer %s\n"
"\ncommit message\n",
tree_hex, author, committer);
/* Avoid */
strbuf_addstr(&buf, "tree ");
strbuf_addstr(&buf, tree_hex);
strbuf_addstr(&buf, "\nauthor ");
/* ... */
```
Choose descriptive variable names (`header` for pack headers, not generic
`buf`; use `buf` for the secondary strbuf if you cannot reuse the first).
## Platform Considerations
### Windows-specific issues
On Windows, `unsigned long` is 32 bits even on 64-bit systems. Use `size_t`
for sizes that may exceed 4 GB. Be careful with format strings: print
`size_t` values with `PRIuMAX` and a cast to `uintmax_t`.
## Contributing to Upstream Git via GitGitGadget
### Overview
The upstream Git project accepts contributions via the mailing list
(`git@vger.kernel.org`). [GitGitGadget](https://gitgitgadget.github.io/)
bridges GitHub PRs to the mailing list: you push a branch to your GitHub
fork, open a PR against https://github.com/gitgitgadget/git, and
GitGitGadget formats and sends the patches.
### Workflow
1. Push the topic branch to your personal fork on GitHub (the remote
that points at `https://github.com/<you>/git`).
2. Open a PR from `<you>:<branch>` against `gitgitgadget/git`'s `master`.
3. The PR title becomes the patch series subject; the PR body becomes the
cover letter. Use
`gh pr create --repo gitgitgadget/git --head <you>:<branch>`.
4. Use `/submit` as a PR comment to send patches to the mailing list.
5. After review feedback, update the branch, force-push, and `/submit` again.
### Branch Naming
Do **not** use an initials prefix (like `ds/` or `js/`). That convention is
used by the Git maintainer when picking up topics, not by contributors. Use
descriptive names like `tests-explicit-bare-repo`.
### Cover Letter Style
The PR body is the cover letter. It should be plain text (not Markdown with
headers or bullet formatting), since it will be sent as email. Structure:
- A brief subject line (the PR title, e.g. "tests: access bare repositories
explicitly")
- Motivation: why is this change needed?
- Summary: what does the series do? What patterns/techniques does it use?
- Scope: is this part of a larger effort? If so, link to the tracking PR.
Keep it factual and measured. Avoid framing changes in terms of security
when contributing to upstream Git; frame them as robustness, correctness,
or preparation for future defaults.
### Commit Message Conventions (Upstream Git)
Upstream Git commit messages follow stricter conventions than the Microsoft
Git fork:
- **Subject line**: `<area>: <description>` (lowercase after the colon).
The `<area>` is typically a file name without extension (e.g. `t0001`,
`setup`, `scalar`) or a subsystem name (e.g. `tests`, `refs`).
- **Body**: Flowing English prose, no bullet points. Wrap at 76 columns.
- **ASCII only**: No Unicode characters anywhere in the message.
- **Trailers**: `Signed-off-by` is mandatory. Add `Assisted-by` when an AI
tool assisted with the change.
- The subject line must accurately describe the diff content. If a commit
adds `--git-dir=.` to one invocation, do not title it "wrap bare repo
commands in subshell with `GIT_DIR`".
### Patch Series with Dependencies
When contributing a branch thicket (multiple related patch series with
dependencies), submit the foundation series first and note the overall
effort in the cover letter with a link to the tracking PR or `compare`
URL. Submit dependent series after earlier ones land in `seen`.
Use `git replay --onto <target> <base>..<branch>` to test whether a
sub-branch applies cleanly to a given base (e.g., `upstream/master` or
`upstream/seen`) without touching the working tree. By default (since
the `--ref-action` default changed to `update`), `git replay` updates
named refs in the range directly, producing no stdout output. Use
`--ref-action=print` to get the old behavior of printing `update-ref`
commands to stdout instead. Always verify that `git replay` actually
did something by checking the reflog of the affected branches.
## Working with Worktrees
### General Principles
Use worktrees to work on multiple topics simultaneously without stashing
or switching branches. Keep worktrees as subdirectories of the main
repository and add them to `.git/info/exclude` so they do not show up
as untracked files.
```bash
git worktree add <name> <branch>
echo "<name>" >> .git/info/exclude
```
### Rewriting Commits with `--update-refs`
When rewriting history in a worktree (e.g., fixing a commit message via
`amend!` + autosquash), use `--update-refs` so that other local branches
pointing into the rewritten range are updated automatically:
```bash
# Create a local branch at the commit to be pushed
git branch <push-name> <tip>
# Create the amend! commit and autosquash
git commit --allow-empty -F <message-file>
GIT_SEQUENCE_EDITOR=true GIT_EDITOR=true \
git rebase -i --autosquash --update-refs <base>
# Verify: tree should be identical
git diff <push-name>@{1}..<push-name>
# Force-push the updated branch
git push <remote> <push-name> --force-with-lease
```
The `--update-refs` flag is essential: without it, only the checked-out
branch is rewritten and other branches become stale, pointing at
pre-rewrite commits.
### Verifying Rebase Results
After any rebase, verify that the tree content is unchanged (unless you
intentionally modified it):
```bash
git diff @{1} # Should be empty for pure rewording
git range-diff @{1}... # Shows per-commit changes
```
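In a script, the "should be empty" check can be turned into an exit
status by comparing tree OIDs. This is a sketch only; the helper name
`tree_unchanged_since_rewrite` is hypothetical, and it assumes the
rewrite added exactly one entry to the branch's reflog.

```bash
# Hypothetical helper (not part of Git): succeed if the tree of
# <branch> (default: the current branch) is identical to the tree it
# pointed at one reflog entry ago.
tree_unchanged_since_rewrite () {
	branch=${1-}
	old=$(git rev-parse --verify "$branch@{1}^{tree}") &&
	new=$(git rev-parse --verify "${branch:-HEAD}^{tree}") &&
	test "$old" = "$new"
}

# Example:
#   tree_unchanged_since_rewrite || echo "tree changed; inspect the diff"
```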
## Analyzing Branch Thickets
When a branch is structured as a sequence of merged sub-branches (a
"branch thicket"), use the merge structure to extract sub-branches:
```bash
# List the merge commits (sub-branches)
git log --oneline --first-parent upstream/master..<branch> | grep 'Merge branch'
# Extract commits for a specific sub-branch (second parent of its merge)
git log --oneline <merge>^1..<merge>^2
# Find what each sub-branch forks from
git log -1 --format='%H %s' <first-commit-in-sub-branch>^
```
Use `git replay` to test whether sub-branches can be rebased onto a new
base without conflicts. This replaces speculation about "overlapping files"
with actual evidence:
```bash
git replay --onto upstream/master <old-base>..<branch>
```
If the range contains merge commits, `git replay` will fail with "replaying
merge commits is not supported yet!" In that case, identify the linear
commit range and replay just those commits.
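One way to check a range up front is to count its merge commits with
`git rev-list`. A minimal sketch; the helper name `range_is_linear` is
hypothetical, and `topic` in the usage comment is a placeholder branch.

```bash
# Hypothetical helper (not part of Git): succeed if <range> contains
# no merge commits.
range_is_linear () {
	test "$(git rev-list --count --merges "$1")" -eq 0
}

# Example (placeholder range):
#   range_is_linear upstream/master..topic ||
#   git rev-list --merges upstream/master..topic  # list the merges
```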
## Resources
- [Git for Windows](https://gitforwindows.org/)
- [Git Internals](https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain)
- [GitGitGadget](https://gitgitgadget.github.io/) - Bridge GitHub PRs to
the Git mailing list
- [Git Mailing List Archive](https://lore.kernel.org/git/) - Searchable
archive of all upstream discussion