Derrick Stolee 97829b5bdd test-tool: add helper for name-hash values
Add a new test-tool helper, name-hash, to output the value of the
name-hash algorithms for the input list of strings, one per line.

Since the name-hash values can be stored in the .bitmap files, it is
important that these hash functions do not change across Git versions.
Add a simple test to t5310-pack-bitmaps.sh to provide some testing of
the current values. Due to how these functions are implemented, it would
be difficult to change them without disturbing these values.

Create a performance test that uses test_size to demonstrate how
collisions occur for these hash algorithms. This test helps inform
someone as to the behavior of the name-hash algorithms for their repo
based on the paths at HEAD.

My copy of the Git repository shows modest statistics around the
collisions of the default name-hash algorithm:

Test                                              this tree
-----------------------------------------------------------------
5314.1: paths at head                                        4.5K
5314.2: number of distinct name-hashes                       4.1K
5314.3: number of distinct full-name-hashes                  4.5K
5314.4: maximum multiplicity of name-hashes                    13
5314.5: maximum multiplicity of fullname-hashes                 1

Here, the maximum collision multiplicity is 13, but around 10% of paths
have a collision with another path.

In a more interesting example, the microsoft/fluentui [1] repo had these
statistics at time of committing:

Test                                              this tree
-----------------------------------------------------------------
5314.1: paths at head                                       19.6K
5314.2: number of distinct name-hashes                       8.2K
5314.3: number of distinct full-name-hashes                 19.6K
5314.4: maximum multiplicity of name-hashes                   279
5314.5: maximum multiplicity of fullname-hashes                 1

[1] https://github.com/microsoft/fluentui

That demonstrates that of the nearly twenty thousand path names, they
are assigned around eight thousand distinct values. 279 paths are
assigned to a single value, leading the packing algorithm to sort
objects from those paths together, by size.

In this repository, no collisions occur for the full-name-hash
algorithm.

In a more extreme example, an internal monorepo had a much worse
collision rate:

Test                                              this tree
-----------------------------------------------------------------
5314.1: paths at head                                      221.6K
5314.2: number of distinct name-hashes                      72.0K
5314.3: number of distinct full-name-hashes                221.6K
5314.4: maximum multiplicity of name-hashes                 14.4K
5314.5: maximum multiplicity of fullname-hashes                 2

Even in this repository with many more paths at HEAD, the collision rate
was low and the maximum number of paths being grouped into a single
bucket by the full-path-name algorithm was two.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
2024-10-08 08:55:52 +02:00
2024-10-03 08:57:57 +02:00
2024-10-04 14:21:40 -07:00
2024-09-20 14:40:41 -07:00
2024-09-06 09:31:15 -07:00
2024-09-23 10:35:07 -07:00
2024-06-17 15:55:55 -07:00
2024-07-08 14:53:10 -07:00
2024-07-08 14:53:10 -07:00
2024-09-23 10:35:07 -07:00
2024-09-03 09:15:00 -07:00
2024-09-16 10:46:00 -07:00
2024-06-14 10:26:33 -07:00
2024-06-14 10:26:33 -07:00
2024-06-14 10:26:33 -07:00
2024-06-14 10:26:33 -07:00
2024-08-23 09:02:33 -07:00
2024-10-07 18:42:24 +02:00
2024-10-07 18:42:24 +02:00
2024-09-19 13:46:00 -07:00
2024-06-14 10:26:33 -07:00
2024-10-03 08:44:43 +02:00
2024-08-08 09:36:53 -07:00
2024-09-19 13:46:00 -07:00
2024-09-19 13:46:00 -07:00
2024-10-06 15:56:06 -07:00
2024-10-03 08:57:32 +02:00
2024-09-27 08:25:36 -07:00
2024-09-23 10:35:09 -07:00
2024-10-02 22:33:58 -04:00
2024-06-14 10:26:33 -07:00
2024-08-09 08:47:34 -07:00
2024-09-06 10:38:49 -07:00
2024-07-08 14:53:10 -07:00
2024-07-08 14:53:10 -07:00
2024-07-08 14:53:10 -07:00
2024-06-14 10:26:33 -07:00
2024-06-14 10:26:33 -07:00
2024-07-25 09:03:00 -07:00
2024-10-03 08:16:31 +02:00
2024-09-25 10:37:12 -07:00
2024-08-09 08:47:34 -07:00
2024-09-19 13:46:00 -07:00
2024-09-19 13:46:01 -07:00
2024-09-25 10:37:11 -07:00
2024-09-25 10:37:12 -07:00
2024-09-16 15:19:05 -07:00
2024-06-14 10:26:33 -07:00
2024-06-14 10:26:33 -07:00
2024-09-19 13:46:12 -07:00
2024-08-28 10:31:26 -07:00
2024-06-14 10:26:33 -07:00
2024-10-02 07:46:26 -07:00
2024-06-14 10:26:33 -07:00
2024-08-09 08:47:34 -07:00
2024-06-24 16:39:15 -07:00
2024-09-04 08:03:24 -07:00
2024-06-14 10:26:33 -07:00

Build status

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission and Documentation/CodingGuidelines).

Those wishing to help with error message, usage and informational message string translations (localization l10) should see po/README.md (a po file is a Portable Object file that holds the translations).

To subscribe to the list, send an email to git+subscribe@vger.kernel.org (see https://subspace.kernel.org/subscribing.html for details). The mailing list archives are available at https://lore.kernel.org/git/, https://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them give a good reference for project status, development direction and remaining tasks.

The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks
Description
A fork of Git containing Windows-specific patches.
Readme 440 MiB
2025-08-19 03:50:05 -05:00
Languages
C 51.7%
Shell 37.5%
Perl 4.3%
Tcl 3%
Python 0.8%
Other 2.5%