From 44f9fd649673362bdbaae7067a9919b1fe4c96d1 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Tue, 24 May 2022 14:54:22 -0400 Subject: [PATCH 1/4] pack-bitmap.c: check preferred pack validity when opening MIDX bitmap When pack-objects adds an entry to its packing list, it marks the packfile and offset containing the object, which we may later use during verbatim reuse (c.f., `write_reused_pack_verbatim()`). If the packfile in question is deleted in the background (e.g., due to a concurrent `git repack`), we'll die() as a result of calling use_pack(), unless we have an open file descriptor on the pack itself. 4c08018204 (pack-objects: protect against disappearing packs, 2011-10-14) worked around this by opening the pack ahead of time before recording it as a valid source for reuse. 4c08018204's treatment meant that we could tolerate disappearing packs, since it ensures we always have an open file descriptor on any pack that we mark as a valid source for reuse. This tightens the race to only happen when we need to close an open pack's file descriptor (c.f., the caller of `packfile.c::get_max_fd_limit()`) _and_ that pack was deleted, in which case we'll complain that a pack could not be accessed and die(). The pack bitmap code does this, too, since prior to dc1daacdcc (pack-bitmap: check pack validity when opening bitmap, 2021-07-23) it was vulnerable to the same race. The MIDX bitmap code does not do this, and is vulnerable to the same race. Apply the same treatment as dc1daacdcc to the routine responsible for opening the multi-pack bitmap's preferred pack to close this race. This patch handles the "preferred" pack (c.f., the section "multi-pack-index reverse indexes" in Documentation/technical/pack-format.txt) specially, since pack-objects depends on reusing exact chunks of that pack verbatim in reuse_partial_packfile_from_bitmap(). So if that pack cannot be loaded, the utility of a bitmap is significantly diminished. Similar to dc1daacdcc, we could technically just add this check in reuse_partial_packfile_from_bitmap(), since it's possible to use a MIDX .bitmap without needing to open any of its packs. But it's simpler to do the check as early as possible, covering all direct uses of the preferred pack. Note that doing this check early requires us to call prepare_midx_pack() early, too, so move the relevant part of that loop from load_reverse_index() into open_midx_bitmap_1(). Subsequent patches handle the non-preferred packs in a slightly different fashion. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- pack-bitmap.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index f772d3cb7f..f18a01ed42 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -315,6 +315,8 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git, struct stat st; char *idx_name = midx_bitmap_filename(midx); int fd = git_open(idx_name); + uint32_t i; + struct packed_git *preferred; free(idx_name); @@ -353,6 +355,20 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git, warning(_("multi-pack bitmap is missing required reverse index")); goto cleanup; } + + for (i = 0; i < bitmap_git->midx->num_packs; i++) { + if (prepare_midx_pack(the_repository, bitmap_git->midx, i)) + die(_("could not open pack %s"), + bitmap_git->midx->pack_names[i]); + } + + preferred = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)]; + if (!is_pack_valid(preferred)) { + warning(_("preferred pack (%s) is invalid"), + preferred->pack_name); + goto cleanup; + } + return 0; cleanup: @@ -425,8 +441,6 @@ static int load_reverse_index(struct bitmap_index *bitmap_git) * since we will need to make use of them in pack-objects. */ for (i = 0; i < bitmap_git->midx->num_packs; i++) { - if (prepare_midx_pack(the_repository, bitmap_git->midx, i)) - die(_("load_reverse_index: could not open pack")); ret = load_pack_revindex(bitmap_git->midx->packs[i]); if (ret) return ret; From 58a6abb7bae98357677657825631de6652256be9 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Tue, 24 May 2022 14:54:27 -0400 Subject: [PATCH 2/4] builtin/pack-objects.c: avoid redundant NULL check Before calling `for_each_object_in_pack()`, the caller `read_packs_list_from_stdin()` loops through each of the `include_packs` and checks that its `->util` pointer (which is used to store the `struct packed_git *` itself) is non-NULL. This check is redundant, because `read_packs_list_from_stdin()` already checks that the included packs are non-NULL earlier on in the same function (and it does not add any new entries in between). Remove this check, since it is not doing anything in the meantime. Co-authored-by: Victoria Dye Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- builtin/pack-objects.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index ba2006f221..f5b098a2d4 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3361,8 +3361,6 @@ static void read_packs_list_from_stdin(void) for_each_string_list_item(item, &include_packs) { struct packed_git *p = item->util; - if (!p) - die(_("could not find pack '%s'"), item->string); for_each_object_in_pack(p, add_object_entry_from_pack, &revs, From 5045759de85d40520036c0216dbc51015ef833e7 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Tue, 24 May 2022 14:54:31 -0400 Subject: [PATCH 3/4] builtin/pack-objects.c: ensure included `--stdin-packs` exist A subsequent patch will teach `want_object_in_pack()` to set its `*found_pack` and `*found_offset` poitners to NULL when the provided pack does not pass the `is_pack_valid()` check. The `--stdin-packs` mode of `pack-objects` is not quite prepared to handle this. To prepare it for this change, do the following two things: - Ensure provided packs pass the `is_pack_valid()` check when collecting the caller-provided packs into the "included" and "excluded" lists. - Gracefully handle any _invalid_ packs being passed to `want_object_in_pack()`. Calling `is_pack_valid()` early on makes it substantially less likely that we will have to deal with a pack going away, since we'll have an open file descriptor on its contents much earlier. But even packs with open descriptors can become invalid in the future if we (a) hit our open descriptor limit, forcing us to close some open packs, and (b) one of those just-closed packs has gone away in the meantime. `add_object_entry_from_pack()` depends on having a non-NULL `*found_pack`, since it passes that pointer to `packed_object_info()`, meaning that we would SEGV if the pointer became NULL (like we propose to do in `want_object_in_pack()` in the following patch). But avoiding calling `packed_object_info()` entirely is OK, too, since its only purpose is to identify which objects in the included packs are commits, so that they can form the tips of the advisory traversal used to discover the object namehashes. Failing to do this means that at worst we will produce lower-quality deltas, but it does not prevent us from generating the pack as long as we can find a copy of each object from the disappearing pack in some other part of the repository. Co-authored-by: Victoria Dye Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- builtin/pack-objects.c | 35 ++++++++++++++++++++--------------- 1 file changed, 20 insertions(+), 15 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index f5b098a2d4..04970e38f6 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3193,10 +3193,8 @@ static int add_object_entry_from_pack(const struct object_id *oid, uint32_t pos, void *_data) { - struct rev_info *revs = _data; - struct object_info oi = OBJECT_INFO_INIT; off_t ofs; - enum object_type type; + enum object_type type = OBJ_NONE; display_progress(progress_state, ++nr_seen); @@ -3207,19 +3205,24 @@ static int add_object_entry_from_pack(const struct object_id *oid, if (!want_object_in_pack(oid, 0, &p, &ofs)) return 0; - oi.typep = &type; - if (packed_object_info(the_repository, p, ofs, &oi) < 0) - die(_("could not get type of object %s in pack %s"), - oid_to_hex(oid), p->pack_name); - else if (type == OBJ_COMMIT) { - /* - * commits in included packs are used as starting points for the - * subsequent revision walk - */ - add_pending_oid(revs, NULL, oid, 0); - } + if (p) { + struct rev_info *revs = _data; + struct object_info oi = OBJECT_INFO_INIT; - stdin_packs_found_nr++; + oi.typep = &type; + if (packed_object_info(the_repository, p, ofs, &oi) < 0) { + die(_("could not get type of object %s in pack %s"), + oid_to_hex(oid), p->pack_name); + } else if (type == OBJ_COMMIT) { + /* + * commits in included packs are used as starting points for the + * subsequent revision walk + */ + add_pending_oid(revs, NULL, oid, 0); + } + + stdin_packs_found_nr++; + } create_object_entry(oid, type, 0, 0, 0, p, ofs); @@ -3338,6 +3341,8 @@ static void read_packs_list_from_stdin(void) struct packed_git *p = item->util; if (!p) die(_("could not find pack '%s'"), item->string); + if (!is_pack_valid(p)) + die(_("packfile %s cannot be accessed"), p->pack_name); } /* From 4090511e408a37091e7812bc1828a763a035e0a2 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Tue, 24 May 2022 14:54:36 -0400 Subject: [PATCH 4/4] builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects When using a multi-pack bitmap, pack-objects will try to perform its traversal using a call to `traverse_bitmap_commit_list()`, which calls `add_object_entry_from_bitmap()` to add each object it finds to its packing list. This path can cause pack-objects to add objects from packs that don't have open pack_fds on them, by avoiding a call to `is_pack_valid()`. This is because we only call `is_pack_valid()` on the preferred pack (in order to do verbatim reuse via `reuse_partial_packfile_from_bitmap()`) and not others when loading a MIDX bitmap. In this case, `add_object_entry_from_bitmap()` will check whether it wants each object entry by calling `want_object_in_pack()`, which will call `want_found_object` (since its caller already supplied a `found_pack`). In most cases (particularly without `--local`, and when `ignored_packed_keep_on_disk` and `ignored_packed_keep_in_core` are both "0"), we'll take the entry from the pack contained in the MIDX bitmap, all without an open pack_fd. When we then try to use that entry later to assemble the actual pack, we'll be susceptible to any simultaneous writers moving that pack out of the way (e.g., due to a concurrent repack) without having an open file descriptor, causing races that result in errors like: remote: Enumerating objects: 1498802, done. remote: fatal: packfile ./objects/pack/pack-e57d433b5a588daa37fbe946e2b28dfaec03a93e.pack cannot be accessed remote: aborting due to possible repository corruption on the remote side. This race can happen even with multi-pack bitmaps, since we may open a MIDX bitmap that is being rewritten long before its packs are actually unlinked. Work around this by calling `is_pack_valid()` from within `want_found_object()`, matching the behavior in `want_object_in_pack_one()` (which has an analogous call). Most calls to `is_pack_valid()` should be basically no-ops, since only the first call requires us to open a file (subsequent calls realize the file is already open, and return immediately). Importantly, when `want_object_in_pack()` is given a non-NULL `*found_pack`, but `want_found_object()` rejects the copy of the object in that pack, we must reset `*found_pack` and `*found_offset` to NULL and 0, respectively. Failing to do so could lead to other checks in `want_object_in_pack()` (such as `want_object_in_pack_one()`) using the same (invalid) pack as `*found_pack`, meaning that we don't call `is_pack_valid()` because `p == *found_pack`. This can lead the caller to believe it can use a copy of an object from an invalid pack. An alternative approach to closing this race would have been to call `is_pack_valid()` on _all_ packs in a multi-pack bitmap on load. This has a couple of problems: - it is unnecessarily expensive in the cases where we don't actually need to open any packs (e.g., in `git rev-list --use-bitmap-index --count`) - more importantly, it means any time we would have hit this race, we'll avoid using bitmaps altogether, leading to significant slowdowns by forcing a full object traversal Co-authored-by: Victoria Dye Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- builtin/pack-objects.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 04970e38f6..da21c9b220 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1349,6 +1349,9 @@ static int want_found_object(const struct object_id *oid, int exclude, if (incremental) return 0; + if (!is_pack_valid(p)) + return -1; + /* * When asked to do --local (do not include an object that appears in a * pack we borrow from elsewhere) or --honor-pack-keep (do not include @@ -1464,6 +1467,9 @@ static int want_object_in_pack(const struct object_id *oid, want = want_found_object(oid, exclude, *found_pack); if (want != -1) return want; + + *found_pack = NULL; + *found_offset = 0; } for (m = get_multi_pack_index(the_repository); m; m = m->next) {