Allow protocol layers to look up an inpcb belonging to a particular FIB.
This is indicated by setting INPLOOKUP_FIB; if it is set, the FIB to be
used is obtained from the specificed mbuf or ifnet.
No functional change intended.
Reviewed by: glebius, melifaro
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48662
(cherry picked from commit da806e8db685eead02bc67888b16ebac6badb6b6)
Add a flag, INPBIND_FIB, which means that the inpcb is local to its FIB
number. When this flag is specified, duplicate bindings are permitted,
so long as each FIB contains at most one inpcb bound to the same
address/port. If an inpcb is bound with this flag, it'll have the
INP_BOUNDFIB flag set.
No functional change intended.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48661
(cherry picked from commit bbd0084baf7539c7042ce94f8c6770210f83f765)
This is to enable a mode where duplicate inpcb bindings are permitted,
and we want to look up an inpcb with a particular FIB. Thus, add a
"fib" parameter to in_pcblookup() and related functions, and plumb it
through.
A fib value of RT_ALL_FIBS indicates that the lookup should ignore FIB
numbers when searching. Otherwise, it should refer to a valid FIB
number, and the returned inpcb should belong to the specific FIB. For
now, just add the fib parameter where needed, as there are several
layers to plumb through.
No functional change intended.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48660
(cherry picked from commit 9a4131629bb3083ddc02a32950e4eb4806a07710)
See commit 4f02a7d739b3 for more background.
I cannot see a good reason to continue ignoring mismatching UIDs when
binding to INADDR_ANY. Looking at the sdr.V2.4a7n sources (mentioned in
bugzilla PR 7713), there is a CANT_MCAST_BIND hack wherein the
application binds to INADDR_ANY instead of a multicast address, but
CANT_MCAST_BIND isn't defined for FreeBSD builds.
It seems unlikely that we still have a use-case for allowing sockets
from different UIDs to bind to the same port when binding to the
unspecified address. And, as noted in D47832, applications like sdr
would have been broken by the inverted SO_REUSEPORT check removed in
that revision, apparently without any bug reports. Let's break
compatibility and simply disallow this case outright.
Also, add some comments, remove a hack in a regression test which tests
this funtionality, and add a new regression test to exercise the
remaining checks that were added in commit 4658dc8325e03.
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47870
(cherry picked from commit c9756953bded0d8428027fa3e812c9bdac069252)
For a long time, the inpcb lookup path has been lockless in the common
case: we use net_epoch to synchronize lookups. However, the routines
which update lbgroups were not careful to synchronize with unlocked
lookups. I believe that in the worst case this can result in spurious
connection aborts (I have a regression test case to exercise this), but
it's hard to be certain.
Modify in_pcblbgroup* routines to synchronize with unlocked lookup:
- When removing inpcbs from an lbgroup, do not shrink the array.
The maximum number of lbgroup entries is INPCBLBGROUP_SIZMAX (256),
and it doesn't seem worth the complexity to shrink the array when a
socket is removed.
- When resizing an lbgroup, do not insert it into the hash table until
it is fully initialized; otherwise lookups may observe a partially
constructed lbgroup.
- When adding an inpcb to the group, increment the counter after adding
the array entry, using a release store. Otherwise it's possible for
lookups to observe a null array slot.
- When looking up an entry, use a corresponding acquire load.
Reviewed by: ae, glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48020
(cherry picked from commit a600aabe9b04f0906069a8fb1f8d696ad186080f)
This check for SO_REUSEPORT was added way back in commit 52b65dbe85faf.
Per the commit log, this commit restricted this port-stealing check to
unicast addresses, and then only if the existing socket does not have
SO_REUSEPORT set. In other words, if there exists a socket bound to
INADDR_ANY, and we bind a socket to INADDR_ANY with the same port, then
the two sockets need not be owned by the same user if the existing
socket has SO_REUSEPORT set.
This is a surprising semantic; bugzilla PR 7713 gives some additional
context. That PR makes a case for the behaviour described above when
binding to a multicast address. But, the SO_REUSEPORT check is only
applied when binding to a non-multicast address, so it doesn't really
make sense. In the PR the committer notes that "unicast applications
don't set SO_REUSEPORT", which makes some sense, but also refers to
"multicast applications that bind to INADDR_ANY", which sounds a bit
suspicious.
OpenBSD performs the multicast check, but not the SO_REUSEPORT check.
DragonflyBSD removed the SO_REUSEPORT (and INADDR_ANY) checks back in
2014 (commit 0323d5fde12a4). NetBSD explicitly copied our logic and
still has it.
The plot thickens: 20 years later, SO_REUSEPORT_LB was ported from
DragonflyBSD: this option provides similar semantics to SO_REUSEPORT,
but for unicast addresses it causes incoming connections/datagrams to be
distributed among all sockets in the group. This commit (1a43cff92a20d)
inverted the check for SO_REUSEPORT while adding one for
SO_REUSEPORT_LB; this appears to have been inadvertent. However:
- apparently no one has noticed that the semantics were changed;
- sockets belonging to different users can now be bound to the same port
so long as they belong to a single lbgroup bound to INADDR_ANY, which
is not correct.
Simply remove the SO_REUSEPORT(_LB) checks, as their original
justification was dubious and their current implementation is wrong; add
some tests.
Reviewed by: glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47832
(cherry picked from commit 4f02a7d739b354eef38e19b25866f64842d69414)
Fixes: 01f8ce83242d ("inpcb: Factor out parts of in6_pcbbind() and in_pcbbind_setup()")
(cherry picked from commit ffb3d384fc1d550a764def2c0cd034ac3a4f0b86)
A large portion of these functions just determines whether the inpcb can
bind to the address/port. This portion has no side effects, so is a
good candidate to move into its own helper function. This patch does
so, making the callers less complicated and reducing indentation.
While moving this code, also make some changes:
- Load socket options (SO_REUSEADDR etc.) only once. There is nothing
preventing another thread from toggling the socket options, so make
this function easier to reason about by avoiding races.
- When checking whether the bind address is an interface address, make a
separate sockaddr rather than temporarily modifying the one passed to
in_pcbbind().
Reviewed by: ae, glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47590
(cherry picked from commit 01f8ce83242d7a8e599cf6a78b6277161d79edd4)
- Use the local var "laddr" instead of sin->sin_addr in one block.
- Use in_nullhost() instead of explicit comparisons with INADDR_ANY.
- Combine multiple socket options checks into one.
- Fix indentation.
- Remove some unhelpful comments.
This is in preparation for some simplification and bug-fixing.
No functional change intended.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47451
(cherry picked from commit 45a77bf23fa2f36bf2169f7ba2a33b31f4c35adb)
in_pcblookup_hash_wild_* looks up unconnected inpcbs, so there is no
point in passing the foreign address and port, and indeed those
parameters are not used. So, remove them.
No functional change intended.
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47385
(cherry picked from commit 21d7ac8c79a34cf3b7205d0c32014ee39f1f28ab)
The function in_localip() was changed to return bool but the comment was
left unchanged.
Fixes: c8ee75f2315e Use network epoch to protect local IPv4 addresses hash
MFC after: 3 days
(cherry picked from commit a5e380e51cdba64a392846a4eeda000f948f42ce)
A FreeBSD specific comment asked people to report to a PR if they see
this. By now we got enough feedback and also left this in a release.
Simply point to the PR so people can check the status but not longer
ask to submit a report to the PR.
Sponsored by: The FreeBSD Foundation
PR: 274382
(cherry picked from commit 4a4eee553307a2e02c6ed4796d575bfce2857049)
First configured PINNED routes should have higher priority.
Fixes: 1da4954c92ea
Differential Revision: https://reviews.freebsd.org/D48650
(cherry picked from commit 361a8395f0b0e6f254fd138798232529679d99f6)
Kyua and ATF speak different naming styles. In this case, the
unprivileged user property can be named with underscore on the Kyua
side, and with a hyphen on the ATF side. Sometimes it is not obvious
which style should be used in which situation. For instance, a test case
may require this configuration property being set using require.config.
Also, a test case may want to read the property using something like
atf_tc_get_config_var(). Which names should be used in these cases?
From the perspective of the original code, it is expected to be this:
require.config unprivileged-user
atf_tc_get_config_var(tc, "unprivileged-user")
But, as long as Kyua is the main interface, its users expect to work
with kyua.conf(5), which says that it must be named as unprivileged_user
(with underscore). As a result, test authors tend to do this instead:
require.config unprivileged_user
atf_tc_get_config_var(tc, "unprivileged_user")
Kyua already has hacks to understand both unprivileged_user and
unprivileged-user coming from require.config. And this patch covers the
missing second part -- make Kyua pass both names back to ATF as two
identical configuration properties named different ways.
Reviewed by: ngie, asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D49039
(cherry picked from commit 51a8eb6410461c94c8e0f2b59e3417cfb5d7da75)
IPv4 packets can be routed via an IPv6 nexthop, so the handling of the
parsed address family is more strict than it needs to be. If we have a
valid header that matches a known peer, then we have no reason to
decline the packet.
Convert it to an assertion that it matches the destination as viewed by
the stack below it, instead. `dst` may be the gateway instead of the
destination in the case of a nexthop, so the `af` assignment must be
switched to use the destination in all cases.
Add a test case that approximates a setup like in the PR and
demonstrates the issue.
PR: 284857
Reviewed by: markj (earlier version), zlei
(cherry picked from commit 2bef0d54f74dad6962ef7d1dfa407e95cb4fb4ad)
While here, remove vnet_if_clone_init() which is therefore an empty function.
MFC after: 1 week
(cherry picked from commit 1ba655149ed0447f93e997a60330d9b962d061f2)
The length of key data is specified via sadb_key_bits field.
Use specified size for buffer allocation and key copying.
Also add a check that the value of sadb_key_bits is not zero,
as explicitly required in RFC 2367.
PR: 241010
Submitted by: jean-francois.hren at stormshield eu
(cherry picked from commit 04207850a9b988d3c04e904cb5783f33da7fe184)
Use correct indent number to dump registered socket options.
It is not currently in use but can be used for debugging.
PR: 283970
(cherry picked from commit b405250c77e6841a8159a4081d4e0f61e49dfbf8)
sin_addr of a `struct sockaddr_in` is stored in network byte order, but
IN_LOOPBACK() and IN_LINKLOCAL() want the host order.
Reviewed by: melifaro, #network
Fixes: 7e5bf68495cc netlink: add netlink support
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D49226
(cherry picked from commit 0e096bb3fcaaf663df372aa4abb986e8d63c6e68)
It is used as a boolean function everywhere.
No functional change intended.
MFC after: 1 week
(cherry picked from commit 69beb162848b15c967d3b45ac56501dbd8b94e91)
With recent change 110113bc086f, a vnet tunable can be initialized when
there is a corresponding kernel environment variable unless it is marked
with the flag CTLFLAG_NOFETCH.
The initialization may happen during early boot(linker preload), at that
time vnet0 has not been created. The hander carp_allow_sysctl() for the
tunable net.inet.carp.allow requires vnet, thus invoking it during early
boot will cause kernel panic.
The tunnable is initialized by vnet sysinit routine ipcarp_sysinit() so
let's just mark it with flag CTLFLAG_NOFETCH.
No functional change intended.
Fixes: 110113bc086f sysctl(9): Enable vnet sysctl variables to be loader tunable
MFC after: 2 week
Differential Revision: https://reviews.freebsd.org/D41525
(cherry picked from commit 242fa308f3c3def32b2e61e0b78c11b3697e4492)
A string loader tunable requires setting the len parameter to a nonzero
value, typically the size of the string, to have the flag CTLFLAG_TUN
work correctly [1] [2].
Without this fix security.mac.{biba,lomac}.trusted_interfaces would
have no effect at all.
[1] 3da1cf1e88f8 Extend the meaning of the CTLFLAG_TUN flag to automatically ...
[2] 6a3287f889b0 Fix regression issue after r267961. Handle special string case ...
Reviewed by: olce, kib
Fixes: af3b2549c4ba Pull in r267961 and r267973 again ...
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D48898
(cherry picked from commit 7d4c0fac8c7db9c5741ba98a8b3ce3c43feb1cf4)
Prior to change [1] this flag is useless but harmless. After the change
plat_name[] will be fetched from kernel environment after invoking the
platform probe function `platform_probe_and_attach()`. The probe function
runs at early boot stage prior to `mi_startup()` thus it is too late and
pointless to set plat_name[] after the probe.
Nathan mentioned that the logic to specify the platform pre-dates the
powerpc64 work, and is from the original pre-FDT Book-E bringup from
like 2008, so it's irrelevant these days. Instead of fixing setting the
sysctl knob hw.platform, let's clean it up now.
[1] 3da1cf1e88f8 Extend the meaning of the CTLFLAG_TUN flag to ...
Discussed with: nwhitehorn
Reviewed by: olce (previous version), jhibbits, #powerpc
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D48897
(cherry picked from commit b61fbbed73ea3bf0c84589b56cca160c46a3739d)
Complete phase two of 3da1cf1e88f8.
In 3da1cf1e88f8, the meaning of the flag CTLFLAG_TUN is extended to
automatically check if there is a kernel environment variable which
shall initialize the SYSCTL during early boot. It works for all SYSCTL
types both statically and dynamically created ones, except for the
SYSCTLs which belong to VNETs.
This change extends the meaning further, to allow it also works for
the SYSCTLs which belong to VNETs. A typical usage is
```
VNET_DEFINE_STATIC(int, foo) = 0;
SYSCTL_INT(_net, OID_AUTO, foo, CTLFLAG_RWTUN | CTLFLAG_VNET,
&VNET_NAME(foo), 0, "Description of the foo loader tunable");
```
Note that the implementation has a limitation. It behaves the same way
as that of non-vnet loader tunables. That is, after the kernel or modules
being initialized, any changes (e.g. via kenv) to kernel environment
variable will not affect the corresponding vnet variable of subsequently
created VNETs. To overcome it, we can use TUNABLE_XXX_FETCH to fetch
the kernel environment variable into those vnet variables during vnet
constructing.
This change will fix the following SYSCTLs those belong to VNETs and
have CTLFLAG_TUN flag:
```
net.add_addr_allfibs
net.bpf.optimize_writers
net.inet.tcp.fastopen.ccache_buckets
net.link.bridge.inherit_mac
net.link.bridge.ipfw_arp
net.link.bridge.log_stp
net.link.bridge.pfil_bridge
net.link.bridge.pfil_local_phys
net.link.bridge.pfil_member
net.link.bridge.pfil_onlyip
net.link.lagg.default_use_flowid
net.link.lagg.default_use_numa
net.link.lagg.default_flowid_shift
net.link.lagg.lacp.debug
net.link.lagg.lacp.default_strict_mode
```
Although the following vnet SYSCTLs have CTLFLAG_TUN flag, theirs
values are re-fetched via TUNABLE_XXX_FETCH, thus are not affected
by this change.
```
net.inet.ip.reass_hashsize
net.inet.tcp.hostcache.cachelimit
net.inet.tcp.hostcache.hashsize
net.inet.tcp.hostcache.bucketlimit
net.inet.tcp.syncache.bucketlimit
net.inet.tcp.syncache.cachelimit
net.inet.tcp.syncache.hashsize
net.key.spdcache.maxentries
net.key.spdcache.threshold
```
In memoriam: hselasky
Discussed with: hselasky, glebius
Fixes: 3da1cf1e88f8 Extend the meaning of the CTLFLAG_TUN flag ...
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D39638
(cherry picked from commit 110113bc086f5df1a9b6547edb1ab0cec698c55c)
This mimics 316a0990f0b74 in IPv6. While this condition is already
checked in some of the ICMP code make it explicit while processing
the packet in icmp6_reflect().
The section 4 in the draft proposal [1] explicitly states that 0.0.0.0,
aka INADDR_ANY, retains its existing special meanings.
[1] https://datatracker.ietf.org/doc/draft-schoen-intarea-unicast-0
Reviewed by: glebius
Fixes: efe58855f3ea IPv4: experimental changes to allow net 0/8, 240/4, part of 127/8
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D49157
(cherry picked from commit f7174eb2b4c45573bb9e836edad2b179a445a88f)