Stefan Agner 28fa0b35bd Use Unix socket for Supervisor to Core communication (#6590)
* Use Unix socket for Supervisor to Core communication

Switch internal Supervisor-to-Core HTTP and WebSocket communication
from TCP (port 8123) to a Unix domain socket.

The existing /run/supervisor directory on the host (already mounted
at /run/os inside the Supervisor container) is bind-mounted into the
Core container at /run/supervisor. Core receives the socket path via
the SUPERVISOR_CORE_API_SOCKET environment variable, creates the
socket there, and Supervisor connects to it via aiohttp.UnixConnector
at /run/os/core.sock.
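
To illustrate the transport change described above, here is a minimal sketch of an HTTP client that dials a Unix domain socket instead of TCP. It uses only the Python standard library; Supervisor itself uses aiohttp.UnixConnector, as stated above. The class name `UnixHTTPConnection` is hypothetical and not taken from the Supervisor code base.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that connects to a Unix domain socket (illustrative)."""

    def __init__(self, socket_path: str, timeout: float = 10.0) -> None:
        # "localhost" only fills the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        # Replace the default TCP connect with an AF_UNIX stream socket.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock
```

With such a class, a request would look like `UnixHTTPConnection("/run/os/core.sock").request("GET", "/api/")`; the path is the one named in the text, while the endpoint is only an example.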

Since the Unix socket is only reachable by processes on the same host,
requests arriving over it are implicitly trusted and authenticated as
the existing Supervisor system user. This removes the token round-trip
where Supervisor had to obtain and send Bearer tokens on every Core
API call. WebSocket connections are likewise authenticated implicitly,
skipping the auth_required/auth handshake.

Key design decisions:
- Version-gated by CORE_UNIX_SOCKET_MIN_VERSION so older Core
  versions transparently continue using TCP with token auth
- LANDINGPAGE is explicitly excluded (not a CalVer version)
- Hard-fails with a clear error if the socket file is unexpectedly
  missing when Unix socket communication is expected
- WSClient.connect() for Unix socket (no auth) and
  WSClient.connect_with_auth() for TCP (token auth) separate the
  two connection modes cleanly
- Token refresh always uses the TCP websession since it is inherently
  a TCP/Bearer-auth operation
- Logs which transport (Unix socket vs TCP) is being used on first
  request

Closes #6626
Related Core PR: home-assistant/core#163907

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Close WebSocket on handshake failure and validate auth_required

Ensure the underlying WebSocket connection is closed before raising
when the handshake produces an unexpected message. Also validate that
the first TCP message is auth_required before sending credentials.
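
A minimal sketch of the handshake behaviour described above, assuming a WebSocket-like object with `receive_json`, `send_json`, and `close` coroutines. The `HandshakeError` class and `authenticate` function are hypothetical names; the message types follow the Home Assistant WebSocket auth flow (`auth_required`, `auth`, `auth_ok`).

```python
import asyncio
from typing import Any, Dict, Protocol


class HandshakeError(Exception):
    """Raised when the peer sends an unexpected handshake message."""


class JsonSocket(Protocol):
    async def receive_json(self) -> Dict[str, Any]: ...
    async def send_json(self, data: Dict[str, Any]) -> None: ...
    async def close(self) -> None: ...


async def authenticate(ws: JsonSocket, token: str) -> None:
    """Run the TCP auth handshake; close the socket before raising."""
    try:
        hello = await ws.receive_json()
        # Do not send credentials unless the server asked for them.
        if hello.get("type") != "auth_required":
            raise HandshakeError(f"Expected auth_required, got {hello!r}")
        await ws.send_json({"type": "auth", "access_token": token})
        reply = await ws.receive_json()
        if reply.get("type") != "auth_ok":
            raise HandshakeError(f"Authentication failed: {reply!r}")
    except HandshakeError:
        # Release the underlying connection before propagating the error.
        await ws.close()
        raise
```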

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix pylint protected-access warnings in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Check running container env before using Unix socket

Split use_unix_socket into two properties to handle the Supervisor
upgrade transition where Core is still running with a container
started by the old Supervisor (without SUPERVISOR_CORE_API_SOCKET):

- supports_unix_socket: version check only, used when creating the
  Core container to decide whether to set the env var
- use_unix_socket: version check + running container env check, used
  for communication decisions

This ensures TCP fallback during the upgrade transition while still
hard-failing if the socket is missing after Supervisor configured
Core to use it.
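
The split between the two properties might look roughly like this. It is a sketch only: the `CoreTransport` dataclass and its fields are hypothetical, and the version check is reduced to a boolean for brevity.

```python
from dataclasses import dataclass, field
from typing import Mapping

SOCKET_ENV = "SUPERVISOR_CORE_API_SOCKET"


@dataclass
class CoreTransport:
    # Result of the version check (hypothetical field).
    version_ok: bool
    # Environment of the currently running Core container (hypothetical field).
    container_env: Mapping[str, str] = field(default_factory=dict)

    @property
    def supports_unix_socket(self) -> bool:
        """Version check only: decides whether to set the env var on create."""
        return self.version_ok

    @property
    def use_unix_socket(self) -> bool:
        """Version check plus env check on the running container."""
        return self.supports_unix_socket and SOCKET_ENV in self.container_env
```

A container started by the old Supervisor has no `SUPERVISOR_CORE_API_SOCKET` in its environment, so `use_unix_socket` stays false and communication continues over TCP until the container is recreated.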

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Improve Core API communication logging and error handling

- Remove transport log from make_request that logged before Core
  container was attached, causing misleading connection logs
- Log "Connected to Core via ..." once on first successful API response
  in get_api_state, when the transport is actually known
- Remove explicit socket existence check from session property, let
  aiohttp UnixConnector produce natural connection errors during
  Core startup (same as TCP connection refused)
- Add validation in get_core_state matching get_config pattern
- Restore make_request docstring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Guard Core API requests with container running check

Add is_running() check to make_request and connect_websocket so no
HTTP or WebSocket connection is attempted when the Core container is
not running. This avoids misleading connection attempts during
Supervisor startup before Core is ready.

Also make use_unix_socket raise if container metadata is not available
instead of silently falling back to TCP. This is a defensive check
since is_running() guards should prevent reaching this state.

Add attached property to DockerInterface to expose whether container
metadata has been loaded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Reset Core API connection state on container stop

Listen for Core container STOPPED/FAILED events to reset the
connection state: clear the _core_connected flag so the transport
is logged again on next successful connection, and close any stale
Unix socket session.
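
As a rough sketch of the reset logic above: all names here are hypothetical, including the container name `homeassistant` and the `FakeSession` stand-in for the real (asynchronous) aiohttp session.

```python
from enum import Enum
from typing import List, Optional

CORE_CONTAINER = "homeassistant"  # assumed container name, for illustration


class ContainerState(Enum):
    RUNNING = "running"
    STOPPED = "stopped"
    FAILED = "failed"


class FakeSession:
    """Stand-in for the Unix socket client session."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class CoreConnectionState:
    def __init__(self) -> None:
        self.core_connected = False
        self.unix_session: Optional[FakeSession] = None
        self.log: List[str] = []

    def on_api_success(self, transport: str) -> None:
        # Log the transport once per connection lifetime.
        if not self.core_connected:
            self.log.append(f"Connected to Core via {transport}")
            self.core_connected = True

    def on_container_event(self, name: str, state: ContainerState) -> None:
        # Ignore other containers and non-terminal states.
        if name != CORE_CONTAINER:
            return
        if state not in (ContainerState.STOPPED, ContainerState.FAILED):
            return
        self.core_connected = False
        if self.unix_session is not None:
            self.unix_session.close()
            self.unix_session = None
```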

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Only mount /run/supervisor if we use it

* Fix pytest errors

* Remove redundant is_running check from ingress panel update

The is_running() guard in update_hass_panel is now redundant since
make_request checks is_running() internally. Also mock is_running
in the websession test fixture since tests using it need make_request
to proceed past the container running check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Bind mount /run/supervisor to Supervisor /run/os

Home Assistant OS (as well as the Supervised run scripts) bind mount
/run/supervisor to /run/os in Supervisor. Since we reuse this location
for the communication socket between Supervisor and Core, we need to
also bind mount /run/supervisor to Supervisor /run/os in CI.

* Wrap WebSocket handshake errors in HomeAssistantAPIError

Unexpected exceptions during the WebSocket handshake (KeyError,
ValueError, TypeError from malformed messages) are now wrapped in
HomeAssistantAPIError inside WSClient.connect/connect_with_auth.
This means callers only need to catch HomeAssistantAPIError.

Remove the now-unnecessary except (RuntimeError, ValueError,
TypeError) from proxy _websocket_client and add a proper error
message to the APIError per review feedback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Narrow WebSocket handshake exception handling

Replace broad `except Exception` with specific exception types that
can actually occur during the WebSocket handshake: KeyError (missing
dict keys), ValueError (bad JSON), TypeError (non-text WS message),
aiohttp.ClientError (connection errors), and TimeoutError. This
avoids silently wrapping programming errors into HomeAssistantAPIError.
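
The narrowing can be sketched as below, using only the standard library. The local `HomeAssistantAPIError` class and the `parse_handshake_message` helper are illustrative stand-ins; aiohttp.ClientError is left out so the example has no third-party dependency.

```python
import json
from typing import Any


class HomeAssistantAPIError(Exception):
    """Local stand-in for the Supervisor exception of the same name."""


# Only errors a malformed or missing handshake message can produce.
HANDSHAKE_ERRORS = (KeyError, ValueError, TypeError, TimeoutError)


def parse_handshake_message(raw: Any) -> str:
    """Return the message type, wrapping expected failures only."""
    try:
        if not isinstance(raw, str):
            raise TypeError(f"Expected text message, got {type(raw).__name__}")
        message = json.loads(raw)  # bad JSON raises a ValueError subclass
        return message["type"]  # a missing key raises KeyError
    except HANDSHAKE_ERRORS as err:
        raise HomeAssistantAPIError(f"Handshake failed: {err!r}") from err
```

Anything outside `HANDSHAKE_ERRORS`, such as an `AttributeError` caused by a bug, propagates unchanged and stays visible.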

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove unused create_mountpoint from MountBindOptions

The field was added but never used. The /run/supervisor host path
is guaranteed to exist since HAOS creates it for the Supervisor
container mount, so auto-creating the mountpoint is unnecessary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Clear stale access token before raising on final retry

Move token clear before the attempt check in connect_websocket so
the stale token is always discarded, even when raising on the final
attempt. Without this, the next call would reuse the cached bad token
via _ensure_access_token's fast path, wasting a round-trip.
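
The ordering fix can be illustrated with a small synchronous sketch. `TokenClient`, `AuthFailed`, and the injected `fetch_token`/`connect` callables are hypothetical; only the names `connect_websocket` and `_ensure_access_token` come from the text.

```python
from typing import Callable, Optional


class AuthFailed(Exception):
    """Raised by the connect callable when the token is rejected."""


class TokenClient:
    def __init__(
        self,
        fetch_token: Callable[[], str],
        connect: Callable[[str], str],
        max_attempts: int = 2,
    ) -> None:
        self._fetch_token = fetch_token
        self._connect = connect
        self._max_attempts = max_attempts
        self.access_token: Optional[str] = None

    def _ensure_access_token(self) -> str:
        # Fast path: reuse the cached token when one is present.
        if self.access_token is None:
            self.access_token = self._fetch_token()
        return self.access_token

    def connect_websocket(self) -> str:
        for attempt in range(1, self._max_attempts + 1):
            token = self._ensure_access_token()
            try:
                return self._connect(token)
            except AuthFailed:
                # Clear first, so the bad token is gone even on the last try.
                self.access_token = None
                if attempt == self._max_attempts:
                    raise
        raise RuntimeError("max_attempts must be at least 1")
```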

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add tests for Unix socket communication and Core API

Add tests for the new Unix socket communication path and improve
existing test coverage:

- Version-based supports_unix_socket and env-based use_unix_socket
- api_url/ws_url transport selection
- Connection lifecycle: connected log after restart, ignoring
  unrelated container events
- get_api_state/check_api_state parameterized across versions,
  responses, and error cases
- make_request is_running guard and TCP flow with real token fetch
- connect_websocket for both Unix and TCP (with token verification)
- WSClient.connect/connect_with_auth handshake success, errors,
  cleanup on failure, and close with pending futures

Consolidate existing tests into parameterized form and drop synthetic
tests that covered very little.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 15:09:38 +02:00

Home Assistant Supervisor

First private cloud solution for home automation

Home Assistant (formerly Hass.io) is a container-based system for managing your Home Assistant Core installation and related applications. The system is controlled via Home Assistant, which communicates with the Supervisor. The Supervisor provides an API to manage the installation. This includes changing network settings or installing and updating software.

Installation

Installation instructions can be found at https://home-assistant.io/getting-started.

Development

For small changes and bugfixes you can just follow this, but for significant changes open an RFC first. Development instructions can be found here.

Release

Releases are done in 3 stages (channels) with this structure:

  1. Pull requests are merged to the main branch.
  2. A new build is pushed to the dev stage.
  3. Releases are published.
  4. A new build is pushed to the beta stage.
  5. The stable.json file is updated.
  6. The build that was pushed to beta will now be pushed to stable.

Home Assistant - A project from the Open Home Foundation
