mirror of
https://github.com/microsoft/vscode.git
synced 2026-04-28 18:12:45 -05:00
* add OTel instrumentation spec and plan for all agents

* feat: OTel instrumentation for Copilot CLI background agent

  - Add agentOTelEnv.ts config derivation helpers (CLI + Claude)
  - Enable SDK OtelLifecycle via env vars before LocalSessionManager ctor
  - Add invoke_agent copilotcli wrapper span with traceparent propagation
  - Forward OTel env vars to terminal CLI sessions
  - Update spec and plan docs for all agents
  - 33 tests passing (14 new + 19 existing)

* feat: filter debug-panel-only spans from OTLP export

  Spans with non-standard gen_ai.operation.name values (content_event,
  user_message) are excluded from external OTLP export while remaining
  visible in the Agent Debug Log panel via onDidCompleteSpan. Only
  GenAI-conventional operations (invoke_agent, chat, execute_tool,
  embeddings, execute_hook) are exported to the user's collector.

* fix: add IOTelService to CopilotCLISessionService ctor in participant test

* fix: pass chatSessionId to CapturingToken for debug panel routing

  The CapturingToken was created without chatSessionId, so the debug panel
  couldn't route copilotcli OTel spans to the correct session view.

  Also: Copilot CLI runtime only supports otlp-http (not gRPC). Terminal CLI
  sessions require an HTTP-compatible OTLP endpoint.
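Note on the export filter described above: a minimal sketch of the allow-list check, not the code from this change. `SpanLike` and `isExportable` are hypothetical names introduced for illustration; only the operation names and the `gen_ai.operation.name` attribute come from the commit message. The handling of spans with no operation name is an assumption, since the message does not say what happens to them.

```typescript
// Hypothetical, simplified span shape; the extension's real span type is not shown here.
interface SpanLike {
  name: string;
  attributes: Record<string, unknown>;
}

// Operation names the commit message lists as exported to the user's collector.
const EXPORTABLE_OPERATION_NAMES: ReadonlySet<string> = new Set([
  'invoke_agent',
  'chat',
  'execute_tool',
  'embeddings',
  'execute_hook',
]);

// Returns true when the span should go to external OTLP export.
// Assumption: a span with a missing or non-string operation name is not exported.
function isExportable(span: SpanLike): boolean {
  const operation = span.attributes['gen_ai.operation.name'];
  return typeof operation === 'string' && EXPORTABLE_OPERATION_NAMES.has(operation);
}
```

In this sketch a `content_event` or `user_message` span fails the check, which matches the described behavior of keeping such spans in the debug panel only.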
* docs: add CLI HTTP-only limitation to spec and dual-port Aspire setup to test plan

* fix: forward OTel env vars to CLI terminal sessions

  - Include OTel env vars in terminal profile provider path (dropdown)
    which previously only set shell info without auth/OTel env
  - Pass empty env to deriveCopilotCliOTelEnv for terminal sessions so
    vars are always included regardless of process.env pollution from
    the in-process background agent
  - Update test plan to use Grafana LGTM stack

* fix: add CHAT_SESSION_ID to attributes in CopilotCLISession

* docs: update OTel instrumentation specification for Copilot CLI and Claude Code

* feat: bridge SDK native OTel spans to Agent Debug panel

  Replace synthetic span approach (PR #4494) with a bridge SpanProcessor
  that forwards SDK-native spans from the Copilot CLI runtime's
  BasicTracerProvider into the extension's IOTelService event stream.
  This gives the debug panel the full SDK span hierarchy (subagents,
  permissions, hooks, nested tool calls), identical to what Grafana shows.

  Architecture:
  - Add injectCompletedSpan() to IOTelService interface for external span
    injection without OTLP re-export
  - Create CopilotCliBridgeSpanProcessor that converts ReadableSpan to
    ICompletedSpanData, injects copilot_chat.chat_session_id from a
    traceId→sessionId map, and fires onDidCompleteSpan
  - Install bridge on SDK's TracerProvider via internal
    MultiSpanProcessor._spanProcessors array (OTel SDK v2 removed the
    public addSpanProcessor API, but this internal array is the same
    pattern the SDK itself uses in forceFlush)
  - Propagate traceparent from extension root span to SDK via
    otelLifecycle.updateParentTraceContext() so all spans share a traceId
  - Filter bridge to only forward spans from registered CLI sessions

  Code changes:
  - copilotCliBridgeSpanProcessor.ts: new bridge processor
  - copilotcliSession.ts: remove all synthetic spans (chat, tool, error),
    keep root invoke_agent span + traceparent propagation + bridge wiring
  - copilotcliSessionService.ts: install bridge after first session
    creation, wire bridge + SDK trace context updater to sessions
  - IOTelService: add injectCompletedSpan to interface + all impls
  - Remove outdated synthetic span tests
  - Add OTel data flow architecture diagram (HTML)

* fix: update span processing to use parent span context and enhance subagent event identification

* display names for tool call and subagent events

* docs: merge arch and spec into single developer guide

  Combine agent_monitoring_arch.md (foreground-only) and
  agent-otel-spec.md (all agents) into a single comprehensive developer
  reference covering all four agent paths, bridge architecture, and SDK
  internal access warnings.

* docs: fix stale addSpanProcessor reference in data flow diagram

* chore: move plan and test docs to offline archive

  These documents are reference material for the OTel sprint, not needed
  in the shipped PR. Archived to ~/Documents/copilot-otel-archive/.
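Note on the bridge described above: a rough sketch of the three steps the message names (filter by registered traceId, convert HrTime to milliseconds, inject the session id attribute). This is not the contents of copilotCliBridgeSpanProcessor.ts. `BridgeSketch`, `ReadableSpanLike`, `CompletedSpanLike`, and `hrTimeToMs` are hypothetical stand-ins, and the span shapes are simplified assumptions (a real OpenTelemetry ReadableSpan exposes its trace id through `spanContext()`). Only the `copilot_chat.chat_session_id` attribute name and the `[seconds, nanoseconds]` HrTime layout are taken as given.

```typescript
// OpenTelemetry represents high-resolution time as [seconds, nanoseconds].
type HrTime = [number, number];

// Hypothetical, simplified input span (stand-in for the SDK's ReadableSpan).
interface ReadableSpanLike {
  name: string;
  traceId: string;
  startTime: HrTime;
  endTime: HrTime;
  attributes: Record<string, unknown>;
}

// Hypothetical, simplified output span (stand-in for ICompletedSpanData).
interface CompletedSpanLike {
  name: string;
  traceId: string;
  startTimeMs: number;
  endTimeMs: number;
  attributes: Record<string, unknown>;
}

// Convert [seconds, nanoseconds] to milliseconds.
function hrTimeToMs(time: HrTime): number {
  return time[0] * 1000 + time[1] / 1e6;
}

class BridgeSketch {
  // traceId -> chat session id, populated when a CLI session registers its trace.
  private readonly sessionByTraceId = new Map<string, string>();
  private readonly inject: (span: CompletedSpanLike) => void;

  constructor(inject: (span: CompletedSpanLike) => void) {
    this.inject = inject;
  }

  registerTrace(traceId: string, sessionId: string): void {
    this.sessionByTraceId.set(traceId, sessionId);
  }

  unregisterTrace(traceId: string): void {
    this.sessionByTraceId.delete(traceId);
  }

  // Called when a span ends; forwards only spans from registered traces.
  onEnd(span: ReadableSpanLike): void {
    const sessionId = this.sessionByTraceId.get(span.traceId);
    if (sessionId === undefined) {
      return; // not a registered CLI session, drop it
    }
    this.inject({
      name: span.name,
      traceId: span.traceId,
      startTimeMs: hrTimeToMs(span.startTime),
      endTimeMs: hrTimeToMs(span.endTime),
      attributes: { ...span.attributes, 'copilot_chat.chat_session_id': sessionId },
    });
  }
}
```

The callback passed to the constructor plays the role that injectCompletedSpan() plays in the description: it receives the converted span without triggering any export of its own.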
* test: add bridge SpanProcessor unit tests

  13 tests covering: traceId filtering, parentSpanContext conversion,
  CHAT_SESSION_ID injection, attribute flattening, event conversion,
  HrTime→ms conversion, unregister/shutdown behavior.

* test: add span event identification and naming tests

  7 tests covering invoke_agent identification logic: top-level skip,
  SDK wrapper skip (no agent name), subagent detection (name attribute
  and span name parsing), unknown/missing operation name handling.

* fix: always enable SDK OTel for debug panel regardless of user config

  The CLI SDK's OtelLifecycle must always initialize so the bridge
  processor can forward native spans to the debug panel. When user OTel
  is disabled, COPILOT_OTEL_ENABLED is still set but no OTLP endpoint is
  configured: the SDK creates spans (for debug panel) but doesn't export
  to any external collector.

  The bridge installation is also now unconditional; it installs even
  when user OTel is disabled.

* chore: remove transient sprint plan

* fix: suppress SDK OTLP export when user OTel is disabled

  When user OTel is disabled, force the SDK to use file exporter to
  /dev/null instead of letting it default to OTLP. Also clear any
  leftover OTEL_EXPORTER_OTLP_ENDPOINT from previous sessions to prevent
  orphaned traces in Grafana.

* docs: add background agents section to user monitoring guide

  Cover Copilot CLI (background + terminal) and Claude Code agent tracing
  in the user-facing guide. Includes span hierarchy examples,
  service.name filtering table, and CLI HTTP-only limitation note.

* docs: remove Claude Code from user guide (not yet supported)

* fixup! feat: OTel instrumentation for Copilot CLI background agent

* fix: address PR review comments

  - Use GenAiOperationName constants in EXPORTABLE_OPERATION_NAMES
    (avoids drift)
  - Remove unnecessary delete of OTEL_EXPORTER_OTLP_ENDPOINT from
    process.env
  - Replace 'as any' OTel mocks with typed NoopOTelService in terminal
    tests
  - Clarify comment on empty env arg for terminal OTel env derivation
  - Add ExportResultCode.SUCCESS comment for clarity

* fixup! fix: always enable SDK OTel for debug panel regardless of user config

* fix: handle SDK native hook spans in debug panel

  The SDK's OtelSessionTracker creates 'hook {type}' spans with
  github.copilot.hook.type attributes (not gen_ai.operation.name). These
  were silently dropped by completedSpanToDebugEvent. Now detected by
  span name prefix and converted to Hook: {type} events.

* add execute_hook spans for Claude hook executions in monitoring documentation

* docs: add hook spans to CLI trace hierarchy in user guide
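Note on the hook-span fix above: a small sketch of detecting a span by its `hook ` name prefix and producing a `Hook: {type}` label. `hookDisplayName` is a hypothetical helper, not the actual logic in completedSpanToDebugEvent, and the hook type used in the usage line is a made-up example value. Treating a bare `hook ` name with no type as a non-hook span is an assumption.

```typescript
// Prefix used by SDK hook spans, per the 'hook {type}' naming in the commit message.
const HOOK_SPAN_PREFIX = 'hook ';

// Returns a display label such as "Hook: <type>" for SDK hook spans,
// or undefined when the span name does not follow the 'hook {type}' pattern.
function hookDisplayName(spanName: string): string | undefined {
  if (!spanName.startsWith(HOOK_SPAN_PREFIX)) {
    return undefined;
  }
  const hookType = spanName.slice(HOOK_SPAN_PREFIX.length).trim();
  // Assumption: an empty type is treated as "not a hook span".
  return hookType.length > 0 ? `Hook: ${hookType}` : undefined;
}

// Usage with a hypothetical hook type:
const label = hookDisplayName('hook exampleType');
```

Matching on the span name rather than on `gen_ai.operation.name` mirrors the reason given for the fix: these spans carry `github.copilot.hook.type` instead of an operation name.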
37 KiB