Files
Joe Bowser ebfa0da43e Bug 2005369 - Collect inference metrics in the static embeddings pipeline. r=thasan,ai-platform-reviewers
StaticEmbeddingsPipeline now emits a `metrics` object on its run
  result with a `runTimestamps` timeline (initializationStart/End,
  runStart/End) plus scalar fields tailored to embeddings:
  inputChars, inputTokens, inferenceTime, tokensPerSecond,
  charsPerSecond.

  The shape mirrors the LlamaCppPipeline change from bug 2005365,
  adapted for embeddings: there is no decode stage or
  time-to-first-token, but character throughput is added since
  embeddings are commonly fed raw user text where character counts
  matter alongside token counts.

  The `metrics` field on EmbeddingResponse was previously typed as
  an array and emitted as []; no downstream consumer read it. That
  contract is replaced by the new EmbeddingMetrics interface in
  StaticEmbeddingsPipeline.d.ts.

  Adds a regression-guarding assertion in browser_ml_static_embeddings.js
  covering the metrics shape, the four runTimestamps names, and
  that inputChars/inputTokens/inferenceTime have sensible values.

Differential Revision: https://phabricator.services.mozilla.com/D305384
2026-06-10 22:18:30 +00:00
..