Unified Log Schema

The Netra Apex Log Schema is an automatically generated, provider-agnostic logging schema for AIOps.

Note that this schema is generated automatically by the Netra Agent ingestion process; you do not need to create it manually or convert any log data. This page is a reference for what is available after the normalization process.

This standardized data serves as the foundation for the Co-Optimization Agent, an AI-powered system designed to work alongside your existing infrastructure. The agent leverages this schema for:

  1. Offline Analysis: Ingesting logs to a central data store to analyze cost, performance, and quality trade-offs, providing systemic insights and optimization recommendations.
  2. Real-time Middleware (Optional): Acting as an intelligent routing layer to dynamically optimize production traffic based on learned patterns and predefined business objectives.

The Log Object Structure

Each log entry is a self-contained JSON object representing a single, atomic LLM operation. The structure is designed to be comprehensive, normalized, and extensible.

Core Namespaces

These top-level objects provide the essential "who, what, where, and when" for every event.

| Namespace | Description |
| --- | --- |
| event_metadata | Information about the log event itself, ensuring traceability and versioning. |
| trace_context | OpenTelemetry-compliant data for tracing complex, multi-step workflows like RAG or agents. |
| identity_context | Data for attributing requests, managing security, and enabling FinOps controls like chargebacks. |
| application_context | Client-side application data for isolating issues in a microservices architecture. |

{
  "event_metadata": {
    "log_schema_version": "3.0.1",
    "event_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
    "timestamp_utc": 1721948400000,
    "ingestion_source": "vllm_wrapper_v1.2"
  },
  "trace_context": {
    "trace_id": "trace-001-rag-session",
    "span_id": "span-002-llm-call",
    "parent_span_id": "span-001-retrieval",
    "span_name": "GenerateFinalAnswer",
    "span_kind": "llm"
  },
  "identity_context": {
    "user_id": "usr_anon_f4b3c2d1",
    "organization_id": "org_12345",
    "anon_api_key_hash": "api_anon_a7e8f9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8",
    "auth_method": "api_key"
  },
  "application_context": {
    "app_name": "customer-support-chatbot",
    "service_name": "report-summarizer",
    "sdk_version": "openai-python-1.8.0",
    "environment": "production",
    "anon_ip_reference": "ip_anon_f4b3c2d1"
  }
}

Request Specification

The request object is an immutable, normalized record of all parameters sent to the model, ensuring perfect reproducibility.

request.model

Normalizes model identification across providers. This is crucial for cross-provider analysis.

| Field | Type | Description |
| --- | --- | --- |
| provider | String | The entity hosting the model (e.g., openai, aws_bedrock, vllm). |
| family | String | The general model family (e.g., gpt-4, claude-3, llama-3). |
| name | String | The specific, common name of the model (e.g., gpt-4o, claude-3-opus). |
| version_id | String | The exact version or snapshot ID for reproducibility. |

Example Normalization: An API call to AWS Bedrock with modelId: "anthropic.claude-3-opus-20240229-v1:0" is parsed into:

"model": {
  "provider": "aws_bedrock",
  "family": "claude-3",
  "name": "claude-3-opus",
  "version_id": "20240229-v1:0"
}
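
The exact parsing rules live inside the ingestion process. As a rough sketch, and assuming the version segment always begins with an 8-digit date stamp, a Bedrock modelId could be decomposed like this:

import re

def parse_bedrock_model_id(model_id: str) -> dict:
    """Illustrative only: split a Bedrock modelId such as
    'anthropic.claude-3-opus-20240229-v1:0' into the unified request.model fields."""
    # Drop the vendor prefix before the first dot ("anthropic.")
    _, _, rest = model_id.partition(".")
    # Assume the version starts at the first 8-digit date stamp (e.g. 20240229)
    match = re.match(r"^(?P<name>.+?)-(?P<version>\d{8}.*)$", rest)
    name = match.group("name") if match else rest
    version_id = match.group("version") if match else None
    # Assume the family is the name minus its trailing variant token (opus, sonnet, ...)
    family = "-".join(name.split("-")[:-1]) or name
    return {
        "provider": "aws_bedrock",
        "family": family,          # "claude-3"
        "name": name,              # "claude-3-opus"
        "version_id": version_id,  # "20240229-v1:0"
    }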

request.prompt

A flexible container for all input types, from simple text to multimodal requests. For more information on redaction, see Data Redaction.

"prompt": {
  "messages": [
    { "role": "user", "content": "What is the capital of France?" }
  ],
  "system_prompt": "You are a helpful assistant.",
  "multimodal_parts": [
    {
      "part_type": "image",
      "mime_type": "image/jpeg",
      "source_uri": "s3://my-bucket/image.jpg",
      "source_base64_hash": "b3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a7e8f9b0c1d2e3f4a5b6c7d8e9f0a1b2"
    }
  ]
}
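
The hash function behind source_base64_hash is not specified on this page; the 64-character hex digest is consistent with SHA-256, which the sketch below assumes when logging a fingerprint of an inline image rather than the payload itself:

import base64
import hashlib

def multimodal_part_for_image(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Illustrative only: build a multimodal_parts entry for an inline image."""
    payload_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "part_type": "image",
        "mime_type": mime_type,
        # SHA-256 is an assumption; the schema only shows a 64-character hex digest
        "source_base64_hash": hashlib.sha256(payload_b64.encode("ascii")).hexdigest(),
    }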

request.generation_config

A canonical structure for all parameters controlling the generation process.

"generation_config": {
    "temperature": 0.7,
    "top_p": 1.0,
    "max_tokens_to_sample": 1024,
    "stop_sequences": ["\n"],
    "is_streaming": false,
    "seed": 42
}
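
Provider SDKs spell these parameters differently (OpenAI-style max_tokens versus the canonical max_tokens_to_sample, for example). A minimal sketch of folding raw parameters into the canonical keys follows; the alias table itself is an assumption, not the agent's actual mapping:

_PARAM_ALIASES = {
    # illustrative aliases; the real mapping belongs to the ingestion process
    "max_tokens": "max_tokens_to_sample",       # OpenAI-style
    "maxOutputTokens": "max_tokens_to_sample",  # Google-style
    "stop": "stop_sequences",                   # OpenAI-style
    "stopSequences": "stop_sequences",          # Google-style
    "stream": "is_streaming",
}

def normalize_generation_config(raw_params: dict) -> dict:
    """Illustrative only: rename provider-specific sampling parameters onto the
    canonical generation_config keys, passing unrecognized keys through unchanged."""
    return {_PARAM_ALIASES.get(key, key): value for key, value in raw_params.items()}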

Response Specification

The response object captures the complete output, including generated content, usage metrics, and backend system metadata.

response.completion

Contains the generative output, structured to handle multiple candidates.

| Field | Type | Description |
| --- | --- | --- |
| choices | Array | An array of generated responses, each with an index and finish_reason. |
| finish_reason | Enum | Why generation stopped (stop_sequence, max_tokens, tool_calls, safety). |
| message | Object | The standard {"role": "assistant", "content": "..."} object. |

response.usage

A normalized breakdown of token consumption, crucial for accurate FinOps.

Example Normalization:

| Unified Field | OpenAI Source | Anthropic Source | Google Source |
| --- | --- | --- | --- |
| prompt_tokens | usage.prompt_tokens | usage.input_tokens | usageMetadata.promptTokenCount |
| completion_tokens | usage.completion_tokens | usage.output_tokens | usageMetadata.candidatesTokenCount |

"usage": {
  "prompt_tokens": 512,
  "completion_tokens": 256,
  "total_tokens": 768,
  "tool_call_tokens": 64, // If applicable
  "cached_input_tokens": 0
}
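
Based on the mapping table above, the normalization step might look like the sketch below; the raw payload shapes are assumptions about each provider's response format:

def normalize_usage(provider: str, raw_response: dict) -> dict:
    """Illustrative only: map provider-specific token counters onto the unified usage fields."""
    if provider == "openai":
        prompt = raw_response["usage"]["prompt_tokens"]
        completion = raw_response["usage"]["completion_tokens"]
    elif provider == "anthropic":
        prompt = raw_response["usage"]["input_tokens"]
        completion = raw_response["usage"]["output_tokens"]
    elif provider == "google":
        prompt = raw_response["usageMetadata"]["promptTokenCount"]
        completion = raw_response["usageMetadata"]["candidatesTokenCount"]
    else:
        raise ValueError(f"no usage mapping for provider: {provider}")
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }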

Performance Specification

The performance namespace captures all metrics related to speed and infrastructure, bridging the gap between managed APIs and self-hosted servers.

performance.latency_ms

A granular breakdown of latency measurements from the client's perspective.

| Field | Description |
| --- | --- |
| total_e2e | Total wall-clock time from request start to final token receipt. |
| time_to_first_token | Time from request start until the first token is received. Critical for user perception. |
| time_per_output_token | Average time to generate each subsequent token after the first. |
| queue_duration | Time spent waiting in a server-side queue (from self-hosted servers). |
| prefill_duration | Time spent processing the input prompt (from self-hosted servers). |
| decode_duration | Time spent actively generating tokens (from self-hosted servers). |
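
As a rough illustration of how the client-side fields relate, the sketch below derives them from three timestamps captured around a streaming call. Dividing by the full completion token count matches the worked examples later on this page, but the agent's exact convention is not documented here, so treat the formula as an assumption.

def latency_fields(request_start: float, first_token_at: float,
                   last_token_at: float, completion_tokens: int) -> dict:
    """Illustrative only: derive latency_ms fields from client-side timestamps (in seconds)."""
    total_e2e = (last_token_at - request_start) * 1000.0
    time_to_first_token = (first_token_at - request_start) * 1000.0
    # Average decode pace; dividing by the full token count is an assumption
    time_per_output_token = (total_e2e - time_to_first_token) / max(completion_tokens, 1)
    return {
        "total_e2e": round(total_e2e),
        "time_to_first_token": round(time_to_first_token),
        "time_per_output_token": round(time_per_output_token, 1),
    }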

performance.self_hosted_metrics

A dedicated namespace for detailed metrics available only from self-hosted inference servers like vLLM or TGI, scraped from their /metrics endpoints.

"self_hosted_metrics": {
  "server_type": "vllm",
  "scheduler": {
    "running_requests": 4,
    "waiting_requests": 8,
    "swapped_requests": 2
  },
  "kv_cache": {
    "gpu_usage_percent": 85.5
  },
  "hardware_info": {
    "accelerator_type": "A100-80G",
    "accelerator_count": 8,
    "quantization_type": "awq"
  }
}
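
A rough sketch of populating this block from a vLLM /metrics scrape follows; the Prometheus metric names (vllm:num_requests_running, vllm:gpu_cache_usage_perc, and so on) vary across vLLM versions and should be treated as assumptions:

import requests

def scrape_vllm_metrics(base_url: str) -> dict:
    """Illustrative only: poll a vLLM Prometheus endpoint and map a few gauges
    onto the self_hosted_metrics block."""
    text = requests.get(f"{base_url}/metrics", timeout=5).text
    gauges = {}
    for line in text.splitlines():
        if line.startswith("#") or " " not in line:
            continue  # skip comment/type lines
        name, _, value = line.rpartition(" ")
        try:
            gauges[name.split("{")[0]] = float(value)
        except ValueError:
            continue
    return {
        "server_type": "vllm",
        "scheduler": {
            "running_requests": int(gauges.get("vllm:num_requests_running", 0)),
            "waiting_requests": int(gauges.get("vllm:num_requests_waiting", 0)),
        },
        "kv_cache": {
            # assumed to be reported as a 0-1 fraction; convert to percent
            "gpu_usage_percent": gauges.get("vllm:gpu_cache_usage_perc", 0.0) * 100,
        },
    }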

FinOps Specification

The finops namespace transforms the log into a detailed, auditable financial record. This data is typically enriched post-request by a processor that has access to pricing information.

"finops": {
  "cost": {
    "total_cost_usd": 0.0125,
    "prompt_cost_usd": 0.0050,
    "completion_cost_usd": 0.0075
  },
  "pricing_info": {
    "provider_rate_id": "claude3-opus-ondemand-jul2025",
    "prompt_token_rate_usd_per_million": 15.00,
    "completion_token_rate_usd_per_million": 75.00
  },
  "attribution": {
    "cost_center_id": "eng-rnd-123",
    "project_id": "proj-q3-chatbot-eval",
    "feature_name": "document_summarization"
  }
}
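
The enrichment itself is simple arithmetic: token counts multiplied by the per-million rates recorded in pricing_info. A minimal sketch (the rounding precision is an assumption):

def compute_cost(usage: dict, pricing_info: dict) -> dict:
    """Illustrative only: derive the finops.cost block from usage and pricing_info."""
    prompt_cost = usage["prompt_tokens"] * pricing_info["prompt_token_rate_usd_per_million"] / 1_000_000
    completion_cost = usage["completion_tokens"] * pricing_info["completion_token_rate_usd_per_million"] / 1_000_000
    return {
        "total_cost_usd": round(prompt_cost + completion_cost, 6),
        "prompt_cost_usd": round(prompt_cost, 6),
        "completion_cost_usd": round(completion_cost, 6),
    }

For example, at the gpt-4o rates shown in Example 1 below (5.00 and 15.00 USD per million tokens), 15 prompt tokens and 10 completion tokens come to 0.000075 + 0.00015 = 0.000225 USD.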

Full Log Examples

Example 1: Standard API Call to OpenAI

This example shows a typical log from a call to a managed provider. Note the absence of self_hosted_metrics.

{
  "event_metadata": {
    "log_schema_version": "3.0.1",
    "event_id": "c1d2e3f4-g5h6-7890-1234-567890abcdef",
    "timestamp_utc": 1721949000000,
    "ingestion_source": "openai_sdk_v1.8"
  },
  "trace_context": { "trace_id": "chat-session-xyz", "span_id": "span-001", "span_name": "InitialUserQuery" },
  "identity_context": { "user_id": "user_abc", "api_key_hash": "a7e8f9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8" },
  "application_context": { "app_name": "marketing-copy-generator", "environment": "production" },
  "request": {
    "model": { "provider": "openai", "family": "gpt-4", "name": "gpt-4o", "version_id": "2024-05-13" },
    "prompt": { "messages": [{ "role": "user", "content": "Write a tagline for a new coffee shop." }] },
    "generation_config": { "temperature": 0.8, "max_tokens_to_sample": 50 }
  },
  "response": {
    "completion": {
      "choices": [{
        "index": 0,
        "finish_reason": "stop_sequence",
        "message": { "role": "assistant", "content": "Brewing Moments, One Cup at a Time." }
      }]
    },
    "usage": { "prompt_tokens": 15, "completion_tokens": 10, "total_tokens": 25 },
    "system": { "provider_request_id": "req_abc123", "system_fingerprint": "fp_123" }
  },
  "performance": {
    "latency_ms": { "total_e2e": 1200, "time_to_first_token": 450, "time_per_output_token": 75.0 }
  },
  "finops": {
    "cost": { "total_cost_usd": 0.0001, "prompt_cost_usd": 0.000075, "completion_cost_usd": 0.000025 },
    "pricing_info": { "prompt_token_rate_usd_per_million": 5.00, "completion_token_rate_usd_per_million": 15.00 }
  }
}

Example 2: Self-Hosted Llama-3 Call via vLLM

This example highlights the detailed performance metrics captured from a self-hosted inference server, which are critical for infrastructure optimization.

{
  "event_metadata": {
    "log_schema_version": "3.0.1",
    "event_id": "d1e2f3g4-h5i6-7890-1234-abcdef567890",
    "timestamp_utc": 1721950000000,
    "ingestion_source": "custom_vllm_client_v1.0"
  },
  "trace_context": { "trace_id": "batch-job-112", "span_id": "span-987", "span_name": "SummarizeArticle" },
  "identity_context": { "user_id": "internal_system_user" },
  "application_context": { "app_name": "news-feed-processor", "environment": "staging" },
  "request": {
    "model": { "provider": "vllm", "family": "llama-3", "name": "llama-3-70b-instruct" },
    "prompt": { "messages": [{ "role": "user", "content": "Summarize the following article..." }] },
    "generation_config": { "temperature": 0.2, "max_tokens_to_sample": 256 }
  },
  "response": {
    "completion": {
      "choices": [{
        "index": 0,
        "finish_reason": "max_tokens",
        "message": { "role": "assistant", "content": "The article discusses..." }
      }]
    },
    "usage": { "prompt_tokens": 850, "completion_tokens": 256, "total_tokens": 1106 }
  },
  "performance": {
    "latency_ms": {
      "total_e2e": 3500,
      "time_to_first_token": 1500,
      "time_per_output_token": 7.8,
      "queue_duration": 400,
      "prefill_duration": 1050,
      "decode_duration": 2000
    },
    "self_hosted_metrics": {
      "server_type": "vllm",
      "scheduler": { "running_requests": 2, "waiting_requests": 5 },
      "kv_cache": { "gpu_usage_percent": 91.2 },
      "hardware_info": { "accelerator_type": "H100-PCIe", "accelerator_count": 4 }
    }
  },
  "finops": {
    "cost": { "total_cost_usd": 0.0 },
    "pricing_info": {}
  }
}