Unified Log Schema
The Netra Apex Log Schema is a provider-agnostic logging schema for AIOps, generated automatically during ingestion.
This schema is produced by the Netra Agent ingestion process; you do not need to create it manually or convert any log data yourself. This page is a reference for the fields available after normalization.
This standardized data serves as the foundation for the Co-Optimization Agent, an AI-powered system designed to work alongside your existing infrastructure. The agent leverages this schema for:
- Offline Analysis: Ingesting logs to a central data store to analyze cost, performance, and quality trade-offs, providing systemic insights and optimization recommendations.
- Real-time Middleware (Optional): Acting as an intelligent routing layer to dynamically optimize production traffic based on learned patterns and predefined business objectives.
The Log Object Structure
Each log entry is a self-contained JSON object representing a single, atomic LLM operation. The structure is designed to be comprehensive, normalized, and extensible.
Core Namespaces
These top-level objects provide the essential "who, what, where, and when" for every event.
Namespace | Description |
---|---|
event_metadata | Information about the log event itself, ensuring traceability and versioning. |
trace_context | OpenTelemetry-compliant data for tracing complex, multi-step workflows like RAG or agents. |
identity_context | Data for attributing requests, managing security, and enabling FinOps controls like chargebacks. |
application_context | Client-side application data for isolating issues in a microservices architecture. |
{
"event_metadata": {
"log_schema_version": "3.0.1",
"event_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
"timestamp_utc": 1721948400000,
"ingestion_source": "vllm_wrapper_v1.2"
},
"trace_context": {
"trace_id": "trace-001-rag-session",
"span_id": "span-002-llm-call",
"parent_span_id": "span-001-retrieval",
"span_name": "GenerateFinalAnswer",
"span_kind": "llm"
},
"identity_context": {
"user_id": "usr_anon_f4b3c2d1",
"organization_id": "org_12345",
"anon_api_key_hash": "api_anon_a7e8f9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8",
"auth_method": "api_key"
},
"application_context": {
"app_name": "customer-support-chatbot",
"service_name": "report-summarizer",
"sdk_version": "openai-python-1.8.0",
"environment": "production",
"anon_ip_reference": "ip_anon_f4b3c2d1"
}
}
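The anon_api_key_hash and anon_ip_reference values are one-way references, not raw credentials. The exact anonymization scheme is not documented here; the sketch below is a minimal illustration, assuming a plain SHA-256 digest with a readable prefix, of how a value shaped like anon_api_key_hash could be produced.

```python
import hashlib

def anonymize(value: str, prefix: str) -> str:
    """Derive a one-way reference for a sensitive value.

    Illustrative sketch only: it assumes an unsalted SHA-256 digest with a
    readable prefix; the actual scheme used by the Netra Agent may differ.
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return f"{prefix}_{digest}"

# Produces a value shaped like the anon_api_key_hash field above.
print(anonymize("sk-example-api-key", "api_anon"))
```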
Request Specification
The request object is an immutable, normalized record of all parameters sent to the model, ensuring reproducibility.
request.model
Normalizes model identification across providers. This is crucial for cross-provider analysis.
Field | Type | Description |
---|---|---|
provider | String | The entity hosting the model (e.g., openai, aws_bedrock, vllm). |
family | String | The general model family (e.g., gpt-4, claude-3, llama-3). |
name | String | The specific, common name of the model (e.g., gpt-4o, claude-3-opus). |
version_id | String | The exact version or snapshot ID for reproducibility. |
Example Normalization:
An API call to AWS Bedrock with modelId: "anthropic.claude-3-opus-20240229-v1:0" is parsed into:
"model": {
"provider": "aws_bedrock",
"family": "claude-3",
"name": "claude-3-opus",
"version_id": "20240229-v1:0"
}
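As an illustration of this normalization, the sketch below parses a Bedrock-style modelId into the four fields above. It is a simplified, hypothetical parser covering only the Anthropic-on-Bedrock naming pattern shown in the example, not the agent's actual implementation.

```python
import re

def normalize_bedrock_model_id(model_id: str) -> dict:
    """Parse a Bedrock modelId such as 'anthropic.claude-3-opus-20240229-v1:0'
    into the unified request.model fields (illustrative sketch only)."""
    match = re.match(r"^(?P<vendor>[^.]+)\.(?P<name>.+?)-(?P<version>\d{8}-v[\d:]+)$", model_id)
    if not match:
        raise ValueError(f"Unrecognized modelId format: {model_id}")
    name = match.group("name")               # e.g. "claude-3-opus"
    family = "-".join(name.split("-")[:2])   # e.g. "claude-3"
    return {
        "provider": "aws_bedrock",
        "family": family,
        "name": name,
        "version_id": match.group("version"),  # e.g. "20240229-v1:0"
    }

print(normalize_bedrock_model_id("anthropic.claude-3-opus-20240229-v1:0"))
```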
request.prompt
A flexible container for all input types, from simple text to multimodal requests. For more information on redaction, see Data Redaction.
"prompt": {
"messages": [
{ "role": "user", "content": "What is the capital of France?" }
],
"system_prompt": "You are a helpful assistant.",
"multimodal_parts": [
{
"part_type": "image",
"mime_type": "image/jpeg",
"source_uri": "s3://my-bucket/image.jpg",
"source_base64_hash": "b3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a7e8f9b0c1d2e3f4a5b6c7d8e9f0a1b2"
}
]
}
request.generation_config
A canonical structure for all parameters controlling the generation process.
"generation_config": {
"temperature": 0.7,
"top_p": 1.0,
"max_tokens_to_sample": 1024,
"stop_sequences": ["\n"],
"is_streaming": false,
"seed": 42
}
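Provider SDKs name the same sampling controls differently (for example, OpenAI's max_tokens versus the canonical max_tokens_to_sample above). A minimal sketch of the key-renaming involved, using a hypothetical and deliberately incomplete alias table:

```python
# Hypothetical alias table; real coverage would span many more providers and parameters.
_PARAM_ALIASES = {
    "max_tokens": "max_tokens_to_sample",
    "max_output_tokens": "max_tokens_to_sample",  # Google-style naming
    "stop": "stop_sequences",
    "stream": "is_streaming",
}

def normalize_generation_config(raw_params: dict) -> dict:
    """Rename provider-specific keys to the unified generation_config keys (sketch only)."""
    return {_PARAM_ALIASES.get(key, key): value for key, value in raw_params.items()}

# An OpenAI-style request body normalizes to the structure shown above.
print(normalize_generation_config({"temperature": 0.7, "max_tokens": 1024, "stop": ["\n"], "stream": False}))
```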
Response Specification
The response object captures the complete output, including generated content, usage metrics, and backend system metadata.
response.completion
Contains the generative output, structured to handle multiple candidates.
Field | Type | Description |
---|---|---|
choices | Array | An array of generated candidates, each with an index, a finish_reason, and a message. |
finish_reason | Enum | Why generation stopped (stop_sequence, max_tokens, tool_calls, safety). Recorded per choice. |
message | Object | The standard {"role": "assistant", "content": "..."} object. Recorded per choice. |
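Providers report the stop condition under different names (OpenAI uses stop and length, Anthropic uses end_turn and max_tokens, among others). A small, hypothetical mapping into the canonical enum might look like this:

```python
# Hypothetical mapping from provider-reported stop reasons to the canonical enum;
# the provider values shown are common examples, not an exhaustive list.
_FINISH_REASON_MAP = {
    "stop": "stop_sequence",        # OpenAI
    "length": "max_tokens",         # OpenAI
    "end_turn": "stop_sequence",    # Anthropic
    "max_tokens": "max_tokens",     # Anthropic
    "tool_calls": "tool_calls",
    "content_filter": "safety",
}

def normalize_finish_reason(raw_reason: str) -> str:
    return _FINISH_REASON_MAP.get(raw_reason, raw_reason)
```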
response.usage
A normalized breakdown of token consumption, crucial for accurate FinOps.
Example Normalization:
Unified Field | OpenAI Source | Anthropic Source | Google Source |
---|---|---|---|
prompt_tokens | usage.prompt_tokens | usage.input_tokens | usageMetadata.promptTokenCount |
completion_tokens | usage.completion_tokens | usage.output_tokens | usageMetadata.candidatesTokenCount |
"usage": {
"prompt_tokens": 512,
"completion_tokens": 256,
"total_tokens": 768,
"tool_call_tokens": 64, // If applicable
"cached_input_tokens": 0
}
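A minimal sketch of the mapping in the table above, assuming raw usage dictionaries shaped like each provider's response body (not the agent's actual code):

```python
def normalize_usage(provider: str, raw: dict) -> dict:
    """Map provider-specific token counters onto the unified usage block (sketch only)."""
    if provider == "openai":
        prompt, completion = raw["prompt_tokens"], raw["completion_tokens"]
    elif provider == "anthropic":
        prompt, completion = raw["input_tokens"], raw["output_tokens"]
    elif provider == "google":
        prompt, completion = raw["promptTokenCount"], raw["candidatesTokenCount"]
    else:
        raise ValueError(f"No usage mapping for provider: {provider}")
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }

print(normalize_usage("anthropic", {"input_tokens": 512, "output_tokens": 256}))
```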
Performance Specification
The performance namespace captures all metrics related to speed and infrastructure, bridging the gap between managed APIs and self-hosted servers.
performance.latency_ms
A granular breakdown of latency measurements, primarily from the client's perspective, with additional server-reported phases when available.
Field | Description |
---|---|
total_e2e | Total wall-clock time from request start to final token receipt. |
time_to_first_token | Time from request start until the first token is received. Critical for user perception. |
time_per_output_token | Average time to generate each subsequent token after the first. |
queue_duration | Time spent waiting in a server-side queue (from self-hosted servers). |
prefill_duration | Time spent processing the input prompt (from self-hosted servers). |
decode_duration | Time spent actively generating tokens (from self-hosted servers). |
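For the client-measured fields (total_e2e, time_to_first_token, time_per_output_token), a streaming wrapper can derive values from wall-clock timestamps. A rough sketch, counting each streamed chunk as one token for simplicity:

```python
import time

def measure_stream_latency(stream):
    """Consume a streaming response and derive the client-side latency_ms fields
    (illustrative sketch; one chunk is naively counted as one token)."""
    start = time.monotonic()
    first_token_at = None
    chunks = 0
    for _ in stream:
        if first_token_at is None:
            first_token_at = time.monotonic()
        chunks += 1
    end = time.monotonic()

    ttft_ms = (first_token_at - start) * 1000 if first_token_at else None
    tpot_ms = (end - first_token_at) * 1000 / max(chunks - 1, 1) if first_token_at else None
    return {
        "total_e2e": (end - start) * 1000,
        "time_to_first_token": ttft_ms,
        "time_per_output_token": tpot_ms,
    }
```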
performance.self_hosted_metrics
A dedicated namespace for detailed metrics available only from self-hosted inference servers such as vLLM or TGI, scraped from their /metrics endpoints.
"self_hosted_metrics": {
"server_type": "vllm",
"scheduler": {
"running_requests": 4,
"waiting_requests": 8,
"swapped_requests": 2
},
"kv_cache": {
"gpu_usage_percent": 85.5
},
"hardware_info": {
"accelerator_type": "A100-80G",
"accelerator_count": 8,
"quantization_type": "awq"
}
}
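As an illustration, a periodic scraper could read the server's Prometheus endpoint and fold a few gauges into this namespace. The metric names below (vllm:num_requests_running, vllm:num_requests_waiting, vllm:gpu_cache_usage_perc) reflect common vLLM exports but vary by server type and version, so treat this as a sketch rather than a fixed contract:

```python
import urllib.request

# Gauges of interest; names follow common vLLM exports and may differ by version.
_METRIC_MAP = {
    "vllm:num_requests_running": ("scheduler", "running_requests"),
    "vllm:num_requests_waiting": ("scheduler", "waiting_requests"),
    "vllm:gpu_cache_usage_perc": ("kv_cache", "gpu_usage_percent"),  # may be a 0-1 ratio needing scaling
}

def scrape_self_hosted_metrics(base_url: str = "http://localhost:8000") -> dict:
    """Read a Prometheus-format /metrics endpoint into the self_hosted_metrics shape
    (illustrative sketch; ignores labels, HELP/TYPE lines, and histograms)."""
    text = urllib.request.urlopen(f"{base_url}/metrics").read().decode("utf-8")
    result = {"server_type": "vllm", "scheduler": {}, "kv_cache": {}}
    for line in text.splitlines():
        if line.startswith("#") or " " not in line:
            continue
        name, value = line.rsplit(" ", 1)
        name = name.split("{", 1)[0]  # drop Prometheus labels, if any
        if name in _METRIC_MAP:
            group, field = _METRIC_MAP[name]
            result[group][field] = float(value)
    return result
```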
FinOps Specification
The finops namespace transforms the log into a detailed, auditable financial record. This data is typically enriched post-request by a processor that has access to pricing information.
"finops": {
"cost": {
"total_cost_usd": 0.0125,
"prompt_cost_usd": 0.0050,
"completion_cost_usd": 0.0075
},
"pricing_info": {
"provider_rate_id": "claude3-opus-ondemand-jul2025",
"prompt_token_rate_usd_per_million": 15.00,
"completion_token_rate_usd_per_million": 75.00
},
"attribution": {
"cost_center_id": "eng-rnd-123",
"project_id": "proj-q3-chatbot-eval",
"feature_name": "document_summarization"
}
}
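The cost block follows directly from the normalized token counts and the per-million rates in pricing_info. A minimal enrichment sketch, using the example rates above as placeholders rather than authoritative pricing:

```python
def enrich_cost(usage: dict, pricing_info: dict) -> dict:
    """Compute the finops.cost block from normalized usage and per-million rates (sketch only)."""
    prompt_cost = usage["prompt_tokens"] * pricing_info["prompt_token_rate_usd_per_million"] / 1_000_000
    completion_cost = usage["completion_tokens"] * pricing_info["completion_token_rate_usd_per_million"] / 1_000_000
    return {
        "total_cost_usd": round(prompt_cost + completion_cost, 6),
        "prompt_cost_usd": round(prompt_cost, 6),
        "completion_cost_usd": round(completion_cost, 6),
    }

# With the example rates above ($15/M prompt, $75/M completion) and the usage block
# from the response section (512 prompt tokens, 256 completion tokens):
print(enrich_cost(
    {"prompt_tokens": 512, "completion_tokens": 256},
    {"prompt_token_rate_usd_per_million": 15.00, "completion_token_rate_usd_per_million": 75.00},
))
# -> {'total_cost_usd': 0.02688, 'prompt_cost_usd': 0.00768, 'completion_cost_usd': 0.0192}
```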
Full Log Examples
Example 1: Standard API Call to OpenAI
This example shows a typical log from a call to a managed provider. Note the absence of self_hosted_metrics.
{
"event_metadata": {
"log_schema_version": "3.0.1",
"event_id": "c1d2e3f4-g5h6-7890-1234-567890abcdef",
"timestamp_utc": 1721949000000,
"ingestion_source": "openai_sdk_v1.8"
},
"trace_context": { "trace_id": "chat-session-xyz", "span_id": "span-001", "span_name": "InitialUserQuery" },
"identity_context": { "user_id": "user_abc", "api_key_hash": "a7e8f9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8" },
"application_context": { "app_name": "marketing-copy-generator", "environment": "production" },
"request": {
"model": { "provider": "openai", "family": "gpt-4", "name": "gpt-4o", "version_id": "2024-05-13" },
"prompt": { "messages": [{ "role": "user", "content": "Write a tagline for a new coffee shop." }] },
"generation_config": { "temperature": 0.8, "max_tokens_to_sample": 50 }
},
"response": {
"completion": {
"choices": [{
"index": 0,
"finish_reason": "stop_sequence",
"message": { "role": "assistant", "content": "Brewing Moments, One Cup at a Time." }
}]
},
"usage": { "prompt_tokens": 15, "completion_tokens": 10, "total_tokens": 25 },
"system": { "provider_request_id": "req_abc123", "system_fingerprint": "fp_123" }
},
"performance": {
"latency_ms": { "total_e2e": 1200, "time_to_first_token": 450, "time_per_output_token": 75.0 }
},
"finops": {
"cost": { "total_cost_usd": 0.0001, "prompt_cost_usd": 0.000075, "completion_cost_usd": 0.000025 },
"pricing_info": { "prompt_token_rate_usd_per_million": 5.00, "completion_token_rate_usd_per_million": 15.00 }
}
}
Example 2: Self-Hosted Llama-3 Call via vLLM
This example highlights the detailed performance metrics captured from a self-hosted inference server, which are critical for infrastructure optimization.
{
"event_metadata": {
"log_schema_version": "3.0.1",
"event_id": "d1e2f3g4-h5i6-7890-1234-abcdef567890",
"timestamp_utc": 1721950000000,
"ingestion_source": "custom_vllm_client_v1.0"
},
"trace_context": { "trace_id": "batch-job-112", "span_id": "span-987", "span_name": "SummarizeArticle" },
"identity_context": { "user_id": "internal_system_user" },
"application_context": { "app_name": "news-feed-processor", "environment": "staging" },
"request": {
"model": { "provider": "vllm", "family": "llama-3", "name": "llama-3-70b-instruct" },
"prompt": { "messages": [{ "role": "user", "content": "Summarize the following article..." }] },
"generation_config": { "temperature": 0.2, "max_tokens_to_sample": 256 }
},
"response": {
"completion": {
"choices": [{
"index": 0,
"finish_reason": "max_tokens",
"message": { "role": "assistant", "content": "The article discusses..." }
}]
},
"usage": { "prompt_tokens": 850, "completion_tokens": 256, "total_tokens": 1106 }
},
"performance": {
"latency_ms": {
"total_e2e": 3500,
"time_to_first_token": 1500,
"time_per_output_token": 7.8,
"queue_duration": 400,
"prefill_duration": 1050,
"decode_duration": 2000
},
"self_hosted_metrics": {
"server_type": "vllm",
"scheduler": { "running_requests": 2, "waiting_requests": 5 },
"kv_cache": { "gpu_usage_percent": 91.2 },
"hardware_info": { "accelerator_type": "H100-PCIe", "accelerator_count": 4 }
}
},
"finops": {
"cost": { "total_cost_usd": 0.0 },
"pricing_info": {}
}
}