Supply Format Introduction
"Supply" refers to available resources that complete an AI-task.
A Supply Option is one atomic unit of work: given a set of inputs and a configuration, it yields one output. A Supply Set is a collection of Supply Options. Depending on the complexity of your context, you may be able to work with only a single Supply Option.
There are two types of supply:
- Supply Option (The Blueprint): This is the configuration created by the system and admin user. It acts as a template or a catalog entry, defining a range of possible values and static connection details. It tells the agent what is available to use.
- Optimization Agent's Selection (The Execution): This is the specific, concrete set of values the agent chooses from a Supply Option at runtime for a given task. It is the agent's recommendation on how to use the resource in that specific context. The sketch after this list models both types.
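To make the Blueprint/Execution distinction concrete, here is a minimal sketch of the two types as plain data structures. The class and field names are illustrative assumptions, not a formal schema:

from dataclasses import dataclass, field
from typing import Any

@dataclass
class SupplyOption:
    """The Blueprint: a catalog entry describing what is available.

    Ranges and allowed values describe what the agent MAY choose;
    connection details are static."""
    supply_id: str
    supply_name: str
    provider: str
    connection_config: dict[str, Any]
    model_config: dict[str, Any]
    parameter_ranges: dict[str, Any] = field(default_factory=dict)

@dataclass
class AgentSelection:
    """The Execution: one concrete choice drawn from a Supply Option
    for a specific task. Every value is a single, fixed scalar."""
    supply_id: str               # which Blueprint this selection came from
    parameters: dict[str, Any]   # e.g. {"temperature": 0.5, "max_tokens": 2500}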
The Hierarchy of Supply Configuration
Supply Options are structured hierarchically, allowing for granular control over how the agent interacts with AI models.
1. Service Level
This is the highest level, defining the "where." It specifies the service provider and the connection point. You might have multiple service-level configurations for the same underlying Agents or models to manage different providers or instances, or to provide redundancy.
- Examples: A specific API endpoint for a third-party service, an IP address for a self-hosted model, or different API keys for different teams to track costs.
2. Model or Agent Level
This is the "what." It specifies the actual AI Agent or model being used and its core capabilities.
- Examples: swe-1-agent, gpt-4-turbo, Llama-3-70B-Instruct, parameter ranges, and feature support (e.g., whether tool use is supported).
3. Parameter Level
This is the "how." It defines the adjustable parameters. In a user-defined Supply Option, these can be expressed as ranges or lists of allowed values. The Agent's Selection will contain a single, specific value chosen from that range.
- Examples: temperature, max_tokens, quantization_level.
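Because the Agent's Selection must fall inside the ranges declared in the Supply Option, it is useful to check the two against each other. Below is a minimal validation sketch; the helper name and error format are assumptions for illustration, not part of the format:

from typing import Any

def validate_selection(ranges: dict[str, Any], selection: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means the selection is valid."""
    errors = []
    for name, value in selection.items():
        spec = ranges.get(name)
        if spec is None:
            errors.append(f"{name}: not declared in the Supply Option")
            continue
        if "allowed_values" in spec and value not in spec["allowed_values"]:
            errors.append(f"{name}: {value!r} not in allowed_values")
        if "min" in spec and value < spec["min"]:
            errors.append(f"{name}: {value} below min {spec['min']}")
        if "max" in spec and value > spec["max"]:
            errors.append(f"{name}: {value} above max {spec['max']}")
    return errors

# Using the temperature and max_tokens ranges from Example 1 below:
ranges = {"temperature": {"min": 0.2, "max": 0.8}, "max_tokens": {"max": 4096}}
print(validate_selection(ranges, {"temperature": 0.5, "max_tokens": 2500}))  # []
print(validate_selection(ranges, {"temperature": 1.5}))  # temperature above max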
Annotated Configuration Examples
The examples below are annotated to highlight the concepts above.
Example 1: Third-Party API Model (OpenAI GPT-4)
This configuration defines a connection to a commercial model. Notice how some parameters are defined as ranges for the agent to choose from.
{
"supply_id": "a1b2c3d4-e5f6-7890-1234-openai-gpt4-turbo",
"supply_name": "OpenAI GPT-4 Turbo (West-US)",
// --- SERVICE LEVEL ---
// Defines the provider and how to connect.
"provider": "OpenAI",
"deployment_type": "API",
"connection_config": {
"api_endpoint": "[https://api.openai.com/v1/chat/completions](https://api.openai.com/v1/chat/completions)",
// This is a "meta" concept: it's not a model parameter, but an
// operational detail for cost tracking or security.
"api_key_name": "TEAM_ALPHA_OPENAI_KEY"
},
// --- MODEL LEVEL ---
// Defines the specific model and its fixed capabilities.
"model_config": {
"model_slug": "gpt-4-turbo",
"capabilities": {
"context_window_tokens": 128000,
"supports_tools": true,
"supports_structured_output": true
}
},
// --- PARAMETER LEVEL (As a User-Defined Range) ---
// These are the options the agent can CHOOSE from.
"parameter_ranges": {
// The agent can pick any value between 0.2 and 0.8 for a given task.
"temperature": { "min": 0.2, "max": 0.8 },
"max_tokens": { "max": 4096 },
"response_format": { "allowed_values": ["text", { "type": "json_object" }] }
},
"cost_model": {
"input_cost_per_million_tokens": OPEN_AI_STANDARD,
"output_cost_per_million_tokens": CUSTOMER_NEGOTIATED_2025
}
}
When the agent uses this supply, its final selection for a specific task might look like this, with concrete values chosen from the ranges above:
{
"model": "gpt-4-turbo",
"temperature": 0.5,
"max_tokens": 2500,
"response_format": { "type": "json_object" }
}
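To ground the Execution side, here is a minimal sketch of how such a selection could be sent to the endpoint defined at the Service Level. Resolving api_key_name through an environment variable is an assumption for illustration; the format does not mandate how keys are stored:

import os
import requests

# The Service Level names the key rather than embedding it, so different
# teams' keys (and therefore costs) stay separate. We assume the name
# resolves to an environment variable.
api_key = os.environ["TEAM_ALPHA_OPENAI_KEY"]

# The agent's concrete selection, merged into a standard Chat Completions body.
body = {
    "model": "gpt-4-turbo",
    "temperature": 0.5,
    "max_tokens": 2500,
    "response_format": {"type": "json_object"},
    "messages": [{"role": "user", "content": "Summarize the incident report as JSON."}],
}

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}"},
    json=body,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])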
Example 2: Self-Hosted Model (Llama 3 70B)
This example shows a highly tuned configuration for a self-hosted model, detailing quantization, performance settings, and infrastructure-level parameters. These are often fixed values in the Supply Option itself because they are tied to the specific deployment.
{
"supply_id": "a1b2c3d4-e5f6-7890-1234-llama3-70b-awq",
"supply_name": "Llama 3 70B - High Throughput (AWQ)",
// --- SERVICE LEVEL ---
// Defines the internal provider and endpoint.
"provider": "SelfHosted",
"deployment_type": "vLLM",
"connection_config": {
"api_endpoint": "[http://10.0.1.55:8000/v1/generate](http://10.0.1.55:8000/v1/generate)"
},
// --- MODEL & PARAMETER LEVEL ---
// For self-hosted models, model and parameter configs are often tightly coupled.
// These are fixed settings for this specific deployment.
"model_config": {
"model_path": "/models/meta-llama/Llama-3-70B-Instruct",
"quantization": {
"level": "4-bit",
"type": "AWQ"
},
"context_length_limit": 8192,
"max_input_tokens": 7000,
"max_output_tokens": 2048
},
// --- SERVICE & PERFORMANCE LEVEL (Meta Concepts) ---
// These are not model parameters but infrastructure settings that define
// the performance characteristics of this supply option.
"performance_config": {
"batching_strategy": "dynamic",
"max_batch_size": 128,
"tensor_parallel_size": 4,
"kv_cache_config": {
"type": "paged",
"block_size": 16,
"gpu_memory_utilization": 0.90
},
"speculative_decoding": {
"draft_model_path": "/models/meta-llama/Llama-3-8B-Instruct",
"num_speculative_tokens": 5
}
},
"cost_model": {
"amortized_hourly_cost_usd": 12.50
}
}
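Note that the two examples use different cost models: per-token pricing for the API supply and an amortized hourly cost for the self-hosted one. Comparing them requires converting the hourly cost into an effective per-token figure, which depends on realized throughput. A rough sketch, where the throughput number is a stated assumption rather than a measured value:

# Convert Example 2's amortized hourly cost into cost per million tokens.
hourly_cost_usd = 12.50            # from the cost_model above
assumed_tokens_per_second = 2_000  # ASSUMPTION: sustained throughput under batching

tokens_per_hour = assumed_tokens_per_second * 3600
cost_per_million = hourly_cost_usd / tokens_per_hour * 1_000_000
print(f"~${cost_per_million:.2f} per million tokens")  # ~$1.74 at this throughput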