Model Context Protocol (MCP) Guide
Introduction
Optimizing AI-native systems for cost, latency, quality, and throughput while limiting risk is a continuous challenge. The Netra MCP is designed to integrate with your existing development lifecycle and act as an intelligent partner in that effort.
This document provides a technical overview of the Netra MCP. It focuses on how the MCP provides actionable insights, working alongside your established systems to enhance performance and efficiency.
It is designed to be flexible, supporting both offline analysis for systemic insights and an optional real-time middleware layer for production traffic.
Why It Matters
- In five minutes, with a single chat, you can integrate Netra into your development flow.
- Chat with your codebase to get actionable optimization insights, backed by the power of Netra's AI-native platform.
- Bring in production insights and multi-user inputs to optimize your system on demand.
MCP Endpoint
https://mcp.netrasystems.ai/v1
Architectural Integration
The Model Context Protocol (MCP) is an open standard for AI tool integration. It allows the Netra Agent to interact with your codebase securely, without requiring complex, proprietary integrations.
Components
- MCP Host: Your AI-native IDE (e.g., Cursor, Windsurf) or desktop application (e.g., Claude Desktop). The Host orchestrates communication between your prompts and the available tools.
- Netra MCP Server: A managed, remote server that exposes the MCP's capabilities as a standard set of tools. It acts as a secure gateway to the core Netra platform.
- Netra Platform: The backend infrastructure that performs the analysis and generates optimization recommendations.
All interactions are mediated by the MCP Server, ensuring that your development environment never communicates directly with the core platform APIs.
Capabilities Introduction
The Netra MCP's primary function is to execute a "Read-Analyze-Write" loop, turning developer and Agent-to-Agent intent into concrete code and configuration adjustments.
By combining your system log data, your codebase, and multi-user inputs, the agent identifies AI patterns that aren't aligned with your goals. It then identifies Supply Options that can improve the quality of your AI workloads.
Example Prompts:
@mcp:netra-apex: yesterday our costs went up. we need to reduce them but keep quality the same. for the analysis feature you can extend latency by up to 1500ms, but you need to keep the chat latency the same as it is.
@mcp:netra-apex: find the most expensive OpenAI calls and suggest cheaper alternatives with similar quality.
Dynamic Configuration with Middleware
For real-time applications, the agent can refactor code to use the Netra Decision Middleware. This allows for dynamic model selection based on the specific requirements of a request, such as prioritizing latency.
The agent's recommendation can also be used to dynamically set query parameters, allowing for more advanced and context-aware configurations.
See Middleware for more details.
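As a minimal sketch (assuming the netra_client library and response shape shown in Example 2 below), a latency-prioritized decision might look like this:

```python
# Minimal sketch, assuming the netra_client library from Example 2 below.
async def pick_model(prompt: str) -> str:
    # Ask the middleware for the best model for this specific request,
    # weighting latency heavily (the weights here are illustrative).
    decision = await netra_client.decide(
        raw_prompt=prompt,
        utility_weights={"latency": 3.0, "cost": 1.0, "risk": 1.0},
    )
    return decision["recommended_solution"][0]["model_family"]["model_id"]
```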
Example: Advanced Configuration with an Orchestration Library
This pattern uses the agent's recommendation to set not only the model but also its runtime parameters.
See Advanced Configuration with an Orchestration Library
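For instance, with LangChain as the orchestration library (an assumption for illustration; any library that accepts per-call model parameters works the same way), the decision's model and runtime params can be applied together:

```python
from langchain_openai import ChatOpenAI

# Hypothetical: feed both the recommended model and its runtime params
# (e.g., temperature, max_tokens) from a middleware decision (see Example 2)
# into the orchestration layer's model wrapper.
solution = decision["recommended_solution"][0]
llm = ChatOpenAI(
    model=solution["model_family"]["model_id"],
    **solution["API"]["params"],  # runtime params recommended by Netra
)
answer = llm.invoke(prompt)
```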
System-Wide Tuning
The agent can fetch and apply system-level recommendations based on historical traffic analysis for a specific application, targeting configuration files such as launch scripts. A rough sketch of what this can look like follows.
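This sketch is illustrative only; the script name, serving command, and flag are hypothetical stand-ins borrowed from Example 3 below:

```python
# Illustrative only: append a recommended flag (e.g., --speculative-model)
# to the serving command in a launch script matched by target_file_pattern.
from pathlib import Path

script = Path("launch_server.sh")               # hypothetical matched file
snippet = "--speculative-model small-draft-01"  # hypothetical recommendation
script.write_text(
    script.read_text().replace("serve_model", f"serve_model {snippet}", 1)
)
```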
Getting Started
Add via Chat
Open your IDE's chat window. For Windsurf, for example, copy and paste the following into the Cascade "Write" chat window:
Add to ~/.codeium/windsurf/mcp_config.json:
```json
{
  "mcpServers": {
    "netra-apex": {
      "serverUrl": "https://mcp.netrasystems.ai/v1",
      "auth": {
        "type": "bearer",
        "token": "${env:NETRA_API_KEY}"
      }
    }
  }
}
```
To set the environment variable:
- macOS/Linux: Add export NETRA_API_KEY='your_api_key_here' to your shell profile file (e.g., ~/.zshrc, ~/.bash_profile).
- Windows: Use the System Properties dialog to add a new environment variable named NETRA_API_KEY with your API key as the value.
Prerequisites
- An active Netra account and API key.
- A supported MCP-compliant editor (Cursor, Windsurf, etc.).
Configuration
Connection is established by adding a configuration to your IDE's MCP settings file. The Netra MCP server is hosted at https://mcp.netrasystems.ai/v1.
Example Configuration (~/.cursor/mcp.json):
```json
{
  "mcpServers": {
    "netra": {
      "serverUrl": "https://mcp.netrasystems.ai/v1",
      "auth": {
        "type": "bearer",
        "token": "${env:NETRA_API_KEY}"
      }
    }
  }
}
```
It is critical to store your API key in an environment variable (NETRA_API_KEY) rather than hardcoding it in the configuration file.
Examples
Example 1: Proactive Cost Management
This example demonstrates how the agent identifies cost-saving opportunities directly from the codebase.
Prompt:
@mcp:netra-apex: find the most expensive OpenAI calls in my codebase and suggest cheaper alternatives that have a similar quality score.
Execution Flow:
- Invoke Tool: The IDE's AI Host invokes the Co-Optimization Agent, which executes the netra:findModelCalls tool on the current codebase.
- Parse Code: The MCP server receives the request and parses the code to identify all calls to LLM providers, such as openai.chat.completions.create. It extracts the model used (e.g., gpt-4-turbo).
- Recommend Supply Options: The server makes an internal API call to the AI Optimization platform's /api/v1/supply_catalog. The returned catalog contains metadata for each Supply Option. The AI Host can then present this as an actionable suggestion with a one-click refactoring option.
- Refactor Code: Use the netra:applySystemRecommendations tool to refactor the code to use the recommended Supply Option, as sketched below.
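A minimal sketch of the kind of refactor this flow can produce; the cheaper model named here is a hypothetical placeholder, not an actual catalog entry:

```python
# Before: hardcoded premium model
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": prompt}],
)

# After: a cheaper Supply Option from the supply catalog. "gpt-4o-mini"
# stands in for whatever alternative the catalog actually recommends
# at a comparable quality score.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
```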
Example 2: Flex Latency to Achieve Cost Goal
This flow shows how the agent refactors code to reduce costs for a latency-tolerant research workload using Netra's middleware.
Prompt:
@mcp:netra-apex: yesterday our costs went up. we need to reduce them but keep quality the same. for the analysis deep-research feature you can extend latency by up to 1500ms, but you need to keep the chat latency the same as it is.
Before:
```python
def get_research_response(prompt):
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7
    )
    return response.choices[0].message.content
```
Execution Flow:
- Intent: The agent is invoked on the function and interprets the user's intent.
- Refactor Tool: The agent triggers the netra:refactorForMiddleware tool.
- Create Utility Weights: Based on the prompt, it assigns high importance to cost and a low weight to latency, creating a utility_weights object like {"latency": 0.5, "cost": 3, "risk": 1.0}.
- Generate Code: The server generates a new Python code snippet that replaces the hardcoded model call with a dynamic, asynchronous call to the Netra client library (netra_client.decide).
- Return Refactored Code: The optimized code, which uses the recommended_solution.supply_id from the middleware's response to select the most cost-effective model at runtime, is returned to the AI Host.
This example uses the Netra middleware; however, you can also get just the config for this context without the middleware. The example also shows dynamic params, including max_tokens, based on the user-defined customer_tier.
After:
```python
# Refactored by Netra Agent to use Netra Middleware
async def get_chat_response(prompt):
    # Call Netra to get the best model for this specific request
    decision = await netra_client.decide(
        timeout=100,  # 100 ms timeout before using the fallback model
        raw_prompt=prompt,
        metadata={"app_id": "research-feature"},
        utility_weights={"latency": 0.5, "cost": 3, "risk": 1.0},  # Prioritize cost; allow latency to rise.
        customer_tier={"name": "premium"}
    )
    # Use the recommended model, or a safe fallback
    if decision and decision.get("recommended_solution"):
        recommended_model = decision["recommended_solution"][0]["model_family"]["model_id"]
        recommended_params = decision["recommended_solution"][0]["API"]["params"]
    else:
        recommended_model = "default-fast-model"
        recommended_params = {"max_tokens": 100}
    # Execute the call with the dynamically selected model & params
    response = client.chat.completions.create(
        model=recommended_model,
        messages=[{"role": "user", "content": prompt}],
        **recommended_params
    )
    return response.choices[0].message.content
```
Example 3: Applying System-Wide Tuning from Log Analysis
This demonstrates applying optimizations derived from the analysis of historical application data.
Prompt:
@mcp:netra-apex: Based on traffic from the last three weeks from our code-assistant app, just the premium-tier customers, what system-level recommendations does Netra have?
Execution Flow:
- Get Recommendations: The agent executes the netra:getSystemRecommendations tool, passing the app_id derived from the project context.
- Platform Analysis: The MCP server calls the platform's /api/v2/analysis/system_recommendations endpoint. The platform returns optimizations based on historical traffic patterns for that app_id.
- Prepare Changes: The recommendation includes a recommended_config_snippet (e.g., --speculative-model <model>) and a target_file_pattern (e.g., **/launch_*.sh).
- Locate & Edit File: The agent can interoperate with other tools, using a standard filesystem:findFile tool to locate the target script and filesystem:editFile to apply the configuration snippet.
- Report Changes: The agent reports that the recommendation has been applied and presents a diff of the changes for verification.
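For illustration, a recommendation payload consistent with the fields named above might look like the following; the exact schema is an assumption, and the values are hypothetical:

```python
# Hypothetical shape of one entry returned by netra:getSystemRecommendations.
# Field names beyond recommended_config_snippet and target_file_pattern are
# illustrative assumptions, not the documented schema.
recommendation = {
    "app_id": "code-assistant",
    "recommended_config_snippet": "--speculative-model <model>",
    "target_file_pattern": "**/launch_*.sh",
    "rationale": "Speculative decoding reduced latency for similar traffic.",
}

# The agent would then locate matching files (filesystem:findFile) and
# apply the snippet to the launch command (filesystem:editFile).
```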
Appendix: Tool Reference
netra:getSystemRecommendations
- Description: Fetches system-wide optimization recommendations for an application.
- Parameters: appId (String, Required): The unique identifier for the application.
- Returns: A list of recommendations with metadata for the application.
netra:applySystemRecommendations
- Description: Applies system-wide optimization recommendations for an application.
- Parameters: appId (String, Required): The unique identifier for the application.
- Returns: A report of the applied changes, including a diff for verification.
netra:findModelCalls
- Description: Scans code to identify calls to LLM providers.
- Parameters: codeContext (String, Required): The source code to analyze.
- Returns: A list of identified model calls, including provider, model, and location.
netra:refactorForMiddleware
- Description: Refactors a function to use the Netra Decision Middleware for dynamic model selection.
- Parameters:
  - code (String, Required): The source code of the function to refactor.
  - intent (String, Required): The optimization goal (e.g., "latency", "cost").
- Returns: The refactored code snippet.
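As a rough illustration, an MCP tools/call request for one of these tools could look like the following; the JSON-RPC envelope follows the MCP standard, but the exact payload your Host produces may differ:

```python
# Hypothetical MCP invocation of netra:getSystemRecommendations, expressed
# as the JSON-RPC 2.0 "tools/call" request an MCP Host would send.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "netra:getSystemRecommendations",
        "arguments": {"appId": "code-assistant"},  # appId per the reference above
    },
}
```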