Model Context Protocol (MCP) Guide
Introduction
Optimizing AI-native systems for cost, latency, quality, and throughput while limiting risk is a continuous challenge. The Netra MCP is designed to integrate with your existing development lifecycle and act as an intelligent partner in that effort.
This document provides a technical overview of the Netra MCP. It focuses on how the MCP provides actionable insights, working alongside your established systems to enhance performance and efficiency.
It is designed to be flexible, supporting both offline analysis for systemic insights and an optional real-time middleware layer for production traffic.
Why It Matters
- In five minutes, with a single chat, you can integrate Netra into your development flow.
- Chat with your codebase to get actionable optimization insights, backed by the power of Netra's AI-native platform.
- Bring in production insights and multi-user inputs to optimize your system on demand.
MCP Endpoint
https://mcp.netrasystems.ai/v1
Architectural Integration
The Model Context Protocol (MCP) is an open standard for AI tool integration. It allows the Netra Agent to interact with your codebase securely, without requiring complex, proprietary integrations.
Components
- MCP Host: Your AI-native IDE (e.g., Cursor, Windsurf) or desktop application (e.g., Claude Desktop). The Host orchestrates communication between your prompts and the available tools.
- Netra MCP Server: A managed, remote server that exposes the MCP's capabilities as a standard set of tools. It acts as a secure gateway to the core Netra platform.
- Netra Platform: The backend infrastructure that performs the analysis and generates optimization recommendations.
All interactions are mediated by the MCP Server, ensuring that your development environment never communicates directly with the core platform APIs.
Capabilities Introduction
The Netra MCP's primary function is to execute a "Read-Analyze-Write" loop, turning developer and Agent-to-Agent intent into concrete code and configuration adjustments.
By combining your system log data, your codebase, and multi-user inputs, the agent identifies AI patterns that aren't aligned with your goals. It then identifies Supply Options that can improve the quality of your AI workloads.
Example Prompts:
@mcp:netra-apex: yesterday our costs went up. we need to reduce them but keep quality the same. for the analysis feature you can extend latency by up to 1500ms, but you need to keep the chat latency the same as it is.
@mcp:netra-apex: find the most expensive OpenAI calls and suggest cheaper alternatives with similar quality.
Dynamic Configuration with Middleware
For real-time applications, the agent can refactor code to use the Netra Decision Middleware. This allows for dynamic model selection based on the specific requirements of a request, such as prioritizing latency.
The agent's recommendation can also be used to dynamically set query parameters, allowing for more advanced and context-aware configurations.
See Middleware for more details.
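As a minimal sketch (assuming the netra_client library and response shape shown in Example 2 below), a latency-prioritized decision might look like this:

```python
# Minimal sketch, assuming the netra_client library from Example 2 below.
async def pick_model(prompt: str) -> str:
    # Ask the middleware for the best model for this specific request,
    # weighting latency heavily (the weights here are illustrative).
    decision = await netra_client.decide(
        raw_prompt=prompt,
        utility_weights={"latency": 3.0, "cost": 1.0, "risk": 1.0},
    )
    return decision["recommended_solution"][0]["model_family"]["model_id"]
```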
Example: Advanced Configuration with an Orchestration Library
This pattern uses the agent's recommendation to set not only the model but also its runtime parameters.
See Advanced Configuration with an Orchestration Library
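For instance, with LangChain as the orchestration library (an assumption for illustration; any library that accepts per-call model parameters works the same way), the decision's model and runtime params can be applied together:

```python
from langchain_openai import ChatOpenAI

# Hypothetical: feed both the recommended model and its runtime params
# (e.g., temperature, max_tokens) from a middleware decision (see Example 2)
# into the orchestration layer's model wrapper.
solution = decision["recommended_solution"][0]
llm = ChatOpenAI(
    model=solution["model_family"]["model_id"],
    **solution["API"]["params"],  # runtime params recommended by Netra
)
answer = llm.invoke(prompt)
```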
System-Wide Tuning
The agent can fetch and apply system-level recommendations based on historical traffic analysis for a specific application, targeting configuration files such as launch scripts. A rough sketch of what this can look like follows.
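This sketch is illustrative only; the script name, serving command, and flag are hypothetical stand-ins borrowed from Example 3 below:

```python
# Illustrative only: append a recommended flag (e.g., --speculative-model)
# to the serving command in a launch script matched by target_file_pattern.
from pathlib import Path

script = Path("launch_server.sh")               # hypothetical matched file
snippet = "--speculative-model small-draft-01"  # hypothetical recommendation
script.write_text(
    script.read_text().replace("serve_model", f"serve_model {snippet}", 1)
)
```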
Getting Started
Add via Chat
Open your IDE's chat window. For Windsurf, for example, copy and paste the following into the Cascade "Write" chat window:
Add to ~/.codeium/windsurf/mcp_config.json:
```json
{
  "mcpServers": {
    "netra-apex": {
      "serverUrl": "https://mcp.netrasystems.ai/v1",
      "auth": {
        "type": "bearer",
        "token": "${env:NETRA_API_KEY}"
      }
    }
  }
}
```
To set the environment variable:
- macOS/Linux: Add export NETRA_API_KEY='your_api_key_here' to your shell profile file (e.g., ~/.zshrc, ~/.bash_profile).
- Windows: Use the System Properties dialog to add a new environment variable named NETRA_API_KEY with your API key as the value.
Prerequisites
- An active Netra account and API key.
- A supported MCP-compliant editor (Cursor, Windsurf, etc.).
Configuration
Connection is established by adding a configuration to your IDE's MCP settings file. The Netra MCP server is hosted at https://mcp.netrasystems.ai/v1.
Example Configuration (~/.cursor/mcp.json):
```json
{
  "mcpServers": {
    "netra": {
      "serverUrl": "https://mcp.netrasystems.ai/v1",
      "auth": {
        "type": "bearer",
        "token": "${env:NETRA_API_KEY}"
      }
    }
  }
}
```
It is critical to store your API key in an environment variable (NETRA_API_KEY) rather than hardcoding it in the configuration file.
Examples
Example 1: Proactive Cost Management
This example demonstrates how the agent identifies cost-saving opportunities directly from the codebase.
Prompt:
@mcp:netra-apex: find the most expensive OpenAI calls in my codebase and suggest cheaper alternatives that have a similar quality score.
Execution Flow:
- Invoke Tool: The IDE's AI Host invokes the Co-Optimization Agent, which executes the netra:findModelCalls tool on the current codebase.
- Parse Code: The MCP server receives the request and parses the code to identify all calls to LLM providers, such as openai.chat.completions.create. It extracts the model used (e.g., gpt-4-turbo).
- Recommend Supply Options: The server makes an internal API call to the AI Optimization platform's /api/v1/supply_catalog. The returned catalog contains metadata for each Supply Option. The AI Host can then present this as an actionable suggestion with a one-click refactoring option.
- Refactor Code: Use the netra:applySystemRecommendations tool to refactor the code to use the recommended Supply Option, as sketched below.
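A minimal sketch of the kind of refactor this flow can produce; the cheaper model named here is a hypothetical placeholder, not an actual catalog entry:

```python
# Before: hardcoded premium model
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": prompt}],
)

# After: a cheaper Supply Option from the supply catalog. "gpt-4o-mini"
# stands in for whatever alternative the catalog actually recommends
# at a comparable quality score.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
```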
Example 2: Flex Latency to Achieve Cost Goal
This flow shows how the agent refactors code to reduce costs for a latency-tolerant research workload using Netra's middleware.
Prompt:
@mcp:netra-apex: yesterday our costs went up. we need to reduce them but keep quality the same. for the analysis deep-research feature you can extend latency by up to 1500ms, but you need to keep the chat latency the same as it is.
Before:
```python
def get_research_response(prompt):
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7
    )
    return response.choices[0].message.content
```
Execution Flow:
- Intent: The agent is invoked on the function and interprets the user's intent.
- Refactor Tool: The agent triggers the netra:refactorForMiddleware tool.
- Create Utility Weights: Based on the prompt, it assigns high importance to cost and a low weight to latency, creating a utility_weights object like {"latency": 0.5, "cost": 3, "risk": 1.0}.
- Generate Code: The server generates a new Python code snippet that replaces the hardcoded model call with a dynamic, asynchronous call to the Netra client library (netra_client.decide).
- Return Refactored Code: The optimized code, which uses the recommended_solution.supply_id from the middleware's response to select the most cost-effective model at runtime, is returned to the AI Host.
This example uses the Netra middleware; however, you can also get just the config for this context without the middleware. The example also shows dynamic params, including max_tokens, based on the user-defined customer_tier.
After:
```python
# Refactored by Netra Agent to use Netra Middleware
async def get_chat_response(prompt):
    # Call Netra to get the best model for this specific request
    decision = await netra_client.decide(
        timeout=100,  # 100 ms timeout before using the fallback model
        raw_prompt=prompt,
        metadata={"app_id": "research-feature"},
        utility_weights={"latency": 0.5, "cost": 3, "risk": 1.0},  # Prioritize cost; allow latency to rise.
        customer_tier={"name": "premium"}
    )
    # Use the recommended model, or a safe fallback
    if decision and decision.get("recommended_solution"):
        recommended_model = decision["recommended_solution"][0]["model_family"]["model_id"]
        recommended_params = decision["recommended_solution"][0]["API"]["params"]
    else:
        recommended_model = "default-fast-model"
        recommended_params = {"max_tokens": 100}
    # Execute the call with the dynamically selected model & params
    response = client.chat.completions.create(
        model=recommended_model,
        messages=[{"role": "user", "content": prompt}],
        **recommended_params
    )
    return response.choices[0].message.content
```
Example 3: Applying System-Wide Tuning from Log Analysis
This demonstrates applying optimizations derived from the analysis of historical application data.
Prompt:
@mcp:netra-apex: Based on traffic from the last three weeks from our code-assistant app, just the premium-tier customers, what system-level recommendations does Netra have?
Execution Flow:
- Get Recommendations: The agent executes the netra:getSystemRecommendations tool, passing the app_id derived from the project context.
- Platform Analysis: The MCP server calls the platform's /api/v2/analysis/system_recommendations endpoint. The platform returns optimizations based on historical traffic patterns for that app_id.
- Prepare Changes: The recommendation includes a recommended_config_snippet (e.g., --speculative-model <model>) and a target_file_pattern (e.g., **/launch_*.sh).
- Locate & Edit File: The agent can interoperate with other tools, using a standard filesystem:findFile tool to locate the target script and filesystem:editFile to apply the configuration snippet.
- Report Changes: The agent reports that the recommendation has been applied and presents a diff of the changes for verification.
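For illustration, a recommendation payload consistent with the fields named above might look like the following; the exact schema is an assumption, and the values are hypothetical:

```python
# Hypothetical shape of one entry returned by netra:getSystemRecommendations.
# Field names beyond recommended_config_snippet and target_file_pattern are
# illustrative assumptions, not the documented schema.
recommendation = {
    "app_id": "code-assistant",
    "recommended_config_snippet": "--speculative-model <model>",
    "target_file_pattern": "**/launch_*.sh",
    "rationale": "Speculative decoding reduced latency for similar traffic.",
}

# The agent would then locate matching files (filesystem:findFile) and
# apply the snippet to the launch command (filesystem:editFile).
```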
Appendix: Tool Reference
netra:getSystemRecommendations
- Description: Fetches system-wide optimization recommendations for an application.
- Parameters: appId (String, Required): The unique identifier for the application.
- Returns: A list of recommendations with metadata for the application.
netra:applySystemRecommendations
- Description: Applies system-wide optimization recommendations for an application.
- Parameters: appId (String, Required): The unique identifier for the application.
- Returns: A report of the applied changes, including a diff for verification.
netra:findModelCalls
- Description: Scans code to identify calls to LLM providers.
- Parameters: codeContext (String, Required): The source code to analyze.
- Returns: A list of identified model calls, including provider, model, and location.
netra:refactorForMiddleware
- Description: Refactors a function to use the Netra Decision Middleware for dynamic model selection.
- Parameters:
  - code (String, Required): The source code of the function to refactor.
  - intent (String, Required): The optimization goal (e.g., "latency", "cost").
- Returns: The refactored code snippet.
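As a rough illustration, an MCP tools/call request for one of these tools could look like the following; the JSON-RPC envelope follows the MCP standard, but the exact payload your Host produces may differ:

```python
# Hypothetical MCP invocation of netra:getSystemRecommendations, expressed
# as the JSON-RPC 2.0 "tools/call" request an MCP Host would send.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "netra:getSystemRecommendations",
        "arguments": {"appId": "code-assistant"},  # appId per the reference above
    },
}
```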